Updating uuid to 0.8

Fixed #664 (#667)
Removed references to u8 and old documentation
2026-01-12 12:02:54 +00:00 · 2019-10-24 10:31:16 +09:00 · 2019-10-22 09:34:10 +09:00 · 2019-10-21 10:50:53 +09:00 · 2019-10-07 10:05:12 +09:00 · 2019-10-04 17:07:49 +09:00
8 changed files with 150 additions and 49 deletions
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -34,7 +34,7 @@ itertools = "0.8"
 levenshtein_automata = {version="0.1", features=["fst_automaton"]}
 notify = {version="4", optional=true}
 bit-set = "0.5"
-uuid = { version = "0.7.2", features = ["v4", "serde"] }
+uuid = { version = "0.8", features = ["v4", "serde"] }
 crossbeam = "0.7"
 futures = "0.1"
 futures-cpupool = "0.1"
--- a/README.md
+++ b/README.md
@@ -21,9 +21,9 @@
 [![Become a patron](https://c5.patreon.com/external/logo/become_a_patron_button.png)](https://www.patreon.com/fulmicoton)


-**Tantivy** is a **full text search engine library** written in rust.
+**Tantivy** is a **full text search engine library** written in Rust.

-It is closer to [Apache Lucene](https://lucene.apache.org/) than to [Elasticsearch](https://www.elastic.co/products/elasticsearch) and [Apache Solr](https://lucene.apache.org/solr/) in the sense it is not
+It is closer to [Apache Lucene](https://lucene.apache.org/) than to [Elasticsearch](https://www.elastic.co/products/elasticsearch) or [Apache Solr](https://lucene.apache.org/solr/) in the sense it is not
 an off-the-shelf search engine server, but rather a crate that can be used
 to build such a search engine.

@@ -31,7 +31,7 @@ Tantivy is, in fact, strongly inspired by Lucene's design.

 # Benchmark

-Tantivy is typically faster than Lucene, but the results will depend on 
+Tantivy is typically faster than Lucene, but the results depend on 
 the nature of the queries in your workload.

 The following [benchmark](https://tantivy-search.github.io/bench/) break downs 
@@ -40,19 +40,19 @@ performance for different type of queries / collection.
 # Features

 - Full-text search
-- Configurable tokenizer. (stemming available for 17 latin languages. Third party support for Chinese ([tantivy-jieba](https://crates.io/crates/tantivy-jieba) and [cang-jie](https://crates.io/crates/cang-jie)) and [Japanese](https://crates.io/crates/tantivy-tokenizer-tiny-segmenter)
+- Configurable tokenizer (stemming available for 17 Latin languages with third party support for Chinese ([tantivy-jieba](https://crates.io/crates/tantivy-jieba) and [cang-jie](https://crates.io/crates/cang-jie)) and [Japanese](https://crates.io/crates/tantivy-tokenizer-tiny-segmenter))
 - Fast (check out the :racehorse: :sparkles: [benchmark](https://tantivy-search.github.io/bench/) :sparkles: :racehorse:)
 - Tiny startup time (<10ms), perfect for command line tools
-- BM25 scoring (the same as lucene)
-- Natural query language `(michael AND jackson) OR "king of pop"`
-- Phrase queries search (`"michael jackson"`)
+- BM25 scoring (the same as Lucene)
+- Natural query language (e.g. `(michael AND jackson) OR "king of pop"`)
+- Phrase queries search (e.g. `"michael jackson"`)
 - Incremental indexing
 - Multithreaded indexing (indexing English Wikipedia takes < 3 minutes on my desktop)
 - Mmap directory
-- SIMD integer compression when the platform/CPU includes the SSE2 instruction set.
-- Single valued and multivalued u64, i64 and f64 fast fields (equivalent of doc values in Lucene)
+- SIMD integer compression when the platform/CPU includes the SSE2 instruction set
+- Single valued and multivalued u64, i64, and f64 fast fields (equivalent of doc values in Lucene)
 - `&[u8]` fast fields
-- Text, i64, u64, f64, dates and hierarchical facet fields
+- Text, i64, u64, f64, dates, and hierarchical facet fields
 - LZ4 compressed document store
 - Range queries
 - Faceted search
@@ -61,43 +61,42 @@ performance for different type of queries / collection.

 # Non-features

-- Distributed search is out of the scope of tantivy. That being said, tantivy is meant as a
+- Distributed search is out of the scope of Tantivy. That being said, Tantivy is a
 library upon which one could build a distributed search. Serializable/mergeable collector state for instance, 
-are within the scope of tantivy.
+are within the scope of Tantivy.

 # Supported OS and compiler

-Tantivy works on stable rust (>= 1.27) and supports Linux, MacOS and Windows.
+Tantivy works on stable Rust (>= 1.27) and supports Linux, MacOS, and Windows.

 # Getting started

-- [tantivy's simple search example](https://tantivy-search.github.io/examples/basic_search.html)
-- [tantivy-cli and its tutorial](https://github.com/tantivy-search/tantivy-cli).
-`tantivy-cli` is an actual command line interface that makes it easy for you to create a search engine,
-index documents and search via the CLI or a small server with a REST API.
-It will walk you through getting a wikipedia search engine up and running in a few minutes.
-- [reference doc for the last released version](https://docs.rs/tantivy/)
+- [Tantivy's simple search example](https://tantivy-search.github.io/examples/basic_search.html)
+- [tantivy-cli and its tutorial](https://github.com/tantivy-search/tantivy-cli) - `tantivy-cli` is an actual command line interface that makes it easy for you to create a search engine,
+index documents, and search via the CLI or a small server with a REST API.
+It walks you through getting a wikipedia search engine up and running in a few minutes.
+- [Reference doc for the last released version](https://docs.rs/tantivy/)

 # How can I support this project?

 There are many ways to support this project. 

-- Use tantivy and tell us about your experience on [gitter](https://gitter.im/tantivy-search/tantivy) or by email (paul.masurel@gmail.com)
+- Use Tantivy and tell us about your experience on [Gitter](https://gitter.im/tantivy-search/tantivy) or by email (paul.masurel@gmail.com)
 - Report bugs
 - Write a blog post
 - Help with documentation by asking questions or submitting PRs
-- Contribute code (you can join [our gitter](https://gitter.im/tantivy-search/tantivy) )
-- Talk about tantivy around you
+- Contribute code (you can join [our Gitter](https://gitter.im/tantivy-search/tantivy))
+- Talk about Tantivy around you
 - Drop a word on on [![Say Thanks!](https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg)](https://saythanks.io/to/fulmicoton) or even [![Become a patron](https://c5.patreon.com/external/logo/become_a_patron_button.png)](https://www.patreon.com/fulmicoton)

 # Contributing code

-We use the GitHub Pull Request workflow - reference a GitHub ticket and/or include a comprehensive commit message when opening a PR.
+We use the GitHub Pull Request workflow: reference a GitHub ticket and/or include a comprehensive commit message when opening a PR.

 ## Clone and build locally

-Tantivy compiles on stable rust but requires `Rust >= 1.27`.
-To check out and run tests, you can simply run :
+Tantivy compiles on stable Rust but requires `Rust >= 1.27`.
+To check out and run tests, you can simply run:

 ```bash
    git clone https://github.com/tantivy-search/tantivy.git
@@ -108,7 +107,7 @@ To check out and run tests, you can simply run :
 ## Run tests

 Some tests will not run with just `cargo test` because of `fail-rs`.
-To run the tests exhaustively, run `./run-tests.sh`
+To run the tests exhaustively, run `./run-tests.sh`.

 ## Debug

@@ -116,13 +115,13 @@ You might find it useful to step through the programme with a debugger.

 ### A failing test

-Make sure you haven't run `cargo clean` after the most recent `cargo test` or `cargo build` to guarantee that `target/` dir exists. Use this bash script to find the most name of the most recent debug build of tantivy and run it under rust-gdb.
+Make sure you haven't run `cargo clean` after the most recent `cargo test` or `cargo build` to guarantee that the `target/` directory exists. Use this bash script to find the name of the most recent debug build of Tantivy and run it under `rust-gdb`:

 ```bash
 find target/debug/ -maxdepth 1 -executable -type f -name "tantivy*" -printf '%TY-%Tm-%Td %TT %p\n' | sort -r | cut -d " " -f 3 | xargs -I RECENT_DBG_TANTIVY rust-gdb RECENT_DBG_TANTIVY
 ```

-Now that you are in rust-gdb, you can set breakpoints on lines and methods that match your source-code and run the debug executable with flags that you normally pass to `cargo test` to like this
+Now that you are in `rust-gdb`, you can set breakpoints on lines and methods that match your source code and run the debug executable with flags that you normally pass to `cargo test` like this:

 ```bash
 $gdb run --test-threads 1 --test $NAME_OF_TEST
@@ -130,7 +129,7 @@ $gdb run --test-threads 1 --test $NAME_OF_TEST

 ### An example

-By default, rustc compiles everything in the `examples/` dir in debug mode. This makes it easy for you to make examples to reproduce bugs.
+By default, `rustc` compiles everything in the `examples/` directory in debug mode. This makes it easy for you to make examples to reproduce bugs:

 ```bash
 rust-gdb target/debug/examples/$EXAMPLE_NAME
--- a/query-grammar/src/occur.rs
+++ b/query-grammar/src/occur.rs
@@ -2,7 +2,7 @@ use std::fmt;
 use std::fmt::Write;

 /// Defines whether a term in a query must be present,
-/// should be present or must not be present.
+/// should be present or must be not present.
 #[derive(Debug, Clone, Hash, Copy, Eq, PartialEq)]
 pub enum Occur {
    /// For a given document to be considered for scoring,
--- a/src/core/segment_id.rs
+++ b/src/core/segment_id.rs
@@ -76,7 +76,7 @@ impl SegmentId {
 }

 /// Error type used when parsing a `SegmentId` from a string fails.
-pub struct SegmentIdParseError(uuid::parser::ParseError);
+pub struct SegmentIdParseError(uuid::Error);

 impl Error for SegmentIdParseError {}

--- a/src/lib.rs
+++ b/src/lib.rs
@@ -212,15 +212,13 @@ pub type Score = f32;
 pub type SegmentLocalId = u32;

 impl DocAddress {
-    /// Return the segment ordinal.
-    /// The segment ordinal is an id identifying the segment
-    /// hosting the document. It is only meaningful, in the context
-    /// of a searcher.
+    /// Return the segment ordinal id that identifies the segment
+    /// hosting the document in the `Searcher` it is called from.
    pub fn segment_ord(self) -> SegmentLocalId {
        self.0
    }

-    /// Return the segment local `DocId`
+    /// Return the segment-local `DocId`
    pub fn doc(self) -> DocId {
        self.1
    }
@@ -229,11 +227,11 @@ impl DocAddress {
 /// `DocAddress` contains all the necessary information
 /// to identify a document given a `Searcher` object.
 ///
-/// It consists in an id identifying its segment, and
-/// its segment-local `DocId`.
+/// It consists of an id identifying its segment, and
+/// a segment-local `DocId`.
 ///
 /// The id used for the segment is actually an ordinal
-/// in the list of segment hold by a `Searcher`.
+/// in the list of `Segment`s held by a `Searcher`.
 #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
 pub struct DocAddress(pub SegmentLocalId, pub DocId);
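
Note on the `DocAddress` doc comments above: a minimal, std-only sketch of how a segment ordinal plus a segment-local `DocId` can resolve to a document. `ToySearcher`, `demo_searcher`, and `title` are hypothetical stand-ins and not part of tantivy's API. `DocId = u32` is an assumption, since the hunk only shows `SegmentLocalId = u32`.

```rust
// `SegmentLocalId = u32` appears in the hunk; `DocId = u32` is assumed here.
pub type SegmentLocalId = u32;
pub type DocId = u32;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct DocAddress(pub SegmentLocalId, pub DocId);

impl DocAddress {
    /// Ordinal of the segment in the searcher's list of segments.
    pub fn segment_ord(self) -> SegmentLocalId {
        self.0
    }

    /// Segment-local document id.
    pub fn doc(self) -> DocId {
        self.1
    }
}

/// Hypothetical stand-in for a searcher: one list of stored titles per segment.
pub struct ToySearcher {
    segments: Vec<Vec<&'static str>>,
}

impl ToySearcher {
    /// Resolve an address by indexing first into the segment list,
    /// then into that segment's documents.
    pub fn title(&self, addr: DocAddress) -> Option<&'static str> {
        self.segments
            .get(addr.segment_ord() as usize)?
            .get(addr.doc() as usize)
            .copied()
    }
}

/// Build a toy searcher holding two segments.
pub fn demo_searcher() -> ToySearcher {
    ToySearcher {
        segments: vec![
            vec!["The Name of the Wind"],
            vec!["The Diary of Muadib", "A Dairy Cow"],
        ],
    }
}
```

Because the segment id is only an ordinal, the same `DocAddress` can point at a different document (or at nothing) in another searcher. The derived ordering sorts by segment ordinal first, then by `DocId`.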

--- a/src/query/boolean_query/boolean_query.rs
+++ b/src/query/boolean_query/boolean_query.rs
@@ -9,7 +9,8 @@ use crate::Result;
 use crate::Searcher;
 use std::collections::BTreeSet;

-/// The boolean query combines a set of queries
+/// The boolean query returns a set of documents
+/// that matches the Boolean combination of constituent subqueries.
 ///
 /// The documents matched by the boolean query are
 /// those which
@@ -19,6 +20,113 @@ use std::collections::BTreeSet;
 /// `MustNot` occurence.
 /// * match at least one of the subqueries that is not
 /// a `MustNot` occurence.
+///
+///
+/// You can combine other query types and their `Occur`ances into one `BooleanQuery`
+///
+/// ```rust
+///use tantivy::collector::Count;
+///use tantivy::doc;
+///use tantivy::query::{BooleanQuery, Occur, PhraseQuery, Query, TermQuery};
+///use tantivy::schema::{IndexRecordOption, Schema, TEXT};
+///use tantivy::Term;
+///use tantivy::{Index, Result};
+///
+///fn main() -> Result<()> {
+///    let mut schema_builder = Schema::builder();
+///    let title = schema_builder.add_text_field("title", TEXT);
+///    let body = schema_builder.add_text_field("body", TEXT);
+///    let schema = schema_builder.build();
+///    let index = Index::create_in_ram(schema);
+///    {
+///        let mut index_writer = index.writer(3_000_000)?;
+///        index_writer.add_document(doc!(
+///            title => "The Name of the Wind",
+///        ));
+///        index_writer.add_document(doc!(
+///            title => "The Diary of Muadib",
+///        ));
+///        index_writer.add_document(doc!(
+///            title => "A Dairy Cow",
+///            body => "hidden",
+///        ));
+///        index_writer.add_document(doc!(
+///            title => "A Dairy Cow",
+///            body => "found",
+///        ));
+///        index_writer.add_document(doc!(
+///            title => "The Diary of a Young Girl",
+///        ));
+///        index_writer.commit().unwrap();
+///    }
+///
+///    let reader = index.reader()?;
+///    let searcher = reader.searcher();
+///
+///    // Make TermQuery's for "girl" and "diary" in the title
+///    let girl_term_query: Box<dyn Query> = Box::new(TermQuery::new(
+///        Term::from_field_text(title, "girl"),
+///        IndexRecordOption::Basic,
+///    ));
+///    let diary_term_query: Box<dyn Query> = Box::new(TermQuery::new(
+///        Term::from_field_text(title, "diary"),
+///        IndexRecordOption::Basic,
+///    ));
+///    // A TermQuery with "found" in the body
+///    let body_term_query: Box<dyn Query> = Box::new(TermQuery::new(
+///        Term::from_field_text(body, "found"),
+///        IndexRecordOption::Basic,
+///    ));
+///    // TermQuery "diary" must and "girl" must not be present
+///    let queries_with_occurs1 = vec![
+///        (Occur::Must, diary_term_query.box_clone()),
+///        (Occur::MustNot, girl_term_query),
+///    ];
+///    // Make a BooleanQuery equivalent to
+///    // title:+diary title:-girl
+///    let diary_must_and_girl_mustnot = BooleanQuery::from(queries_with_occurs1);
+///    let count1 = searcher.search(&diary_must_and_girl_mustnot, &Count)?;
+///    assert_eq!(count1, 1);
+///
+///    // TermQuery for "cow" in the title
+///    let cow_term_query: Box<dyn Query> = Box::new(TermQuery::new(
+///        Term::from_field_text(title, "cow"),
+///        IndexRecordOption::Basic,
+///    ));
+///    // "title:diary OR title:cow"
+///    let title_diary_or_cow = BooleanQuery::from(vec![
+///        (Occur::Should, diary_term_query.box_clone()),
+///        (Occur::Should, cow_term_query),
+///    ]);
+///    let count2 = searcher.search(&title_diary_or_cow, &Count)?;
+///    assert_eq!(count2, 4);
+///
+///    // Make a `PhraseQuery` from a vector of `Term`s
+///    let phrase_query: Box<dyn Query> = Box::new(PhraseQuery::new(vec![
+///        Term::from_field_text(title, "dairy"),
+///        Term::from_field_text(title, "cow"),
+///    ]));
+///    // You can combine subqueries of different types into 1 BooleanQuery:
+///    // `TermQuery` and `PhraseQuery`
+///    // "title:diary OR "dairy cow"
+///    let term_of_phrase_query = BooleanQuery::from(vec![
+///        (Occur::Should, diary_term_query.box_clone()),
+///        (Occur::Should, phrase_query.box_clone()),
+///    ]);
+///    let count3 = searcher.search(&term_of_phrase_query, &Count)?;
+///    assert_eq!(count3, 4);
+///
+///    // You can nest one BooleanQuery inside another
+///    // body:found AND ("title:diary OR "dairy cow")
+///    let nested_query = BooleanQuery::from(vec![
+///        (Occur::Must, body_term_query),
+///        (Occur::Must, Box::new(term_of_phrase_query))
+///    ]);
+///    let count4 = searcher.search(&nested_query, &Count)?;
+///    assert_eq!(count4, 1);
+///    Ok(())
+///}
+/// ```
 #[derive(Debug)]
 pub struct BooleanQuery {
    subqueries: Vec<(Occur, Box<dyn Query>)>,
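
Note on the matching rules in the `BooleanQuery` doc comment above: a std-only sketch that models each subquery as a plain predicate on a title string. Nothing here is tantivy's implementation. `Clause`, `matches`, `count_matches`, `has_diary`, `has_girl`, and `diary_not_girl` are hypothetical names, and the `Must` and `MustNot` rules are the usual reading of the bullets that the hunk elides.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Occur {
    Should,
    Must,
    MustNot,
}

/// A subquery modelled as a predicate on a document's text (hypothetical).
pub type Clause = (Occur, fn(&str) -> bool);

/// Apply the three rules: every `Must` clause matches, no `MustNot`
/// clause matches, and at least one non-`MustNot` clause matches.
pub fn matches(doc: &str, clauses: &[Clause]) -> bool {
    let all_must = clauses
        .iter()
        .filter(|(occur, _)| *occur == Occur::Must)
        .all(|(_, query)| query(doc));
    let any_must_not = clauses
        .iter()
        .filter(|(occur, _)| *occur == Occur::MustNot)
        .any(|(_, query)| query(doc));
    let any_positive = clauses
        .iter()
        .filter(|(occur, _)| *occur != Occur::MustNot)
        .any(|(_, query)| query(doc));
    all_must && !any_must_not && any_positive
}

fn has_diary(doc: &str) -> bool {
    doc.to_lowercase().contains("diary")
}

fn has_girl(doc: &str) -> bool {
    doc.to_lowercase().contains("girl")
}

/// Titles taken from the doctest in the hunk above.
pub const TITLES: [&str; 5] = [
    "The Name of the Wind",
    "The Diary of Muadib",
    "A Dairy Cow",
    "A Dairy Cow",
    "The Diary of a Young Girl",
];

/// Rough analogue of `title:+diary title:-girl`.
pub fn diary_not_girl() -> Vec<Clause> {
    vec![
        (Occur::Must, has_diary as fn(&str) -> bool),
        (Occur::MustNot, has_girl as fn(&str) -> bool),
    ]
}

/// Count the documents accepted by the clause list.
pub fn count_matches(docs: &[&str], clauses: &[Clause]) -> usize {
    docs.iter().filter(|doc| matches(**doc, clauses)).count()
}
```

Under these rules a query made only of `MustNot` clauses matches nothing, because the third rule requires at least one positive clause.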
--- a/src/query/phrase_query/phrase_query.rs
+++ b/src/query/phrase_query/phrase_query.rs
@@ -40,7 +40,7 @@ impl PhraseQuery {
        PhraseQuery::new_with_offset(terms_with_offset)
    }

-    /// Creates a new `PhraseQuery` given a list of terms and there offsets.
+    /// Creates a new `PhraseQuery` given a list of terms and their offsets.
    ///
    /// Can be used to provide custom offset for each term.
    pub fn new_with_offset(mut terms: Vec<(usize, Term)>) -> PhraseQuery {
@@ -73,7 +73,7 @@ impl PhraseQuery {
            .collect::<Vec<Term>>()
    }

-    /// Returns the `PhraseWeight` for the given phrase query given a specific `searcher`.  
+    /// Returns the `PhraseWeight` for the given phrase query given a specific `searcher`.
    ///
    /// This function is the same as `.weight(...)` except it returns
    /// a specialized type `PhraseWeight` instead of a Boxed trait.
--- a/src/schema/field.rs
+++ b/src/schema/field.rs
@@ -3,12 +3,8 @@ use std::io;
 use std::io::Read;
 use std::io::Write;

-/// `Field` is actually a `u8` identifying a `Field`
-/// The schema is in charge of holding mapping between field names
-/// to `Field` objects.
-///
-/// Because the field id is a `u8`, tantivy can only have at most `255` fields.
-/// Value 255 is reserved.
+/// `Field` is represented by an unsigned 32-bit integer type
+/// The schema holds the mapping between field names and `Field` objects.
 #[derive(Copy, Clone, Debug, PartialEq, PartialOrd, Eq, Ord, Hash, Serialize, Deserialize)]
 pub struct Field(pub u32);
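
Note on the `Field` doc comment above: a std-only sketch of a schema that holds the mapping between field names and `u32`-backed `Field` handles. `ToySchema` and its methods are hypothetical and much simpler than tantivy's `Schema`. The serde derives from the hunk are omitted to keep the sketch dependency-free.

```rust
use std::collections::HashMap;

#[derive(Copy, Clone, Debug, PartialEq, PartialOrd, Eq, Ord, Hash)]
pub struct Field(pub u32);

/// Hypothetical, simplified schema: the field id is the position
/// at which the field name was registered.
#[derive(Default)]
pub struct ToySchema {
    names: Vec<String>,
    by_name: HashMap<String, Field>,
}

impl ToySchema {
    /// Register a field name and return its handle.
    /// Registering the same name twice returns the existing handle.
    pub fn add_field(&mut self, name: &str) -> Field {
        if let Some(&field) = self.by_name.get(name) {
            return field;
        }
        let field = Field(self.names.len() as u32);
        self.names.push(name.to_string());
        self.by_name.insert(name.to_string(), field);
        field
    }

    /// Look up a handle by field name.
    pub fn get_field(&self, name: &str) -> Option<Field> {
        self.by_name.get(name).copied()
    }

    /// Look up a field name by handle.
    pub fn field_name(&self, field: Field) -> Option<&str> {
        self.names.get(field.0 as usize).map(String::as_str)
    }
}
```

With a `u32` id the handle stays `Copy` and cheap to pass around, while the 255-field ceiling described in the removed comment no longer follows from the type.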
Author	SHA1	Message	Date
Paul Masurel	3b7932e389	Updating uuid to 0.8	2019-10-24 10:31:16 +09:00
petr-tik	1187a02a3e	Fixed #664 (#667) Removed references to u8 and old documentation	2019-10-22 09:34:10 +09:00
Andrew Banchich	f6c525b19e	Fix grammar / punctuation (#668)	2019-10-21 10:50:53 +09:00
petr-tik	4a8f7712f3	Add a doctest to BooleanQuery (#630) * Add a doctest to BooleanQuery Closes #446 Mark a function that is only used in tests to be compiled for tests only Fix doc-comments in a couple of related files * Minor corrections remove whitespace, fix typos, add explicit dyn marker * WIP: BooleanQuery doc test Trying to nest several BooleanQueries together * Addressed old review rust 2018 edition + make function available to everyone * Box the previous query to resolve the type error * Rework wording in DocAdress document strings * Reworded and restructured the docstring	2019-10-07 10:05:12 +09:00
Paul Masurel	2f867aad17	Fix bench (#663) * fmt * Fixing bench compilation	2019-10-04 17:07:49 +09:00
Paul Masurel	5c6580eb15	fmt (#661)	2019-10-04 12:10:01 +09:00