mirror of
https://github.com/quickwit-oss/tantivy.git
synced 2026-05-23 19:50:42 +00:00
@@ -13,11 +13,11 @@ It is strongly inspired by Lucene's design.
# Features

- configurable indexing (optional term frequency and position indexing)
- Tf-Idf scoring
- tf-idf scoring
- Basic query language
- Incremental indexing
- Multithreaded indexing (indexing en wikipedia takes 4mn on my desktop)
- Mmap based
- Multithreaded indexing (indexing English Wikipedia takes 4 minutes on my desktop)
- mmap based
- SIMD integer compression
- u32 fast fields (equivalent of doc values in Lucene)
- LZ4 compressed document store
@@ -35,7 +35,7 @@ It will walk you through getting a wikipedia search engine up and running in a f

Tantivy has a git submodule called `simdcomp`.
After cloning the repository, you will need to initialize and update
the submodules. The project can then be build using `cargo`.
the submodules. The project can then be built using `cargo`.

git clone git@github.com:fulmicoton/tantivy.git
git submodule init
@@ -25,7 +25,7 @@ fn run_example(index_path: &Path) -> tantivy::Result<()> {

// # Defining the schema
//
// Tantivy index require to have a very strict schema.
// The Tantivy index requires a very strict schema.
// The schema declares which fields are in the index,
// and for each field, its type and "the way it should
// be indexed".
@@ -47,7 +47,7 @@ fn run_example(index_path: &Path) -> tantivy::Result<()> {
// `STORED` means that the field will also be saved
// in a compressed, row-oriented key-value store.
// This store is useful to reconstruct the
// document that were selected during the search phase.
// documents that were selected during the search phase.
schema_builder.add_text_field("title", TEXT | STORED);

// Our second field is body.
@@ -64,29 +64,29 @@ fn run_example(index_path: &Path) -> tantivy::Result<()> {
// Let's create a brand new index.
//
// This will actually just save a meta.json
// with our schema the directory.
// with our schema in the directory.
let index = try!(Index::create(index_path, schema.clone()));

// To insert documents, we need an index writer.
// There shall be only one writer at a time.
// Besides, this single `IndexWriter` is already
// There must be only one writer at a time.
// This single `IndexWriter` is already
// multithreaded.
//
// Here we used a buffer of 1 GB. Using a bigger
// Here we use a buffer of 1 GB. Using a bigger
// heap for the indexer can increase its throughput.
// This buffer will be split between the indexing
// threads.
let mut index_writer = try!(index.writer(1_000_000_000));

// Let's now index our documents!
// Let's index our documents!
// We first need a handle on the title and the body fields.

// ### Create a document "manually".
//
// We can create a document manually, by setting adding the fields
// We can create a document manually, by setting the fields
// one by one in a `Document` object.
let title = schema.get_field("title").unwrap();
let body = schema.get_field("body").unwrap();
@@ -122,7 +122,7 @@ fn run_example(index_path: &Path) -> tantivy::Result<()> {
// This is an example, so we will only index 3 documents
// here. You can check out tantivy's tutorial to index
// the English wikipedia. Tantivy's indexing is rather fast.
// Indexing 5 millions articles of the English wikipedia takes
// Indexing 5 million articles of the English wikipedia takes
// around 4 minutes on my computer!
@@ -131,56 +131,56 @@ fn run_example(index_path: &Path) -> tantivy::Result<()> {
// At this point our documents are not searchable.
//
//
// We need to call .commit() explicitely to force the
// We need to call .commit() explicitly to force the
// index_writer to finish processing the documents in the queue,
// flush the current index on the disk, and advertise
// flush the current index to the disk, and advertise
// the existence of new documents.
//
// This call is blocking.
try!(index_writer.commit());

// If `.commit()` returns correctly, then all of the
// documents have been added before are guaranteed to be
// documents that have been added are guaranteed to be
// persistently indexed.
//
// In the scenario of a crash or a power failure,
// tantivy behaves as if it rollbacked to its last
// tantivy behaves as if it has rolled back to its last
// commit.

// # Searching
//
// Let's search our index. This starts
// Let's search our index. We start
// by creating a searcher. There can be more
// than one searcher at a time.
//
// You are supposed to acquire a search
// You should create a searcher
// every time you start a "search query".
let searcher = index.searcher();

// The query parser can interpret human queries.
// Here, if the user does not specify which
// field he wants to search, tantivy will search
// field they want to search, tantivy will search
// in both title and body.
let query_parser = QueryParser::new(index.schema(), vec!(title, body));

// QueryParser may fail if the query is not in the right
// format. For user facing applications, this can be a problem.
// A ticket has been filled regarding this problem.
// A ticket has been opened regarding this problem.
let query = try!(query_parser.parse_query("sea whale"));

// A query defines a set of documents, as
// well as the way they should be scored.
//
// Query created by the query parser are scoring according
// A query created by the query parser is scored according
// to a metric called Tf-Idf, and will consider
// any document matching at least one of our terms.

// ### Collectors
//
// We are not interested in all of the document but
// only in the top 10. Keep track of our top 10 best documents
// We are not interested in all of the documents but
// only in the top 10. Keeping track of our top 10 best documents
// is the role of the TopCollector.

let mut top_collector = TopCollector::with_limit(10);
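The example comments above say that queries built by the query parser are scored with Tf-Idf. As a rough illustration of the idea, here is a minimal sketch of one textbook tf-idf variant. The function names `idf` and `tf_idf` and the exact weighting (raw term frequency times `ln(N / df)`) are assumptions made for this sketch; they are not taken from tantivy's scoring code, which may weight terms differently.

```rust
/// Inverse document frequency, assumed here to be ln(num_docs / doc_freq).
/// Returns 0.0 when the term appears in no document, to avoid dividing by zero.
fn idf(num_docs: u32, doc_freq: u32) -> f64 {
    if doc_freq == 0 {
        return 0.0;
    }
    (num_docs as f64 / doc_freq as f64).ln()
}

/// Score of one term for one document: term frequency times idf.
/// `term_freq` is how often the term occurs in the document,
/// `doc_freq` is how many documents contain the term at all.
fn tf_idf(term_freq: u32, num_docs: u32, doc_freq: u32) -> f64 {
    term_freq as f64 * idf(num_docs, doc_freq)
}
```

Under this weighting, a term that appears in every document contributes nothing to the score, while a rare term contributes more than a common one for the same term frequency.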
@@ -188,14 +188,14 @@ fn run_example(index_path: &Path) -> tantivy::Result<()> {
// We can now perform our query.
try!(query.search(&searcher, &mut top_collector));

// Our top collector now contains are 10
// Our top collector now contains the 10
// most relevant doc ids...
let doc_addresses = top_collector.docs();

// The actual documents still need to be
// retrieved from Tantivy's store.
//
// Since body was not configured as stored,
// Since the body field was not configured as stored,
// the document returned will only contain
// a title.
@@ -205,4 +205,4 @@ fn run_example(index_path: &Path) -> tantivy::Result<()> {
}

Ok(())
}
}
@@ -5,13 +5,13 @@ use SegmentReader;
use SegmentLocalId;

/// `CountCollector` collector only counts how many
/// document are matching the query.
/// documents match the query.
pub struct CountCollector {
count: usize,
}

impl CountCollector {
/// Returns the count of document that where
/// Returns the count of documents that were
/// collected.
pub fn count(&self,) -> usize {
self.count
@@ -20,8 +20,7 @@ impl CountCollector {

impl Default for CountCollector {
fn default() -> CountCollector {
CountCollector {
count: 0,
CountCollector {count: 0,
}
}
}
@@ -20,16 +20,16 @@ pub use self::chained_collector::chain;
///
///
/// For instance,
/// - keeping track of the top 10 best documents
/// - computing a break down over a fast field
/// - computing the number of documents matching the query
///
/// - keeping track of the top 10 best documents
/// - computing a breakdown over a fast field
/// - computing the number of documents matching the query
///
/// Queries are in charge of pushing the `DocSet` to the collector.
///
/// As they work on multiple segment, they first inform
/// the collector of a change in segment and then
/// call the collect method to push document to the collector.
/// As they work on multiple segments, they first inform
/// the collector of a change of segment and then
/// call the `collect` method to push documents to the collector.
///
/// Temporally, our collector will receive calls
/// - `.set_segment(0, segment_reader_0)`
@@ -45,10 +45,10 @@ pub use self::chained_collector::chain;
///
/// Segments are not guaranteed to be visited in any specific order.
pub trait Collector {
/// `set_segment` is called before starting enumerating
/// `set_segment` is called before beginning to enumerate
/// on this segment.
fn set_segment(&mut self, segment_local_id: SegmentLocalId, segment: &SegmentReader) -> io::Result<()>;
/// The query pushes scored document to the collector via this method.
/// The query pushes the scored document to the collector via this method.
fn collect(&mut self, scored_doc: ScoredDoc);
}
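The calling protocol described in the `Collector` doc comments (announce a segment with `set_segment`, then push each matching document with `collect`) can be sketched without any tantivy types. Everything below is a hypothetical stand-in written for illustration: `SimpleCollector`, `CountingCollector`, `run_query`, and the simplified `ScoredDoc` tuple are not tantivy's API, and the real `set_segment` also receives a `SegmentReader` and returns an `io::Result`.

```rust
/// Hypothetical stand-in for a scored document: (doc id within segment, score).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ScoredDoc(pub u32, pub f32);

/// Simplified version of the collector protocol described above.
pub trait SimpleCollector {
    /// Called once before the documents of a segment are pushed.
    fn set_segment(&mut self, segment_local_id: u32);
    /// Called once per matching document of the current segment.
    fn collect(&mut self, scored_doc: ScoredDoc);
}

/// Counts matching documents and the number of segments announced.
#[derive(Default)]
pub struct CountingCollector {
    count: usize,
    segments_seen: usize,
}

impl CountingCollector {
    pub fn count(&self) -> usize {
        self.count
    }
    pub fn segments_seen(&self) -> usize {
        self.segments_seen
    }
}

impl SimpleCollector for CountingCollector {
    fn set_segment(&mut self, _segment_local_id: u32) {
        self.segments_seen += 1;
    }
    fn collect(&mut self, _scored_doc: ScoredDoc) {
        self.count += 1;
    }
}

/// Plays the role of the query: for each segment, announce it,
/// then push every matching document to the collector.
pub fn run_query<C: SimpleCollector>(segments: &[Vec<ScoredDoc>], collector: &mut C) {
    for (segment_local_id, docs) in segments.iter().enumerate() {
        collector.set_segment(segment_local_id as u32);
        for &doc in docs {
            collector.collect(doc);
        }
    }
}
```

Because doc ids are only meaningful within a segment, a collector that needs to report documents must remember the segment id it was last given; a pure counter like this one can ignore it.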
@@ -57,7 +57,7 @@ impl<'a, C: Collector> Collector for &'a mut C {
fn set_segment(&mut self, segment_local_id: SegmentLocalId, segment: &SegmentReader) -> io::Result<()> {
(*self).set_segment(segment_local_id, segment)
}
/// The query pushes scored document to the collector via this method.
/// The query pushes the scored document to the collector via this method.
fn collect(&mut self, scored_doc: ScoredDoc) {
(*self).collect(scored_doc);
}
@@ -120,10 +120,10 @@ pub mod tests {

/// Collects in order all of the fast field for all of the
/// doc of the `DocSet`
/// Collects in order all of the fast fields for all of the
/// docs in the `DocSet`
///
/// This collector is essentially useful for tests.
/// This collector is mainly useful for tests.
pub struct FastFieldTestCollector {
vals: Vec<u32>,
field: Field,
@@ -5,7 +5,7 @@ use SegmentReader;
use SegmentLocalId;

/// Multicollector makes it possible to collect on more than one collector
/// Multicollector makes it possible to collect on more than one collector.
/// It should only be used for use cases where the Collector type is unknown
/// at compile time.
/// If the type of the collectors is known, you should prefer to use `ChainedCollector`.
@@ -60,4 +60,4 @@ mod tests {
assert_eq!(count_collector.count(), 3);
assert!(top_collector.at_capacity());
}
}
}
@@ -50,7 +50,7 @@ pub struct TopCollector {

impl TopCollector {

/// Creates a top collector, with a number of document of "limit"
/// Creates a top collector, with a number of documents equal to "limit".
///
/// # Panics
/// The method panics if limit is 0
@@ -65,9 +65,9 @@ impl TopCollector {
}
}

/// Returns the decreasingly sorted K-best documents.
/// Returns the K best documents sorted in decreasing order.
///
/// Calling this method will triggers the sort.
/// Calling this method triggers the sort.
/// The result of the sort is not cached.
pub fn docs(&self) -> Vec<DocAddress> {
self.score_docs()
@@ -76,9 +76,9 @@ impl TopCollector {
.collect()
}

/// Returns the decreasingly sorted K-best ScoredDocument.
/// Returns the K best `ScoredDocument`s sorted in decreasing order.
///
/// Calling this method will triggers the sort.
/// Calling this method triggers the sort.
/// The result of the sort is not cached.
pub fn score_docs(&self) -> Vec<(Score, DocAddress)> {
let mut scored_docs: Vec<GlobalScoredDoc> = self.heap
@@ -90,9 +90,9 @@ impl TopCollector {
.map(|GlobalScoredDoc(score, doc_address)| (score, doc_address))
.collect()
}

/// Return true iff at least K document have gone through
/// the collector.

/// Return true iff at least K documents have gone through
/// the collector.
#[inline]
pub fn at_capacity(&self, ) -> bool {
self.heap.len() >= self.limit
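The `TopCollector` doc comments describe keeping the K best documents in a heap and sorting only when results are requested. The following is a minimal sketch of that general technique, not the collector's actual code: `TopK` is a hypothetical type, scores are integers so they can be ordered directly (an assumption made for brevity, since floating-point scores need an ordering wrapper), and documents are plain `u32` ids rather than `DocAddress` values.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical top-K collector over (score, doc id) pairs.
/// `Reverse` turns the max-heap into a min-heap, so the worst
/// retained candidate is always at the top and cheap to evict.
pub struct TopK {
    limit: usize,
    heap: BinaryHeap<Reverse<(u32, u32)>>,
}

impl TopK {
    /// Panics if `limit` is 0, mirroring the documented behavior.
    pub fn with_limit(limit: usize) -> TopK {
        assert!(limit > 0, "limit must be strictly greater than 0");
        TopK { limit, heap: BinaryHeap::with_capacity(limit) }
    }

    /// Offers one scored document to the collector.
    pub fn collect(&mut self, score: u32, doc: u32) {
        if self.heap.len() < self.limit {
            self.heap.push(Reverse((score, doc)));
        } else if let Some(mut worst) = self.heap.peek_mut() {
            // Replace the current worst candidate only if the new one beats it.
            if (score, doc) > worst.0 {
                *worst = Reverse((score, doc));
            }
        }
    }

    /// True once at least `limit` documents have been retained.
    pub fn at_capacity(&self) -> bool {
        self.heap.len() >= self.limit
    }

    /// Returns the retained (score, doc) pairs, best first.
    /// The sort happens on every call; nothing is cached.
    pub fn score_docs(&self) -> Vec<(u32, u32)> {
        let mut docs: Vec<(u32, u32)> = self.heap.iter().map(|entry| entry.0).collect();
        docs.sort_by(|a, b| b.cmp(a));
        docs
    }
}
```

Keeping a bounded min-heap means each offered document costs at most O(log K), and memory stays proportional to K regardless of how many documents match.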
@@ -176,8 +176,8 @@ mod tests {
.collect();
assert_eq!(docs, vec!(7, 1, 5, 3));
}

}

#[test]
@@ -185,4 +185,4 @@ mod tests {
fn test_top_0() {
TopCollector::with_limit(0);
}
}
}
@@ -90,7 +90,7 @@ impl Index {
/// Creates a new index in a temp directory.
///
/// The index will use the `MmapDirectory` in a newly created directory.
/// The temp directory will be destroyed automatically when the Index object
/// The temp directory will be destroyed automatically when the `Index` object
/// is destroyed.
///
/// The temp directory is only used for testing the `MmapDirectory`.
@@ -100,7 +100,7 @@ impl Index {
Index::from_directory(directory, schema)
}

/// Creates a new index given a directory and an IndexMeta.
/// Creates a new index given a directory and an `IndexMeta`.
fn create_from_metas(directory: Box<Directory>, metas: IndexMeta) -> Result<Index> {
let schema = metas.schema.clone();
let index = Index {
@@ -160,7 +160,7 @@ impl Index {

/// Marks the segment as published.
// TODO find a rusty way to hide that, while keeping
// it visible for IndexWriters.
// it visible for `IndexWriter`s.
pub fn publish_segments(&mut self,
segment_ids: &[SegmentId],
docstamp: u64) -> Result<()> {
@@ -204,7 +204,7 @@ impl Index {

}

/// Return a segment object given a segment_id
/// Return a segment object given a `segment_id`
///
/// The segment may or may not exist.
fn segment(&self, segment_id: SegmentId) -> Segment {
@@ -246,7 +246,7 @@ impl Index {
/// Either
/// - it fails, in which case an error is returned,
/// and the `meta.json` remains untouched,
/// - it success, and `meta.json` is written
/// - it succeeds, and `meta.json` is written
/// and flushed.
pub fn save_metas(&mut self,) -> Result<()> {
let mut w = Vec::new();
@@ -286,9 +286,9 @@ impl Index {
///
/// This method should be called every single time a search
/// query is performed.
/// The searcher are taken from a pool of `NUM_SEARCHERS` searchers.
/// The searchers are taken from a pool of `NUM_SEARCHERS` searchers.
/// If no searcher is available
/// it may block.
/// this may block.
///
/// The same searcher must be used for a given query, as it ensures
/// the use of a consistent segment set.
@@ -313,4 +313,4 @@ impl Clone for Index {
searcher_pool: self.searcher_pool.clone(),
}
}
}
}
@@ -25,7 +25,7 @@ use schema::TextIndexingOptions;
use error::Error;

/// Entrypoint to access all of the datastructures of the `Segment`
/// Entry point to access all of the data structures of the `Segment`
///
/// - term dictionary
/// - postings
@@ -34,8 +34,8 @@ use error::Error;
/// - field norm reader
///
/// The segment reader has a very low memory footprint,
/// as close to all of the memory data is in Mmapped.
///
/// as nearly all of its data is mmapped.
///
pub struct SegmentReader {
segment_info: SegmentInfo,
segment_id: SegmentId,
@@ -51,7 +51,7 @@ pub struct SegmentReader {
impl SegmentReader {
/// Returns the highest document id ever attributed in
/// this segment + 1.
/// Today, `tantivy` does not handle deletes so, it happens
/// Today, `tantivy` does not handle deletes, so it happens
/// to also be the number of documents in the index.
pub fn max_doc(&self) -> DocId {
self.segment_info.max_doc
@@ -233,7 +233,7 @@ impl SegmentReader {
self.read_postings(term, segment_posting_option)
}

/// Returns the term info of associated with the term.
/// Returns the term info associated with the term.
pub fn get_term_info(&self, term: &Term) -> Option<TermInfo> {
self.term_infos.get(term.as_slice())
}
@@ -9,7 +9,7 @@ use std::marker::Sync;

/// Write-once, read-many (WORM) abstraction for where tantivy's index should be stored.
///
/// There is currently two implementations of `Directory`
/// There are currently two implementations of `Directory`
///
/// - The [`MmapDirectory`](struct.MmapDirectory.html), this
/// should be your default choice.
@@ -20,19 +20,19 @@ pub trait Directory: fmt::Debug + Send + Sync + 'static {

/// Opens a virtual file for read.
///
/// Once a virtualfile is open, its data may not
/// Once a virtual file is open, its data may not
/// change.
///
/// Specifically, subsequent write or flush should
/// have no effect the returned `ReadOnlySource` object.
/// Specifically, subsequent writes or flushes should
/// have no effect on the returned `ReadOnlySource` object.
fn open_read(&self, path: &Path) -> result::Result<ReadOnlySource, FileError>;

/// Removes a file
///
/// Removing a file will not affect eventual
/// Removing a file will not affect any
/// existing `ReadOnlySource` pointing to it.
///
/// Removing a non existing files, yields a
/// Removing a nonexistent file yields a
/// `FileError::DoesNotExist`.
fn delete(&self, path: &Path) -> result::Result<(), FileError>;
@@ -44,28 +44,28 @@ pub trait Directory: fmt::Debug + Send + Sync + 'static {
/// same path should return a `ReadOnlySource`.
///
/// Write operations may be aggressively buffered.
/// The client of this trait is in charge to call flush
/// The client of this trait is responsible for calling flush
/// to ensure that subsequent `read` operations
/// will take in account preceding `write` operations.
/// will take into account preceding `write` operations.
///
/// Flush operation should also be persistent.
///
/// User shall not rely on `Drop` triggering `flush`.
/// The user shall not rely on `Drop` triggering `flush`.
/// Note that `RAMDirectory` will panic! if `flush`
/// was not called.
///
/// The file may not previously exists.
/// The file must not already exist.
fn open_write(&mut self, path: &Path) -> Result<WritePtr, OpenWriteError>;

/// Atomically replace the content of a file by data.
/// Atomically replace the content of a file with data.
///
/// This call ensures that reads can never *observe*
/// a partially written file.
///
/// The file may or may not previously exists.
/// The file may or may not already exist.
fn atomic_write(&mut self, path: &Path, data: &[u8]) -> io::Result<()>;

/// Clone the directory and boxes the clone
/// Clones the directory and boxes the clone
fn box_clone(&self) -> Box<Directory>;
}
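The `atomic_write` contract above (readers never observe a partially written file) is commonly met on a regular filesystem by writing to a temporary file and renaming it over the target. The sketch below illustrates that general pattern only; `atomic_write_sketch` and the `.tmp` suffix are hypothetical, it is not claimed to be how `MmapDirectory` implements the method, and it assumes both paths live on the same filesystem so that the rename replaces the target in one step.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: replaces the content of `path` with `data`
/// by writing a sibling temporary file and renaming it into place.
pub fn atomic_write_sketch(path: &Path, data: &[u8]) -> io::Result<()> {
    // Build "<path>.tmp" next to the target (assumed naming scheme).
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);
    {
        let mut tmp_file = fs::File::create(&tmp_path)?;
        tmp_file.write_all(data)?;
        // Make sure the bytes reach the disk before the rename publishes them.
        tmp_file.sync_all()?;
    }
    // Readers opening `path` see either the old content or the new one.
    fs::rename(&tmp_path, path)
}
```

The helper works whether or not the target already exists, which matches the documented contract for `atomic_write`.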
@@ -47,7 +47,7 @@ impl MmapDirectory {
/// Creates a new MmapDirectory in a temporary directory.
///
/// This is mostly useful to test the MmapDirectory itself.
/// For your unit test, prefer the RAMDirectory.
/// For your unit tests, prefer the RAMDirectory.
pub fn create_from_tempdir() -> io::Result<MmapDirectory> {
let tempdir = try!(TempDir::new("index"));
let tempdir_path = PathBuf::from(tempdir.path());
@@ -81,7 +81,7 @@ impl MmapDirectory {
}

/// Joins a relative_path to the directory `root_path`
/// to create proper complete `filepath`.
/// to create a proper complete `filepath`.
fn resolve_path(&self, relative_path: &Path) -> PathBuf {
self.root_path.join(relative_path)
}
@@ -11,7 +11,7 @@ use directory::error::{OpenWriteError, FileError};
use directory::WritePtr;
use super::shared_vec_slice::SharedVecSlice;

/// Writer associated to the `RAMDirectory`
/// Writer associated with the `RAMDirectory`
///
/// The Writer just writes a buffer.
///
@@ -133,9 +133,9 @@ impl fmt::Debug for RAMDirectory {
}

/// Directory storing everything in anonymous memory.
/// A Directory storing everything in anonymous memory.
///
/// It's main purpose is unit test.
/// It is mainly meant for unit testing.
/// Writes are only made visible upon flushing.
///
#[derive(Clone)]
@@ -161,7 +161,7 @@ impl Directory for RAMDirectory {
fn open_write(&mut self, path: &Path) -> Result<WritePtr, OpenWriteError> {
let path_buf = PathBuf::from(path);
let vec_writer = VecWriter::new(path_buf.clone(), self.fs.clone());
// force the creation of the file to mimick the MMap directory.
// force the creation of the file to mimic the MMap directory.
if try!(self.fs.write(path_buf.clone(), &Vec::new())) {
Err(OpenWriteError::FileAlreadyExists(path_buf))
}
@@ -96,7 +96,7 @@ pub use postings::SegmentPostingsOption;

/// u32 identifying a document within a segment.
/// Document gets their doc id assigned incrementally,
/// Documents have their doc id assigned incrementally,
/// as they are added in the segment.
pub type DocId = u32;
@@ -4,7 +4,7 @@ use std::borrow::BorrowMut;
use std::cmp::Ordering;

/// Expressed the outcome of a call to `DocSet`'s `.skip_next(...)`.
/// Expresses the outcome of a call to `DocSet`'s `.skip_next(...)`.
#[derive(PartialEq, Eq, Debug)]
pub enum SkipResult {
/// target was in the docset
@@ -24,8 +24,8 @@ pub trait DocSet {
/// element.
fn advance(&mut self,) -> bool;

/// After skipping position, the iterator in such a way `.doc()`
/// will return a value greater or equal to target.
/// After skipping, position the iterator in such a way that `.doc()`
/// will return a value greater than or equal to target.
///
/// SkipResult expresses whether the `target value` was reached, overstepped,
/// or if the `DocSet` was entirely consumed without finding any value
@@ -97,4 +97,4 @@ impl<'a, TDocSet: DocSet> DocSet for &'a mut TDocSet {
}
}
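The `skip_next` contract documented for `DocSet` (after the call, the current doc is greater than or equal to the target, and the result says whether the target was reached, overstepped, or the set was exhausted) can be illustrated on a plain sorted vector. The types below are hypothetical: `SortedDocs`, `skip_to`, and the `SkipOutcome` variant names are chosen for this sketch and are not the trait's real method or variant names. The sketch also assumes the doc ids are sorted in ascending order.

```rust
/// Hypothetical counterpart of the skip result described above.
#[derive(PartialEq, Eq, Debug)]
pub enum SkipOutcome {
    /// The target doc id is in the set and is now the current doc.
    Reached,
    /// The target is absent; the current doc is the next greater id.
    Overstepped,
    /// No doc id greater than or equal to the target exists.
    Exhausted,
}

/// A cursor over ascending doc ids (assumed sorted by the caller).
pub struct SortedDocs {
    docs: Vec<u32>,
    cursor: usize,
}

impl SortedDocs {
    pub fn new(docs: Vec<u32>) -> SortedDocs {
        SortedDocs { docs, cursor: 0 }
    }

    /// Current doc id, or `None` once the set is consumed.
    pub fn doc(&self) -> Option<u32> {
        self.docs.get(self.cursor).copied()
    }

    /// Advances until the current doc is >= `target`.
    pub fn skip_to(&mut self, target: u32) -> SkipOutcome {
        while let Some(current) = self.doc() {
            if current == target {
                return SkipOutcome::Reached;
            }
            if current > target {
                return SkipOutcome::Overstepped;
            }
            self.cursor += 1;
        }
        SkipOutcome::Exhausted
    }
}
```

A linear scan keeps the sketch short; a real postings list would typically use skip data or a galloping search to avoid visiting every doc id.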
@@ -12,8 +12,8 @@ use common::HasLen;
/// as well as the list of term positions.
///
/// Its main implementation is `SegmentPostings`,
/// but some other implementation mocking SegmentPostings exists,
/// in order to help merging segment or for testing.
/// but other implementations mocking SegmentPostings exist,
/// in order to help when merging segments or for testing.
pub trait Postings: DocSet {
/// Returns the term frequency
fn term_freq(&self,) -> u32;
@@ -29,7 +29,7 @@ pub enum ParsingError {

/// Tantivy's Query parser
///
/// The language covered by the current is extremely simple.
/// The language covered by the current parser is extremely simple.
///
/// * simple terms: e.g. `Barack Obama` is simply analyzed using
/// tantivy's `StandardTokenizer`, hence becoming `["barack", "obama"]`.
@@ -44,7 +44,7 @@ pub enum ParsingError {
///
/// This behavior is slower, but is not a bad idea if the user is sorting
/// by relevance: the user typically just scans through the first few
/// documents in order of decreasing relevance and will stop when the document
/// documents in order of decreasing relevance and will stop when the documents
/// are not relevant anymore.
/// Making this behavior customizable is tracked in
/// [issue #27](https://github.com/fulmicoton/tantivy/issues/27).
@@ -135,9 +135,9 @@ impl QueryParser {
/// Parse a query
///
/// Note that `parse_query` returns an error if the input
/// not a valid query.
/// is not a valid query.
///
/// There is currently no lenient mode for the query parse
/// There is currently no lenient mode for the query parser
/// which makes it a bad choice for a public/broad user search engine.
///
/// Implementing a lenient mode for this query parser is tracked
@@ -4,7 +4,7 @@
# Schema definition

Tantivy has a very strict schema.
The schema defines information about the fields your index contains, that is for each field :
The schema defines information about the fields your index contains, that is, for each field:

* the field name (may only contain letters `[a-zA-Z]`, numbers `[0-9]`, and `_`)
* the type of the field (currently only `text` and `u32` are supported)
@@ -37,20 +37,20 @@ let schema = schema_builder.build();

We can split the problem of generating a search result page into two phases:

* identifying the list of 10 or so document to be displayed (Conceptually `query -> doc_ids[]`)
* identifying the list of 10 or so documents to be displayed (Conceptually `query -> doc_ids[]`)
* for each of these documents, retrieving the information required to generate the search engine results page (SERP). (`doc_ids[] -> Document[]`)

In the first phase, the hability to search for documents by the given field, is determined by the [`TextIndexingOptions`](enum.TextIndexingOptions.html) of our
In the first phase, the ability to search for documents by the given field is determined by the [`TextIndexingOptions`](enum.TextIndexingOptions.html) of our
[`TextOptions`](struct.TextOptions.html).

The effect of each possible settings is described more in detail [`TextIndexingOptions`](enum.TextIndexingOptions.html).
The effect of each possible setting is described in more detail in [`TextIndexingOptions`](enum.TextIndexingOptions.html).

On the other hand, setting the field as stored or not determines whether the field should be returned when [`searcher.doc(doc_address)`](../struct.Searcher.html#method.doc)
is called.

### Shortcuts

For convenience, a few special value of `TextOptions` for your convenience.
For convenience, a few special values of `TextOptions` are provided.
They can be composed using the `|` operator.
The example can be rewritten:
@@ -82,7 +82,7 @@ Just like for Text fields (see above),
setting the field as stored defines whether the field will be
returned when [`searcher.doc(doc_address)`](../struct.Searcher.html#method.doc) is called,
and setting the field as indexed means that we will be able to perform queries such as `num_stars:10`.
Note that contrary to text fields, u32 can only be indexed in one way for the moment.
Note that unlike text fields, u32 can only be indexed in one way for the moment.
This may change when we start supporting range queries.

The `fast` option on the other hand is specific to u32 fields, and is only relevant
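The schema documentation in this diff states that a field name may only contain letters `[a-zA-Z]`, digits `[0-9]`, and `_`. A minimal sketch of a check for that rule is shown below. The function `is_valid_field_name` is hypothetical and not part of tantivy's schema API, and rejecting the empty string is an extra assumption of this sketch that the documentation does not state.

```rust
/// Hypothetical check for the documented field-name rule:
/// only ASCII letters, ASCII digits, and underscores are allowed.
/// Rejecting the empty name is an assumption of this sketch.
pub fn is_valid_field_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```

With this rule, names such as `num_stars` and `title` pass, while names containing spaces, dashes, or non-ASCII letters are rejected.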
@@ -15,7 +15,7 @@ use std::fmt;

/// Tantivy has a very strict schema.
/// You need to specify in advance, whether a field is indexed or not,
/// You need to specify in advance whether a field is indexed or not,
/// stored or not, and RAM-based or not.
///
/// This is done by creating a schema object, and
@@ -483,4 +483,4 @@ mod tests {
}
}
}
}
}