diff --git a/doc/src/SUMMARY.md b/doc/src/SUMMARY.md
index a280d19b7..c1da67a69 100644
--- a/doc/src/SUMMARY.md
+++ b/doc/src/SUMMARY.md
@@ -4,8 +4,9 @@
 
 [Avant Propos](./avant-propos.md)
 
+- [Schema](./schema.md)
+- [Indexing](./indexing.md)
 - [Segments](./basis.md)
-- [Defining your schema](./schema.md)
 - [Facetting](./facetting.md)
 - [Innerworkings](./innerworkings.md)
 - [Inverted index](./inverted_index.md)
diff --git a/doc/src/avant-propos.md b/doc/src/avant-propos.md
index 485afd178..7af000311 100644
--- a/doc/src/avant-propos.md
+++ b/doc/src/avant-propos.md
@@ -31,4 +31,3 @@ relevancy, collapsing, highlighting, spatial search.
 index from a different format.
 
 Tantivy exposes a lot of low level API to do all of these things.
-
diff --git a/doc/src/indexing.md b/doc/src/indexing.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/doc/src/schema.md b/doc/src/schema.md
index eb661bd69..b19f8d966 100644
--- a/doc/src/schema.md
+++ b/doc/src/schema.md
@@ -1 +1,50 @@
-# Defining your schema
+# Schema
+
+When starting a new project using tantivy, your first step will be to define your schema. Be aware that changing it will probably require you to reindex all of your data.
+It is strongly recommended that you keep the means to iterate through your original data for when this happens.
+
+Unless specified otherwise, tantivy does not keep a raw version of your data,
+so it is good practice to rely on a separate storage system to keep your
+raw documents.
+
+The schema defines both the type of each field you are indexing and the kind of indexing you want to apply to it. The set of search operations you will be able to perform depends on the way you set up your schema.
+
+Here is what defining your schema could look like.
+
+```rust
+use tantivy::schema::{SchemaBuilder, TEXT, STORED, INT_INDEXED};
+
+let mut schema_builder = SchemaBuilder::default();
+let text_field = schema_builder.add_text_field("name", TEXT | STORED); // full-text indexed, and stored
+let tag_field = schema_builder.add_facet_field("tags"); // hierarchical facet field
+let timestamp_field = schema_builder.add_u64_field("timestamp", INT_INDEXED); // indexed u64 field
+let schema = schema_builder.build();
+```
+
+Notice that adding a new field to your schema builder
+always follows the same pattern:
+
+```verbatim
+    schema_builder.add_<field type>_field("<field name>", <field options>);
+```
+
+This method returns a `Field` handle that will be used for all kinds of
+
+# Field types
+
+Tantivy currently supports only four types:
+
+- `text` (i.e. `&str`)
+- `u64` and `i64`
+- `HierarchicalFacet`
+
+Let's look at each of them in turn.
+
+# Text
+
+Full-text search is the bread and butter of search engines.
+The key idea is fairly simple. Your text is broken apart into tokens (that's
+what we call tokenization). Tantivy then keeps track of the list of documents containing each token.
+
+To increase recall, you might want to normalize tokens. For instance,
+you most likely want to lowercase your tokens so that documents match the query `cat` regardless of whether they contain the token `cat` or `Cat`.
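The tokenization and lowercasing described in the Text section of `schema.md` above can be illustrated with a toy inverted index. This is a minimal sketch, not tantivy's implementation or API: it uses only the Rust standard library, assumes a naive tokenizer that splits on non-alphanumeric characters and lowercases every token, and all names in it (`tokenize`, `InvertedIndex`, `add_document`, `search`) are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical naive tokenizer (not tantivy's): splits on any
/// non-alphanumeric character and lowercases each token, so that
/// `Cat` and `cat` end up as the same token.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|token| !token.is_empty())
        .map(|token| token.to_lowercase())
        .collect()
}

/// Hypothetical toy inverted index: maps each token to the sorted set
/// of ids of the documents containing it.
#[derive(Default)]
struct InvertedIndex {
    postings: BTreeMap<String, BTreeSet<u32>>,
}

impl InvertedIndex {
    /// Tokenizes the text and records `doc_id` in the posting list of every token.
    fn add_document(&mut self, doc_id: u32, text: &str) {
        for token in tokenize(text) {
            self.postings.entry(token).or_default().insert(doc_id);
        }
    }

    /// Returns the ids of the documents containing the single-term query,
    /// after normalizing the query the same way as the indexed text.
    fn search(&self, query: &str) -> Vec<u32> {
        let normalized = query.to_lowercase();
        self.postings
            .get(&normalized)
            .map(|doc_ids| doc_ids.iter().copied().collect())
            .unwrap_or_default()
    }
}

fn main() {
    let mut index = InvertedIndex::default();
    index.add_document(0, "The Cat sat on the mat.");
    index.add_document(1, "A dog chased the cat!");
    index.add_document(2, "No felines here.");

    // Both documents 0 and 1 match, whether they contain `Cat` or `cat`.
    println!("{:?}", index.search("cat"));
    println!("{:?}", index.search("Cat"));
}
```

Note that the query goes through the same normalization as the indexed text: if only the documents were lowercased, a query for `Cat` would miss every document.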