Compare commits

9 Commits

Author SHA1 Message Date
Adam Reichold
4fd2b22b69 Make allocating field names avoidable for range and exists queries.
If the field names are statically known, `Cow::Borrowed(&'static str)` can
handle them without allocations. The general case is still handled by
`Cow::Owned(String)`.
2024-01-26 17:31:44 +01:00
Paul Masurel
9b7f3a55cf Bumped census version 2024-01-26 19:32:02 +09:00
PSeitz
1dacdb6c85 add histogram agg test on empty index (#2306) 2024-01-23 16:27:34 +01:00
François Massot
30483310ca Minor improvement of README.md (#2305)
* Update README.md

* Remove useless paragraph

* Wording.
2024-01-19 17:46:48 +09:00
Tushar
e1d18b5114 chore: Expose TopDocs::order_by_u64_field again (#2282) 2024-01-18 05:58:24 +01:00
trinity-1686a
108f30ba23 allow newline where we allow space in query parser (#2302)
fix regression from the new parser
2024-01-17 14:38:35 +01:00
PSeitz
5943ee46bd Truncate keys to u16::MAX in term hashmap (#2299)
Truncate keys to u16::MAX instead of, e.g., storing 0 bytes for keys with length u16::MAX + 1.

The term hashmap has a hidden API contract to only accept terms with length up to u16::MAX.
2024-01-11 10:19:12 +01:00
PSeitz
f95a76293f add memory arena test (#2298)
* add memory arena test

* add assert

* Update stacker/src/memory_arena.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2024-01-11 07:18:48 +01:00
Paul Masurel
014328e378 Fix bug that can cause get_docids_for_value_range to panic. (#2295)
* Fix bug that can cause `get_docids_for_value_range` to panic.

When `selected_docid_range.end == num_rows`, we would get a panic
as we try to access a non-existing blockmeta.

This PR accepts calls to rank with any value.
For any value above num_rows we simply return non_null_rows.

Fixes #2293

* add tests, merge variables

---------

Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
2024-01-09 14:52:20 +01:00
17 changed files with 261 additions and 118 deletions
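
The `Cow` approach described in the first commit message can be sketched as follows. This is a minimal illustration, not the actual tantivy code: `FieldQuery`, `field()` and `is_borrowed()` are hypothetical names used only here. A `&'static str` converts into `Cow::Borrowed` without allocating, while a `String` is moved into `Cow::Owned`.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a query type that stores a field name.
#[derive(Clone, Debug)]
pub struct FieldQuery {
    field: Cow<'static, str>,
}

impl FieldQuery {
    /// Accepts anything convertible into `Cow<'static, str>`:
    /// a `&'static str` is borrowed, a `String` is taken over as-is.
    pub fn new<F: Into<Cow<'static, str>>>(field: F) -> Self {
        FieldQuery {
            field: field.into(),
        }
    }

    /// Returns the field name as a plain string slice.
    pub fn field(&self) -> &str {
        &self.field
    }

    /// True if the name is stored as a borrowed static string.
    pub fn is_borrowed(&self) -> bool {
        matches!(self.field, Cow::Borrowed(_))
    }
}
```

With this shape, `FieldQuery::new("score")` stores a borrowed name, whereas `FieldQuery::new(format!("attr_{}", 7))` stores an owned one; callers that already hold a `String` give it up without an extra copy.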

View File

@@ -38,7 +38,7 @@ crossbeam-channel = "0.5.4"
 rust-stemmers = "1.2.0"
 downcast-rs = "1.2.0"
 bitpacking = { version = "0.9.2", default-features = false, features = ["bitpacker4x"] }
-census = "0.4.0"
+census = "0.4.2"
 rustc-hash = "1.1.0"
 thiserror = "1.0.30"
 htmlescape = "0.3.1"

View File

@@ -5,19 +5,18 @@
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![Crates.io](https://img.shields.io/crates/v/tantivy.svg)](https://crates.io/crates/tantivy)

-![Tantivy](https://tantivy-search.github.io/logo/tantivy-logo.png)
+<img src="https://tantivy-search.github.io/logo/tantivy-logo.png" alt="Tantivy, the fastest full-text search engine library written in Rust" height="250">

-**Tantivy** is a **full-text search engine library** written in Rust.
+## Fast full-text search engine library written in Rust

-It is closer to [Apache Lucene](https://lucene.apache.org/) than to [Elasticsearch](https://www.elastic.co/products/elasticsearch) or [Apache Solr](https://lucene.apache.org/solr/) in the sense it is not
-an off-the-shelf search engine server, but rather a crate that can be used
-to build such a search engine.
+**If you are looking for an alternative to Elasticsearch or Apache Solr, check out [Quickwit](https://github.com/quickwit-oss/quickwit), our distributed search engine built on top of Tantivy.**
+
+Tantivy is closer to [Apache Lucene](https://lucene.apache.org/) than to [Elasticsearch](https://www.elastic.co/products/elasticsearch) or [Apache Solr](https://lucene.apache.org/solr/) in the sense it is not
+an off-the-shelf search engine server, but rather a crate that can be used to build such a search engine.

 Tantivy is, in fact, strongly inspired by Lucene's design.

-If you are looking for an alternative to Elasticsearch or Apache Solr, check out [Quickwit](https://github.com/quickwit-oss/quickwit), our search engine built on top of Tantivy.
-
-# Benchmark
+## Benchmark

 The following [benchmark](https://tantivy-search.github.io/bench/) breakdowns
 performance for different types of queries/collections.
@@ -28,7 +27,7 @@ Your mileage WILL vary depending on the nature of queries and their load.
 Details about the benchmark can be found at this [repository](https://github.com/quickwit-oss/search-benchmark-game).

-# Features
+## Features

 - Full-text search
 - Configurable tokenizer (stemming available for 17 Latin languages) with third party support for Chinese ([tantivy-jieba](https://crates.io/crates/tantivy-jieba) and [cang-jie](https://crates.io/crates/cang-jie)), Japanese ([lindera](https://github.com/lindera-morphology/lindera-tantivy), [Vaporetto](https://crates.io/crates/vaporetto_tantivy), and [tantivy-tokenizer-tiny-segmenter](https://crates.io/crates/tantivy-tokenizer-tiny-segmenter)) and Korean ([lindera](https://github.com/lindera-morphology/lindera-tantivy) + [lindera-ko-dic-builder](https://github.com/lindera-morphology/lindera-ko-dic-builder))
@@ -54,11 +53,11 @@ Details about the benchmark can be found at this [repository](https://github.com
 - Searcher Warmer API
 - Cheesy logo with a horse

-## Non-features
+### Non-features

 Distributed search is out of the scope of Tantivy, but if you are looking for this feature, check out [Quickwit](https://github.com/quickwit-oss/quickwit/).

-# Getting started
+## Getting started

 Tantivy works on stable Rust and supports Linux, macOS, and Windows.
@@ -68,7 +67,7 @@ index documents, and search via the CLI or a small server with a REST API.
 It walks you through getting a Wikipedia search engine up and running in a few minutes.
 - [Reference doc for the last released version](https://docs.rs/tantivy/)

-# How can I support this project?
+## How can I support this project?

 There are many ways to support this project.
@@ -79,16 +78,16 @@ There are many ways to support this project.
 - Contribute code (you can join [our Discord server](https://discord.gg/MT27AG5EVE))
 - Talk about Tantivy around you

-# Contributing code
+## Contributing code

 We use the GitHub Pull Request workflow: reference a GitHub ticket and/or include a comprehensive commit message when opening a PR.
 Feel free to update CHANGELOG.md with your contribution.

-## Tokenizer
+### Tokenizer

 When implementing a tokenizer for tantivy depend on the `tantivy-tokenizer-api` crate.

-## Clone and build locally
+### Clone and build locally

 Tantivy compiles on stable Rust.
 To check out and run tests, you can simply run:
@@ -99,7 +98,7 @@ cd tantivy
 cargo test
 ```

-# Companies Using Tantivy
+## Companies Using Tantivy

 <p align="left">
 <img align="center" src="doc/assets/images/etsy.png" alt="Etsy" height="25" width="auto" />&nbsp;
@@ -111,7 +110,7 @@ cargo test
 <img align="center" src="doc/assets/images/element-dark-theme.png#gh-dark-mode-only" alt="Element.io" height="25" width="auto" />
 </p>

-# FAQ
+## FAQ

 ### Can I use Tantivy in other languages?

View File

@@ -126,18 +126,18 @@ impl ColumnIndex {
         }
     }

-    pub fn docid_range_to_rowids(&self, doc_id: Range<DocId>) -> Range<RowId> {
+    pub fn docid_range_to_rowids(&self, doc_id_range: Range<DocId>) -> Range<RowId> {
         match self {
             ColumnIndex::Empty { .. } => 0..0,
-            ColumnIndex::Full => doc_id,
+            ColumnIndex::Full => doc_id_range,
             ColumnIndex::Optional(optional_index) => {
-                let row_start = optional_index.rank(doc_id.start);
-                let row_end = optional_index.rank(doc_id.end);
+                let row_start = optional_index.rank(doc_id_range.start);
+                let row_end = optional_index.rank(doc_id_range.end);
                 row_start..row_end
             }
             ColumnIndex::Multivalued(multivalued_index) => {
-                let end_docid = doc_id.end.min(multivalued_index.num_docs() - 1) + 1;
-                let start_docid = doc_id.start.min(end_docid);
+                let end_docid = doc_id_range.end.min(multivalued_index.num_docs() - 1) + 1;
+                let start_docid = doc_id_range.start.min(end_docid);

                 let row_start = multivalued_index.start_index_column.get_val(start_docid);
                 let row_end = multivalued_index.start_index_column.get_val(end_docid);

View File

@@ -21,8 +21,6 @@ const DENSE_BLOCK_THRESHOLD: u32 =
 const ELEMENTS_PER_BLOCK: u32 = u16::MAX as u32 + 1;

-const BLOCK_SIZE: RowId = 1 << 16;
-
 #[derive(Copy, Clone, Debug)]
 struct BlockMeta {
     non_null_rows_before_block: u32,
@@ -109,8 +107,8 @@ struct RowAddr {
 #[inline(always)]
 fn row_addr_from_row_id(row_id: RowId) -> RowAddr {
     RowAddr {
-        block_id: (row_id / BLOCK_SIZE) as u16,
-        in_block_row_id: (row_id % BLOCK_SIZE) as u16,
+        block_id: (row_id / ELEMENTS_PER_BLOCK) as u16,
+        in_block_row_id: (row_id % ELEMENTS_PER_BLOCK) as u16,
     }
 }
@@ -185,8 +183,13 @@ impl Set<RowId> for OptionalIndex {
         }
     }

+    /// Any value doc_id is allowed.
+    /// In particular, doc_id = num_rows.
     #[inline]
     fn rank(&self, doc_id: DocId) -> RowId {
+        if doc_id >= self.num_docs() {
+            return self.num_non_nulls();
+        }
         let RowAddr {
             block_id,
             in_block_row_id,
@@ -200,13 +203,15 @@ impl Set<RowId> for OptionalIndex {
         block_meta.non_null_rows_before_block + block_offset_row_id
     }

+    /// Any value doc_id is allowed.
+    /// In particular, doc_id = num_rows.
     #[inline]
     fn rank_if_exists(&self, doc_id: DocId) -> Option<RowId> {
         let RowAddr {
             block_id,
             in_block_row_id,
         } = row_addr_from_row_id(doc_id);
-        let block_meta = self.block_metas[block_id as usize];
+        let block_meta = *self.block_metas.get(block_id as usize)?;
         let block = self.block(block_meta);
         let block_offset_row_id = match block {
             Block::Dense(dense_block) => dense_block.rank_if_exists(in_block_row_id),
@@ -491,7 +496,7 @@ fn deserialize_optional_index_block_metadatas(
         non_null_rows_before_block += num_non_null_rows;
     }
     block_metas.resize(
-        ((num_rows + BLOCK_SIZE - 1) / BLOCK_SIZE) as usize,
+        ((num_rows + ELEMENTS_PER_BLOCK - 1) / ELEMENTS_PER_BLOCK) as usize,
         BlockMeta {
             non_null_rows_before_block,
             start_byte_offset,
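
The effect of the early return added to `rank` can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not tantivy's implementation: `ToyOptionalIndex` is a hypothetical type, blocks hold plain sorted vectors, and `ELEMENTS_PER_BLOCK` is shrunk to 4 so the boundary case is easy to see. When `num_docs` is a multiple of the block size, `doc_id == num_docs` maps to a block that does not exist, so the guard is what keeps the lookup in bounds.

```rust
/// Tiny block size for illustration only; the diff uses `u16::MAX as u32 + 1`.
const ELEMENTS_PER_BLOCK: u32 = 4;

/// Hypothetical, simplified stand-in for an optional (nullable) column index.
pub struct ToyOptionalIndex {
    num_docs: u32,
    num_non_nulls: u32,
    /// Per block: sorted in-block offsets of the docs that have a value.
    blocks: Vec<Vec<u32>>,
    /// Per block: number of non-null rows stored in earlier blocks.
    non_null_rows_before_block: Vec<u32>,
}

impl ToyOptionalIndex {
    /// `non_null_docs` is assumed to be sorted ascending and below `num_docs`.
    pub fn new(num_docs: u32, non_null_docs: &[u32]) -> Self {
        let num_blocks = ((num_docs + ELEMENTS_PER_BLOCK - 1) / ELEMENTS_PER_BLOCK) as usize;
        let mut blocks = vec![Vec::<u32>::new(); num_blocks];
        for &doc in non_null_docs {
            blocks[(doc / ELEMENTS_PER_BLOCK) as usize].push(doc % ELEMENTS_PER_BLOCK);
        }
        let mut non_null_rows_before_block = Vec::with_capacity(num_blocks);
        let mut acc = 0u32;
        for block in &blocks {
            non_null_rows_before_block.push(acc);
            acc += block.len() as u32;
        }
        ToyOptionalIndex {
            num_docs,
            num_non_nulls: acc,
            blocks,
            non_null_rows_before_block,
        }
    }

    /// Number of non-null docs with an id strictly below `doc_id`.
    /// Any `doc_id` is accepted, including `doc_id == num_docs`.
    pub fn rank(&self, doc_id: u32) -> u32 {
        // Mirrors the guard from the diff: past the end, every non-null row counts.
        if doc_id >= self.num_docs {
            return self.num_non_nulls;
        }
        let block_id = (doc_id / ELEMENTS_PER_BLOCK) as usize;
        let in_block_row_id = doc_id % ELEMENTS_PER_BLOCK;
        let in_block_rank = self.blocks[block_id].partition_point(|&d| d < in_block_row_id);
        self.non_null_rows_before_block[block_id] + in_block_rank as u32
    }
}
```

In this model, an index built with 8 docs has exactly two blocks, so without the guard `rank(8)` would index block 2 and panic, which is the same shape of failure the commit message describes for `selected_docid_range.end == num_rows`.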

View File

@@ -39,7 +39,8 @@ pub trait Set<T> {
     ///
     /// # Panics
     ///
-    /// May panic if rank is greater than the number of elements in the Set.
+    /// May panic if rank is greater or equal to the number of
+    /// elements in the Set.
     fn select(&self, rank: T) -> T;

     /// Creates a brand new select cursor.

View File

@@ -3,6 +3,30 @@ use proptest::strategy::Strategy;
 use proptest::{prop_oneof, proptest};

 use super::*;
+use crate::{ColumnarReader, ColumnarWriter, DynamicColumnHandle};
+
+#[test]
+fn test_optional_index_bug_2293() {
+    // tests for panic in docid_range_to_rowids for docid == num_docs
+    test_optional_index_with_num_docs(ELEMENTS_PER_BLOCK - 1);
+    test_optional_index_with_num_docs(ELEMENTS_PER_BLOCK);
+    test_optional_index_with_num_docs(ELEMENTS_PER_BLOCK + 1);
+}
+fn test_optional_index_with_num_docs(num_docs: u32) {
+    let mut dataframe_writer = ColumnarWriter::default();
+    dataframe_writer.record_numerical(100, "score", 80i64);
+    let mut buffer: Vec<u8> = Vec::new();
+    dataframe_writer
+        .serialize(num_docs, None, &mut buffer)
+        .unwrap();
+    let columnar = ColumnarReader::open(buffer).unwrap();
+    assert_eq!(columnar.num_columns(), 1);
+    let cols: Vec<DynamicColumnHandle> = columnar.read_columns("score").unwrap();
+    assert_eq!(cols.len(), 1);
+    let col = cols[0].open().unwrap();
+    col.column_index().docid_range_to_rowids(0..num_docs);
+}

 #[test]
 fn test_dense_block_threshold() {
@@ -35,7 +59,7 @@ proptest! {
 #[test]
 fn test_with_random_sets_simple() {
-    let vals = 10..BLOCK_SIZE * 2;
+    let vals = 10..ELEMENTS_PER_BLOCK * 2;
     let mut out: Vec<u8> = Vec::new();
     serialize_optional_index(&vals, 100, &mut out).unwrap();
     let null_index = open_optional_index(OwnedBytes::new(out)).unwrap();
@@ -171,7 +195,7 @@ fn test_optional_index_rank() {
     test_optional_index_rank_aux(&[0u32, 1u32]);
     let mut block = Vec::new();
     block.push(3u32);
-    block.extend((0..BLOCK_SIZE).map(|i| i + BLOCK_SIZE + 1));
+    block.extend((0..ELEMENTS_PER_BLOCK).map(|i| i + ELEMENTS_PER_BLOCK + 1));
     test_optional_index_rank_aux(&block);
 }
@@ -185,8 +209,8 @@ fn test_optional_index_iter_empty_one() {
 fn test_optional_index_iter_dense_block() {
     let mut block = Vec::new();
     block.push(3u32);
-    block.extend((0..BLOCK_SIZE).map(|i| i + BLOCK_SIZE + 1));
-    test_optional_index_iter_aux(&block, 3 * BLOCK_SIZE);
+    block.extend((0..ELEMENTS_PER_BLOCK).map(|i| i + ELEMENTS_PER_BLOCK + 1));
+    test_optional_index_iter_aux(&block, 3 * ELEMENTS_PER_BLOCK);
 }

 #[test]

View File

@@ -101,7 +101,7 @@ pub trait ColumnValues<T: PartialOrd = u64>: Send + Sync {
         row_id_hits: &mut Vec<RowId>,
     ) {
         let row_id_range = row_id_range.start..row_id_range.end.min(self.num_vals());
-        for idx in row_id_range.start..row_id_range.end {
+        for idx in row_id_range {
             let val = self.get_val(idx);
             if value_range.contains(&val) {
                 row_id_hits.push(idx);
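
As a rough, standalone illustration of the loop in this hunk: the sketch below clamps the requested row range to the number of stored values and then iterates the `Range` directly. It is not the trait method itself; `row_ids_for_value_range` and the plain `&[u64]` slice are assumptions made to keep the example self-contained.

```rust
use std::ops::{Range, RangeInclusive};

/// Hypothetical free function modelled on the loop above: pushes every row id
/// in `row_id_range` whose value falls inside `value_range`.
pub fn row_ids_for_value_range(
    values: &[u64],
    value_range: RangeInclusive<u64>,
    row_id_range: Range<u32>,
    row_id_hits: &mut Vec<u32>,
) {
    // Clamp the end so that an oversized range never reads past the column.
    let row_id_range = row_id_range.start..row_id_range.end.min(values.len() as u32);
    // A `Range<u32>` is itself an iterator, so it can be looped over directly.
    for idx in row_id_range {
        let val = values[idx as usize];
        if value_range.contains(&val) {
            row_id_hits.push(idx);
        }
    }
}
```

If `row_id_range.start` lies beyond the clamped end, the range is empty and the loop body never runs.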

View File

@@ -81,8 +81,8 @@ where
     T: InputTakeAtPosition + Clone,
     <T as InputTakeAtPosition>::Item: AsChar + Clone,
 {
-    opt_i(nom::character::complete::space0)(input)
-        .map(|(left, (spaces, errors))| (left, (spaces.expect("space0 can't fail"), errors)))
+    opt_i(nom::character::complete::multispace0)(input)
+        .map(|(left, (spaces, errors))| (left, (spaces.expect("multispace0 can't fail"), errors)))
 }

 pub(crate) fn space1_infallible<T>(input: T) -> JResult<T, Option<T>>
@@ -90,7 +90,7 @@ where
     T: InputTakeAtPosition + Clone + InputLength,
     <T as InputTakeAtPosition>::Item: AsChar + Clone,
 {
-    opt_i(nom::character::complete::space1)(input).map(|(left, (spaces, mut errors))| {
+    opt_i(nom::character::complete::multispace1)(input).map(|(left, (spaces, mut errors))| {
         if spaces.is_none() {
             errors.push(LenientErrorInternal {
                 pos: left.input_len(),

View File

@@ -3,7 +3,7 @@ use std::iter::once;
 use nom::branch::alt;
 use nom::bytes::complete::tag;
 use nom::character::complete::{
-    anychar, char, digit1, none_of, one_of, satisfy, space0, space1, u32,
+    anychar, char, digit1, multispace0, multispace1, none_of, one_of, satisfy, u32,
 };
 use nom::combinator::{eof, map, map_res, opt, peek, recognize, value, verify};
 use nom::error::{Error, ErrorKind};
@@ -65,7 +65,7 @@ fn word_infallible(delimiter: &str) -> impl Fn(&str) -> JResult<&str, Option<&st
     |inp| {
         opt_i_err(
             preceded(
-                space0,
+                multispace0,
                 recognize(many1(satisfy(|c| {
                     !c.is_whitespace() && !delimiter.contains(c)
                 }))),
@@ -225,10 +225,10 @@ fn term_group(inp: &str) -> IResult<&str, UserInputAst> {
     map(
         tuple((
-            terminated(field_name, space0),
+            terminated(field_name, multispace0),
             delimited(
-                tuple((char('('), space0)),
-                separated_list0(space1, tuple((opt(occur_symbol), term_or_phrase))),
+                tuple((char('('), multispace0)),
+                separated_list0(multispace1, tuple((opt(occur_symbol), term_or_phrase))),
                 char(')'),
             ),
         )),
@@ -250,7 +250,7 @@ fn term_group_precond(inp: &str) -> IResult<&str, (), ()> {
         (),
         peek(tuple((
             field_name,
-            space0,
+            multispace0,
             char('('), // when we are here, we know it can't be anything but a term group
         ))),
     )(inp)
@@ -259,7 +259,7 @@ fn term_group_precond(inp: &str) -> IResult<&str, (), ()> {
 fn term_group_infallible(inp: &str) -> JResult<&str, UserInputAst> {
     let (mut inp, (field_name, _, _, _)) =
-        tuple((field_name, space0, char('('), space0))(inp).expect("precondition failed");
+        tuple((field_name, multispace0, char('('), multispace0))(inp).expect("precondition failed");

     let mut terms = Vec::new();
     let mut errs = Vec::new();
@@ -305,7 +305,7 @@ fn exists(inp: &str) -> IResult<&str, UserInputLeaf> {
         UserInputLeaf::Exists {
             field: String::new(),
         },
-        tuple((space0, char('*'))),
+        tuple((multispace0, char('*'))),
     )(inp)
 }
@@ -314,7 +314,7 @@ fn exists_precond(inp: &str) -> IResult<&str, (), ()> {
         (),
         peek(tuple((
             field_name,
-            space0,
+            multispace0,
             char('*'), // when we are here, we know it can't be anything but a exists
         ))),
     )(inp)
@@ -323,7 +323,7 @@ fn exists_precond(inp: &str) -> IResult<&str, (), ()> {
 fn exists_infallible(inp: &str) -> JResult<&str, UserInputAst> {
     let (inp, (field_name, _, _)) =
-        tuple((field_name, space0, char('*')))(inp).expect("precondition failed");
+        tuple((field_name, multispace0, char('*')))(inp).expect("precondition failed");

     let exists = UserInputLeaf::Exists { field: field_name }.into();
     Ok((inp, (exists, Vec::new())))
@@ -349,7 +349,7 @@ fn literal_no_group_infallible(inp: &str) -> JResult<&str, Option<UserInputAst>>
     alt_infallible(
         (
             (
-                value((), tuple((tag("IN"), space0, char('[')))),
+                value((), tuple((tag("IN"), multispace0, char('[')))),
                 map(set_infallible, |(set, errs)| (Some(set), errs)),
             ),
             (
@@ -430,8 +430,8 @@ fn range(inp: &str) -> IResult<&str, UserInputLeaf> {
     // check for unbounded range in the form of <5, <=10, >5, >=5
     let elastic_unbounded_range = map(
         tuple((
-            preceded(space0, alt((tag(">="), tag("<="), tag("<"), tag(">")))),
-            preceded(space0, range_term_val()),
+            preceded(multispace0, alt((tag(">="), tag("<="), tag("<"), tag(">")))),
+            preceded(multispace0, range_term_val()),
         )),
         |(comparison_sign, bound)| match comparison_sign {
             ">=" => (UserInputBound::Inclusive(bound), UserInputBound::Unbounded),
@@ -444,7 +444,7 @@ fn range(inp: &str) -> IResult<&str, UserInputLeaf> {
     );

     let lower_bound = map(
-        separated_pair(one_of("{["), space0, range_term_val()),
+        separated_pair(one_of("{["), multispace0, range_term_val()),
         |(boundary_char, lower_bound)| {
             if lower_bound == "*" {
                 UserInputBound::Unbounded
@@ -457,7 +457,7 @@ fn range(inp: &str) -> IResult<&str, UserInputLeaf> {
     );

     let upper_bound = map(
-        separated_pair(range_term_val(), space0, one_of("}]")),
+        separated_pair(range_term_val(), multispace0, one_of("}]")),
         |(upper_bound, boundary_char)| {
             if upper_bound == "*" {
                 UserInputBound::Unbounded
@@ -469,8 +469,11 @@ fn range(inp: &str) -> IResult<&str, UserInputLeaf> {
         },
     );

-    let lower_to_upper =
-        separated_pair(lower_bound, tuple((space1, tag("TO"), space1)), upper_bound);
+    let lower_to_upper = separated_pair(
+        lower_bound,
+        tuple((multispace1, tag("TO"), multispace1)),
+        upper_bound,
+    );

     map(
         alt((elastic_unbounded_range, lower_to_upper)),
@@ -490,13 +493,16 @@ fn range_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
             word_infallible("]}"),
             space1_infallible,
             opt_i_err(
-                terminated(tag("TO"), alt((value((), space1), value((), eof)))),
+                terminated(tag("TO"), alt((value((), multispace1), value((), eof)))),
                 "missing keyword TO",
             ),
             word_infallible("]}"),
             opt_i_err(one_of("]}"), "missing range delimiter"),
         )),
-        |((lower_bound_kind, _space0, lower, _space1, to, upper, upper_bound_kind), errs)| {
+        |(
+            (lower_bound_kind, _multispace0, lower, _multispace1, to, upper, upper_bound_kind),
+            errs,
+        )| {
             let lower_bound = match (lower_bound_kind, lower) {
                 (_, Some("*")) => UserInputBound::Unbounded,
                 (_, None) => UserInputBound::Unbounded,
@@ -596,10 +602,10 @@ fn range_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
 fn set(inp: &str) -> IResult<&str, UserInputLeaf> {
     map(
         preceded(
-            tuple((space0, tag("IN"), space1)),
+            tuple((multispace0, tag("IN"), multispace1)),
             delimited(
-                tuple((char('['), space0)),
-                separated_list0(space1, map(simple_term, |(_, term)| term)),
+                tuple((char('['), multispace0)),
+                separated_list0(multispace1, map(simple_term, |(_, term)| term)),
                 char(']'),
             ),
         ),
@@ -667,7 +673,7 @@ fn leaf(inp: &str) -> IResult<&str, UserInputAst> {
     alt((
         delimited(char('('), ast, char(')')),
         map(char('*'), |_| UserInputAst::from(UserInputLeaf::All)),
-        map(preceded(tuple((tag("NOT"), space1)), leaf), negate),
+        map(preceded(tuple((tag("NOT"), multispace1)), leaf), negate),
         literal,
     ))(inp)
 }
@@ -919,17 +925,17 @@ fn aggregate_infallible_expressions(
 fn operand_leaf(inp: &str) -> IResult<&str, (BinaryOperand, UserInputAst)> {
     tuple((
-        terminated(binary_operand, space0),
-        terminated(boosted_leaf, space0),
+        terminated(binary_operand, multispace0),
+        terminated(boosted_leaf, multispace0),
     ))(inp)
 }

 fn ast(inp: &str) -> IResult<&str, UserInputAst> {
     let boolean_expr = map(
-        separated_pair(boosted_leaf, space1, many1(operand_leaf)),
+        separated_pair(boosted_leaf, multispace1, many1(operand_leaf)),
         |(left, right)| aggregate_binary_expressions(left, right),
     );
-    let whitespace_separated_leaves = map(separated_list1(space1, occur_leaf), |subqueries| {
+    let whitespace_separated_leaves = map(separated_list1(multispace1, occur_leaf), |subqueries| {
         if subqueries.len() == 1 {
             let (occur_opt, ast) = subqueries.into_iter().next().unwrap();
             match occur_opt.unwrap_or(Occur::Should) {
@@ -942,9 +948,9 @@ fn ast(inp: &str) -> IResult<&str, UserInputAst> {
     });

     delimited(
-        space0,
+        multispace0,
         alt((boolean_expr, whitespace_separated_leaves)),
-        space0,
+        multispace0,
     )(inp)
 }
@@ -969,7 +975,7 @@ fn ast_infallible(inp: &str) -> JResult<&str, UserInputAst> {
 }

 pub fn parse_to_ast(inp: &str) -> IResult<&str, UserInputAst> {
-    map(delimited(space0, opt(ast), eof), |opt_ast| {
+    map(delimited(multispace0, opt(ast), eof), |opt_ast| {
         rewrite_ast(opt_ast.unwrap_or_else(UserInputAst::empty_query))
     })(inp)
 }
@@ -1145,6 +1151,7 @@ mod test {
     #[test]
     fn test_parse_query_to_ast_binary_op() {
         test_parse_query_to_ast_helper("a AND b", "(+a +b)");
+        test_parse_query_to_ast_helper("a\nAND b", "(+a +b)");
         test_parse_query_to_ast_helper("a OR b", "(?a ?b)");
         test_parse_query_to_ast_helper("a OR b AND c", "(?a ?(+b +c))");
         test_parse_query_to_ast_helper("a AND b AND c", "(+a +b +c)");
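
Why the `space0` to `multispace0` swap makes `"a\nAND b"` parse can be shown without nom. The helpers below are hand-written approximations, not nom's functions: they assume, per nom's documentation, that `space0` consumes only spaces and tabs while `multispace0` also consumes carriage returns and line feeds. `skip_space0`, `skip_multispace0` and `next_is_and` are hypothetical names.

```rust
/// Approximates nom's `space0`: skips leading spaces and tabs only.
pub fn skip_space0(input: &str) -> &str {
    input.trim_start_matches(|c: char| c == ' ' || c == '\t')
}

/// Approximates nom's `multispace0`: also skips carriage returns and line feeds.
pub fn skip_multispace0(input: &str) -> &str {
    input.trim_start_matches(|c: char| matches!(c, ' ' | '\t' | '\r' | '\n'))
}

/// Hypothetical helper: after the first term has been consumed, does the
/// remaining input continue with the `AND` operator once whitespace is skipped?
pub fn next_is_and(rest: &str, skip: fn(&str) -> &str) -> bool {
    skip(rest).starts_with("AND")
}
```

With the space-only skipper, `next_is_and("\nAND b", skip_space0)` is false because the newline is left in place, which is the regression the new test line guards against; with `skip_multispace0` the operator is found.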

View File

@@ -596,10 +596,13 @@ mod tests {
     use super::*;
     use crate::aggregation::agg_req::Aggregations;
+    use crate::aggregation::agg_result::AggregationResults;
     use crate::aggregation::tests::{
         exec_request, exec_request_with_query, exec_request_with_query_and_memory_limit,
         get_test_index_2_segments, get_test_index_from_values, get_test_index_with_num_docs,
     };
+    use crate::aggregation::AggregationCollector;
+    use crate::query::AllQuery;

     #[test]
     fn histogram_test_crooked_values() -> crate::Result<()> {
@@ -1351,6 +1354,35 @@ mod tests {
             })
         );

+        Ok(())
+    }
+
+    #[test]
+    fn test_aggregation_histogram_empty_index() -> crate::Result<()> {
+        // test index without segments
+        let values = vec![];
+
+        let index = get_test_index_from_values(false, &values)?;
+
+        let agg_req_1: Aggregations = serde_json::from_value(json!({
+            "myhisto": {
+                "histogram": {
+                    "field": "score",
+                    "interval": 10.0
+                },
+            }
+        }))
+        .unwrap();
+
+        let collector = AggregationCollector::from_aggs(agg_req_1, Default::default());
+
+        let reader = index.reader()?;
+        let searcher = reader.searcher();
+        let agg_res: AggregationResults = searcher.search(&AllQuery, &collector).unwrap();
+
+        let res: Value = serde_json::from_str(&serde_json::to_string(&agg_res)?)?;
+        // Make sure the result structure is correct
+        assert_eq!(res["myhisto"]["buckets"].as_array().unwrap().len(), 0);
+
         Ok(())
     }
 }

View File

@@ -309,7 +309,7 @@ impl TopDocs {
     ///
     /// To comfortably work with `u64`s, `i64`s, `f64`s, or `date`s, please refer to
     /// the [.order_by_fast_field(...)](TopDocs::order_by_fast_field) method.
-    fn order_by_u64_field(
+    pub fn order_by_u64_field(
         self,
         field: impl ToString,
         order: Order,

View File

@@ -1,4 +1,4 @@
-use core::fmt::Debug;
+use std::borrow::Cow;

 use columnar::{ColumnIndex, DynamicColumn};
@@ -14,7 +14,7 @@ use crate::{DocId, Score, TantivyError};
 /// All of the matched documents get the score 1.0.
 #[derive(Clone, Debug)]
 pub struct ExistsQuery {
-    field_name: String,
+    field: Cow<'static, str>,
 }

 impl ExistsQuery {
@@ -23,40 +23,42 @@ impl ExistsQuery {
     /// This query matches all documents with at least one non-null value in the specified field.
     /// This constructor never fails, but executing the search with this query will return an
     /// error if the specified field doesn't exists or is not a fast field.
-    pub fn new_exists_query(field: String) -> ExistsQuery {
-        ExistsQuery { field_name: field }
+    pub fn new_exists_query<F: Into<Cow<'static, str>>>(field: F) -> ExistsQuery {
+        ExistsQuery {
+            field: field.into(),
+        }
     }
 }

 impl Query for ExistsQuery {
     fn weight(&self, enable_scoring: EnableScoring) -> crate::Result<Box<dyn Weight>> {
         let schema = enable_scoring.schema();
-        let Some((field, _path)) = schema.find_field(&self.field_name) else {
-            return Err(TantivyError::FieldNotFound(self.field_name.clone()));
+        let Some((field, _path)) = schema.find_field(&self.field) else {
+            return Err(TantivyError::FieldNotFound(self.field.to_string()));
         };
         let field_type = schema.get_field_entry(field).field_type();
         if !field_type.is_fast() {
             return Err(TantivyError::SchemaError(format!(
                 "Field {} is not a fast field.",
-                self.field_name
+                self.field
             )));
         }
         Ok(Box::new(ExistsWeight {
-            field_name: self.field_name.clone(),
+            field: self.field.clone(),
         }))
     }
 }

 /// Weight associated with the `ExistsQuery` query.
 pub struct ExistsWeight {
-    field_name: String,
+    field: Cow<'static, str>,
 }

 impl Weight for ExistsWeight {
     fn scorer(&self, reader: &SegmentReader, boost: Score) -> crate::Result<Box<dyn Scorer>> {
         let fast_field_reader = reader.fast_fields();
         let dynamic_columns: crate::Result<Vec<DynamicColumn>> = fast_field_reader
-            .dynamic_column_handles(&self.field_name)?
+            .dynamic_column_handles(&self.field)?
             .into_iter()
             .map(|handle| handle.open().map_err(|io_error| io_error.into()))
.collect(); .collect();

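A minimal sketch of the constructor change, assuming a hypothetical `FieldQuery` type that stands in for `ExistsQuery` (it is not part of tantivy). With the `Into<Cow<'static, str>>` bound, a caller can pass a `&'static str`, which is stored without allocating, or an owned `String` for names built at runtime:

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a query that stores a field name.
#[derive(Clone, Debug)]
struct FieldQuery {
    field: Cow<'static, str>,
}

impl FieldQuery {
    /// Accepts anything convertible into `Cow<'static, str>`,
    /// e.g. a string literal or an owned `String`.
    fn new<F: Into<Cow<'static, str>>>(field: F) -> FieldQuery {
        FieldQuery {
            field: field.into(),
        }
    }

    /// The field name as a plain `&str`, whichever variant is stored.
    fn field_name(&self) -> &str {
        &self.field
    }

    /// True if the name is stored as a borrowed `&'static str`.
    fn is_borrowed(&self) -> bool {
        matches!(self.field, Cow::Borrowed(_))
    }
}

fn main() {
    // Statically known name: stored as `Cow::Borrowed`, no allocation.
    let static_query = FieldQuery::new("price");
    assert!(static_query.is_borrowed());

    // Name computed at runtime: stored as `Cow::Owned`.
    let dynamic_name = format!("attributes.{}", "color");
    let dynamic_query = FieldQuery::new(dynamic_name);
    assert!(!dynamic_query.is_borrowed());

    println!("{} / {}", static_query.field_name(), dynamic_query.field_name());
}
```

Existing callers that pass a `String` keep compiling, since `String` converts into `Cow::Owned`.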

@@ -1,3 +1,4 @@
use std::borrow::Cow;
use std::io; use std::io;
use std::net::Ipv6Addr; use std::net::Ipv6Addr;
use std::ops::{Bound, Range}; use std::ops::{Bound, Range};
@@ -68,7 +69,7 @@ use crate::{DateTime, DocId, Score};
/// ``` /// ```
#[derive(Clone, Debug)] #[derive(Clone, Debug)]
pub struct RangeQuery { pub struct RangeQuery {
field: String, field: Cow<'static, str>,
value_type: Type, value_type: Type,
lower_bound: Bound<Vec<u8>>, lower_bound: Bound<Vec<u8>>,
upper_bound: Bound<Vec<u8>>, upper_bound: Bound<Vec<u8>>,
@@ -80,15 +81,15 @@ impl RangeQuery {
/// ///
/// If the value type is not correct, something may go terribly wrong when /// If the value type is not correct, something may go terribly wrong when
/// the `Weight` object is created. /// the `Weight` object is created.
pub fn new_term_bounds( pub fn new_term_bounds<F: Into<Cow<'static, str>>>(
field: String, field: F,
value_type: Type, value_type: Type,
lower_bound: &Bound<Term>, lower_bound: &Bound<Term>,
upper_bound: &Bound<Term>, upper_bound: &Bound<Term>,
) -> RangeQuery { ) -> RangeQuery {
let verify_and_unwrap_term = |val: &Term| val.serialized_value_bytes().to_owned(); let verify_and_unwrap_term = |val: &Term| val.serialized_value_bytes().to_owned();
RangeQuery { RangeQuery {
field, field: field.into(),
value_type, value_type,
lower_bound: map_bound(lower_bound, verify_and_unwrap_term), lower_bound: map_bound(lower_bound, verify_and_unwrap_term),
upper_bound: map_bound(upper_bound, verify_and_unwrap_term), upper_bound: map_bound(upper_bound, verify_and_unwrap_term),
@@ -100,7 +101,7 @@ impl RangeQuery {
/// ///
/// If the field is not of the type `i64`, tantivy /// If the field is not of the type `i64`, tantivy
/// will panic when the `Weight` object is created. /// will panic when the `Weight` object is created.
pub fn new_i64(field: String, range: Range<i64>) -> RangeQuery { pub fn new_i64<F: Into<Cow<'static, str>>>(field: F, range: Range<i64>) -> RangeQuery {
RangeQuery::new_i64_bounds( RangeQuery::new_i64_bounds(
field, field,
Bound::Included(range.start), Bound::Included(range.start),
@@ -115,8 +116,8 @@ impl RangeQuery {
/// ///
/// If the field is not of the type `i64`, tantivy /// If the field is not of the type `i64`, tantivy
/// will panic when the `Weight` object is created. /// will panic when the `Weight` object is created.
pub fn new_i64_bounds( pub fn new_i64_bounds<F: Into<Cow<'static, str>>>(
field: String, field: F,
lower_bound: Bound<i64>, lower_bound: Bound<i64>,
upper_bound: Bound<i64>, upper_bound: Bound<i64>,
) -> RangeQuery { ) -> RangeQuery {
@@ -126,7 +127,7 @@ impl RangeQuery {
.to_owned() .to_owned()
}; };
RangeQuery { RangeQuery {
field, field: field.into(),
value_type: Type::I64, value_type: Type::I64,
lower_bound: map_bound(&lower_bound, make_term_val), lower_bound: map_bound(&lower_bound, make_term_val),
upper_bound: map_bound(&upper_bound, make_term_val), upper_bound: map_bound(&upper_bound, make_term_val),
@@ -138,7 +139,7 @@ impl RangeQuery {
/// ///
/// If the field is not of the type `f64`, tantivy /// If the field is not of the type `f64`, tantivy
/// will panic when the `Weight` object is created. /// will panic when the `Weight` object is created.
pub fn new_f64(field: String, range: Range<f64>) -> RangeQuery { pub fn new_f64<F: Into<Cow<'static, str>>>(field: F, range: Range<f64>) -> RangeQuery {
RangeQuery::new_f64_bounds( RangeQuery::new_f64_bounds(
field, field,
Bound::Included(range.start), Bound::Included(range.start),
@@ -153,8 +154,8 @@ impl RangeQuery {
/// ///
/// If the field is not of the type `f64`, tantivy /// If the field is not of the type `f64`, tantivy
/// will panic when the `Weight` object is created. /// will panic when the `Weight` object is created.
pub fn new_f64_bounds( pub fn new_f64_bounds<F: Into<Cow<'static, str>>>(
field: String, field: F,
lower_bound: Bound<f64>, lower_bound: Bound<f64>,
upper_bound: Bound<f64>, upper_bound: Bound<f64>,
) -> RangeQuery { ) -> RangeQuery {
@@ -164,7 +165,7 @@ impl RangeQuery {
.to_owned() .to_owned()
}; };
RangeQuery { RangeQuery {
field, field: field.into(),
value_type: Type::F64, value_type: Type::F64,
lower_bound: map_bound(&lower_bound, make_term_val), lower_bound: map_bound(&lower_bound, make_term_val),
upper_bound: map_bound(&upper_bound, make_term_val), upper_bound: map_bound(&upper_bound, make_term_val),
@@ -179,8 +180,8 @@ impl RangeQuery {
/// ///
/// If the field is not of the type `u64`, tantivy /// If the field is not of the type `u64`, tantivy
/// will panic when the `Weight` object is created. /// will panic when the `Weight` object is created.
pub fn new_u64_bounds( pub fn new_u64_bounds<F: Into<Cow<'static, str>>>(
field: String, field: F,
lower_bound: Bound<u64>, lower_bound: Bound<u64>,
upper_bound: Bound<u64>, upper_bound: Bound<u64>,
) -> RangeQuery { ) -> RangeQuery {
@@ -190,7 +191,7 @@ impl RangeQuery {
.to_owned() .to_owned()
}; };
RangeQuery { RangeQuery {
field, field: field.into(),
value_type: Type::U64, value_type: Type::U64,
lower_bound: map_bound(&lower_bound, make_term_val), lower_bound: map_bound(&lower_bound, make_term_val),
upper_bound: map_bound(&upper_bound, make_term_val), upper_bound: map_bound(&upper_bound, make_term_val),
@@ -202,8 +203,8 @@ impl RangeQuery {
/// ///
/// If the field is not of the type `ip`, tantivy /// If the field is not of the type `ip`, tantivy
/// will panic when the `Weight` object is created. /// will panic when the `Weight` object is created.
pub fn new_ip_bounds( pub fn new_ip_bounds<F: Into<Cow<'static, str>>>(
field: String, field: F,
lower_bound: Bound<Ipv6Addr>, lower_bound: Bound<Ipv6Addr>,
upper_bound: Bound<Ipv6Addr>, upper_bound: Bound<Ipv6Addr>,
) -> RangeQuery { ) -> RangeQuery {
@@ -213,7 +214,7 @@ impl RangeQuery {
.to_owned() .to_owned()
}; };
RangeQuery { RangeQuery {
field, field: field.into(),
value_type: Type::IpAddr, value_type: Type::IpAddr,
lower_bound: map_bound(&lower_bound, make_term_val), lower_bound: map_bound(&lower_bound, make_term_val),
upper_bound: map_bound(&upper_bound, make_term_val), upper_bound: map_bound(&upper_bound, make_term_val),
@@ -225,7 +226,7 @@ impl RangeQuery {
/// ///
/// If the field is not of the type `u64`, tantivy /// If the field is not of the type `u64`, tantivy
/// will panic when the `Weight` object is created. /// will panic when the `Weight` object is created.
pub fn new_u64(field: String, range: Range<u64>) -> RangeQuery { pub fn new_u64<F: Into<Cow<'static, str>>>(field: F, range: Range<u64>) -> RangeQuery {
RangeQuery::new_u64_bounds( RangeQuery::new_u64_bounds(
field, field,
Bound::Included(range.start), Bound::Included(range.start),
@@ -240,8 +241,8 @@ impl RangeQuery {
/// ///
/// If the field is not of the type `date`, tantivy /// If the field is not of the type `date`, tantivy
/// will panic when the `Weight` object is created. /// will panic when the `Weight` object is created.
pub fn new_date_bounds( pub fn new_date_bounds<F: Into<Cow<'static, str>>>(
field: String, field: F,
lower_bound: Bound<DateTime>, lower_bound: Bound<DateTime>,
upper_bound: Bound<DateTime>, upper_bound: Bound<DateTime>,
) -> RangeQuery { ) -> RangeQuery {
@@ -251,7 +252,7 @@ impl RangeQuery {
.to_owned() .to_owned()
}; };
RangeQuery { RangeQuery {
field, field: field.into(),
value_type: Type::Date, value_type: Type::Date,
lower_bound: map_bound(&lower_bound, make_term_val), lower_bound: map_bound(&lower_bound, make_term_val),
upper_bound: map_bound(&upper_bound, make_term_val), upper_bound: map_bound(&upper_bound, make_term_val),
@@ -263,7 +264,7 @@ impl RangeQuery {
/// ///
/// If the field is not of the type `date`, tantivy /// If the field is not of the type `date`, tantivy
/// will panic when the `Weight` object is created. /// will panic when the `Weight` object is created.
pub fn new_date(field: String, range: Range<DateTime>) -> RangeQuery { pub fn new_date<F: Into<Cow<'static, str>>>(field: F, range: Range<DateTime>) -> RangeQuery {
RangeQuery::new_date_bounds( RangeQuery::new_date_bounds(
field, field,
Bound::Included(range.start), Bound::Included(range.start),
@@ -278,14 +279,14 @@ impl RangeQuery {
/// ///
/// If the field is not of the type `Str`, tantivy /// If the field is not of the type `Str`, tantivy
/// will panic when the `Weight` object is created. /// will panic when the `Weight` object is created.
pub fn new_str_bounds( pub fn new_str_bounds<F: Into<Cow<'static, str>>>(
field: String, field: F,
lower_bound: Bound<&str>, lower_bound: Bound<&str>,
upper_bound: Bound<&str>, upper_bound: Bound<&str>,
) -> RangeQuery { ) -> RangeQuery {
let make_term_val = |val: &&str| val.as_bytes().to_vec(); let make_term_val = |val: &&str| val.as_bytes().to_vec();
RangeQuery { RangeQuery {
field, field: field.into(),
value_type: Type::Str, value_type: Type::Str,
lower_bound: map_bound(&lower_bound, make_term_val), lower_bound: map_bound(&lower_bound, make_term_val),
upper_bound: map_bound(&upper_bound, make_term_val), upper_bound: map_bound(&upper_bound, make_term_val),
@@ -297,7 +298,7 @@ impl RangeQuery {
/// ///
/// If the field is not of the type `Str`, tantivy /// If the field is not of the type `Str`, tantivy
/// will panic when the `Weight` object is created. /// will panic when the `Weight` object is created.
pub fn new_str(field: String, range: Range<&str>) -> RangeQuery { pub fn new_str<F: Into<Cow<'static, str>>>(field: F, range: Range<&str>) -> RangeQuery {
RangeQuery::new_str_bounds( RangeQuery::new_str_bounds(
field, field,
Bound::Included(range.start), Bound::Included(range.start),
@@ -358,7 +359,7 @@ impl Query for RangeQuery {
let lower_bound = map_bound_res(&self.lower_bound, parse_ip_from_bytes)?; let lower_bound = map_bound_res(&self.lower_bound, parse_ip_from_bytes)?;
let upper_bound = map_bound_res(&self.upper_bound, parse_ip_from_bytes)?; let upper_bound = map_bound_res(&self.upper_bound, parse_ip_from_bytes)?;
Ok(Box::new(IPFastFieldRangeWeight::new( Ok(Box::new(IPFastFieldRangeWeight::new(
self.field.to_string(), self.field.clone(),
lower_bound, lower_bound,
upper_bound, upper_bound,
))) )))
@@ -373,14 +374,14 @@ impl Query for RangeQuery {
let lower_bound = map_bound(&self.lower_bound, parse_from_bytes); let lower_bound = map_bound(&self.lower_bound, parse_from_bytes);
let upper_bound = map_bound(&self.upper_bound, parse_from_bytes); let upper_bound = map_bound(&self.upper_bound, parse_from_bytes);
Ok(Box::new(FastFieldRangeWeight::new_u64_lenient( Ok(Box::new(FastFieldRangeWeight::new_u64_lenient(
self.field.to_string(), self.field.clone(),
lower_bound, lower_bound,
upper_bound, upper_bound,
))) )))
} }
} else { } else {
Ok(Box::new(RangeWeight { Ok(Box::new(RangeWeight {
field: self.field.to_string(), field: self.field.clone(),
lower_bound: self.lower_bound.clone(), lower_bound: self.lower_bound.clone(),
upper_bound: self.upper_bound.clone(), upper_bound: self.upper_bound.clone(),
limit: self.limit, limit: self.limit,
@@ -390,7 +391,7 @@ impl Query for RangeQuery {
} }
pub struct RangeWeight { pub struct RangeWeight {
field: String, field: Cow<'static, str>,
lower_bound: Bound<Vec<u8>>, lower_bound: Bound<Vec<u8>>,
upper_bound: Bound<Vec<u8>>, upper_bound: Bound<Vec<u8>>,
limit: Option<u64>, limit: Option<u64>,

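The switch from `self.field.to_string()` to `self.field.clone()` when building the weights matters because `to_string()` always allocates a new `String`, whereas cloning a `Cow::Borrowed` only copies the reference. A minimal sketch using only the standard library; the helper `shares_buffer` is a hypothetical name introduced for illustration:

```rust
use std::borrow::Cow;

/// Returns true if both values point at the same underlying bytes,
/// i.e. no new buffer was allocated for `copy`.
fn shares_buffer(original: &Cow<'static, str>, copy: &Cow<'static, str>) -> bool {
    std::ptr::eq(original.as_ptr(), copy.as_ptr())
}

fn main() {
    // Borrowed case: the clone is still `Borrowed` and reuses the static data.
    let borrowed: Cow<'static, str> = Cow::Borrowed("timestamp");
    let borrowed_clone = borrowed.clone();
    assert!(matches!(borrowed_clone, Cow::Borrowed(_)));
    assert!(shares_buffer(&borrowed, &borrowed_clone));

    // Owned case: the clone copies the string into a fresh allocation.
    let owned: Cow<'static, str> = Cow::Owned(String::from("timestamp"));
    let owned_clone = owned.clone();
    assert!(!shares_buffer(&owned, &owned_clone));
    assert_eq!(owned, owned_clone);
}
```

For owned field names the cost is the same as before; only the statically known case becomes allocation-free.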

@@ -2,6 +2,7 @@
//! We use this variant only if the fastfield exists, otherwise the default in `range_query` is //! We use this variant only if the fastfield exists, otherwise the default in `range_query` is
//! used, which uses the term dictionary + postings. //! used, which uses the term dictionary + postings.
use std::borrow::Cow;
use std::net::Ipv6Addr; use std::net::Ipv6Addr;
use std::ops::{Bound, RangeInclusive}; use std::ops::{Bound, RangeInclusive};
@@ -13,14 +14,18 @@ use crate::{DocId, DocSet, Score, SegmentReader, TantivyError};
/// `IPFastFieldRangeWeight` uses the ip address fast field to execute range queries. /// `IPFastFieldRangeWeight` uses the ip address fast field to execute range queries.
pub struct IPFastFieldRangeWeight { pub struct IPFastFieldRangeWeight {
field: String, field: Cow<'static, str>,
lower_bound: Bound<Ipv6Addr>, lower_bound: Bound<Ipv6Addr>,
upper_bound: Bound<Ipv6Addr>, upper_bound: Bound<Ipv6Addr>,
} }
impl IPFastFieldRangeWeight { impl IPFastFieldRangeWeight {
/// Creates a new IPFastFieldRangeWeight. /// Creates a new IPFastFieldRangeWeight.
pub fn new(field: String, lower_bound: Bound<Ipv6Addr>, upper_bound: Bound<Ipv6Addr>) -> Self { pub fn new(
field: Cow<'static, str>,
lower_bound: Bound<Ipv6Addr>,
upper_bound: Bound<Ipv6Addr>,
) -> Self {
Self { Self {
field, field,
lower_bound, lower_bound,
@@ -171,7 +176,7 @@ pub mod tests {
writer.commit().unwrap(); writer.commit().unwrap();
let searcher = index.reader().unwrap().searcher(); let searcher = index.reader().unwrap().searcher();
let range_weight = IPFastFieldRangeWeight { let range_weight = IPFastFieldRangeWeight {
field: "ips".to_string(), field: Cow::Borrowed("ips"),
lower_bound: Bound::Included(ip_addrs[1]), lower_bound: Bound::Included(ip_addrs[1]),
upper_bound: Bound::Included(ip_addrs[2]), upper_bound: Bound::Included(ip_addrs[2]),
}; };


@@ -2,6 +2,7 @@
//! We use this variant only if the fastfield exists, otherwise the default in `range_query` is //! We use this variant only if the fastfield exists, otherwise the default in `range_query` is
//! used, which uses the term dictionary + postings. //! used, which uses the term dictionary + postings.
use std::borrow::Cow;
use std::ops::{Bound, RangeInclusive}; use std::ops::{Bound, RangeInclusive};
use columnar::{ColumnType, HasAssociatedColumnType, MonotonicallyMappableToU64}; use columnar::{ColumnType, HasAssociatedColumnType, MonotonicallyMappableToU64};
@@ -14,7 +15,7 @@ use crate::{DocId, DocSet, Score, SegmentReader, TantivyError};
/// `FastFieldRangeWeight` uses the fast field to execute range queries. /// `FastFieldRangeWeight` uses the fast field to execute range queries.
#[derive(Clone, Debug)] #[derive(Clone, Debug)]
pub struct FastFieldRangeWeight { pub struct FastFieldRangeWeight {
field: String, field: Cow<'static, str>,
lower_bound: Bound<u64>, lower_bound: Bound<u64>,
upper_bound: Bound<u64>, upper_bound: Bound<u64>,
column_type_opt: Option<ColumnType>, column_type_opt: Option<ColumnType>,
@@ -23,7 +24,7 @@ pub struct FastFieldRangeWeight {
impl FastFieldRangeWeight { impl FastFieldRangeWeight {
/// Create a new FastFieldRangeWeight, using the u64 representation of any fast field. /// Create a new FastFieldRangeWeight, using the u64 representation of any fast field.
pub(crate) fn new_u64_lenient( pub(crate) fn new_u64_lenient(
field: String, field: Cow<'static, str>,
lower_bound: Bound<u64>, lower_bound: Bound<u64>,
upper_bound: Bound<u64>, upper_bound: Bound<u64>,
) -> Self { ) -> Self {
@@ -39,7 +40,7 @@ impl FastFieldRangeWeight {
/// Create a new `FastFieldRangeWeight` for a range of a u64-mappable type. /// Create a new `FastFieldRangeWeight` for a range of a u64-mappable type.
pub fn new<T: HasAssociatedColumnType + MonotonicallyMappableToU64>( pub fn new<T: HasAssociatedColumnType + MonotonicallyMappableToU64>(
field: String, field: Cow<'static, str>,
lower_bound: Bound<T>, lower_bound: Bound<T>,
upper_bound: Bound<T>, upper_bound: Bound<T>,
) -> Self { ) -> Self {
@@ -130,6 +131,7 @@ fn bound_to_value_range<T: MonotonicallyMappableToU64>(
#[cfg(test)] #[cfg(test)]
pub mod tests { pub mod tests {
use std::borrow::Cow;
use std::ops::{Bound, RangeInclusive}; use std::ops::{Bound, RangeInclusive};
use proptest::prelude::*; use proptest::prelude::*;
@@ -214,7 +216,7 @@ pub mod tests {
writer.commit().unwrap(); writer.commit().unwrap();
let searcher = index.reader().unwrap().searcher(); let searcher = index.reader().unwrap().searcher();
let range_query = FastFieldRangeWeight::new_u64_lenient( let range_query = FastFieldRangeWeight::new_u64_lenient(
"test_field".to_string(), Cow::Borrowed("test_field"),
Bound::Included(50_000), Bound::Included(50_000),
Bound::Included(50_002), Bound::Included(50_002),
); );


@@ -189,6 +189,11 @@ struct Page {
impl Page { impl Page {
fn new(page_id: usize) -> Page { fn new(page_id: usize) -> Page {
// We use 32-bits addresses.
// - 20 bits for the in-page addressing
// - 12 bits for the page id.
// This limits us to 2^12 - 1=4095 for the page id.
assert!(page_id < 4096);
Page { Page {
page_id, page_id,
len: 0, len: 0,
@@ -238,6 +243,7 @@ impl Page {
mod tests { mod tests {
use super::MemoryArena; use super::MemoryArena;
use crate::memory_arena::PAGE_SIZE;
#[test] #[test]
fn test_arena_allocate_slice() { fn test_arena_allocate_slice() {
@@ -255,6 +261,31 @@ mod tests {
assert_eq!(arena.slice(addr_b, b.len()), b); assert_eq!(arena.slice(addr_b, b.len()), b);
} }
#[test]
fn test_arena_allocate_end_of_page() {
let mut arena = MemoryArena::default();
// A big block
let len_a = PAGE_SIZE - 2;
let addr_a = arena.allocate_space(len_a);
*arena.slice_mut(addr_a, len_a).last_mut().unwrap() = 1;
// Single bytes
let addr_b = arena.allocate_space(1);
arena.slice_mut(addr_b, 1)[0] = 2;
let addr_c = arena.allocate_space(1);
arena.slice_mut(addr_c, 1)[0] = 3;
let addr_d = arena.allocate_space(1);
arena.slice_mut(addr_d, 1)[0] = 4;
assert_eq!(arena.slice(addr_a, len_a)[len_a - 1], 1);
assert_eq!(arena.slice(addr_b, 1)[0], 2);
assert_eq!(arena.slice(addr_c, 1)[0], 3);
assert_eq!(arena.slice(addr_d, 1)[0], 4);
}
#[derive(Clone, Copy, Debug, Eq, PartialEq)] #[derive(Clone, Copy, Debug, Eq, PartialEq)]
struct MyTest { struct MyTest {
pub a: usize, pub a: usize,

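The new assertion in `Page::new` follows from the address layout described in its comment: 32-bit addresses with 20 bits for the in-page offset and 12 bits for the page id. A sketch of that packing; the names `NUM_BITS_PAGE_ADDR`, `MAX_NUM_PAGES`, `pack_addr` and `unpack_addr` are assumptions chosen for illustration and may not match the crate's internals:

```rust
// Assumed layout: the low 20 bits address a byte inside a page,
// the high 12 bits select the page.
const NUM_BITS_PAGE_ADDR: u32 = 20;
const PAGE_SIZE: usize = 1 << NUM_BITS_PAGE_ADDR;
const MAX_NUM_PAGES: usize = 1 << (32 - NUM_BITS_PAGE_ADDR);

/// Packs a page id and an in-page offset into a single 32-bit address.
fn pack_addr(page_id: usize, local_addr: usize) -> u32 {
    // Mirrors the assertion added in the diff: page ids must fit in 12 bits.
    assert!(page_id < MAX_NUM_PAGES);
    assert!(local_addr < PAGE_SIZE);
    ((page_id as u32) << NUM_BITS_PAGE_ADDR) | local_addr as u32
}

/// Splits a 32-bit address back into (page id, in-page offset).
fn unpack_addr(addr: u32) -> (usize, usize) {
    let page_id = (addr >> NUM_BITS_PAGE_ADDR) as usize;
    let local_addr = (addr as usize) & (PAGE_SIZE - 1);
    (page_id, local_addr)
}

fn main() {
    assert_eq!(MAX_NUM_PAGES, 4096);
    // The largest valid page id and offset use all 32 bits.
    assert_eq!(pack_addr(4095, PAGE_SIZE - 1), u32::MAX);
    // Packing and unpacking round-trips.
    assert_eq!(unpack_addr(pack_addr(3, 17)), (3, 17));
}
```

Without the bound check, a page id of 4096 or more would overflow the 12 high bits and the packed address would point into the wrong page.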

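The key truncation added to `mutate_or_create` in this file can be illustrated in isolation. This is a sketch with a hypothetical free function `truncate_key`; the diff performs the same slicing inline:

```rust
/// Limits a key to at most `u16::MAX` bytes by dropping the tail.
fn truncate_key(key: &[u8]) -> &[u8] {
    &key[..std::cmp::min(key.len(), u16::MAX as usize)]
}

fn main() {
    // A key of exactly u16::MAX bytes is left untouched.
    let key1: Vec<u8> = (0..u16::MAX as usize).map(|i| i as u8).collect();
    assert_eq!(truncate_key(&key1).len(), u16::MAX as usize);

    // A key one byte longer is cut down to u16::MAX bytes, so it
    // becomes identical to key1 and would land in the same map entry.
    let key2: Vec<u8> = (0..u16::MAX as usize + 1).map(|i| i as u8).collect();
    assert_eq!(truncate_key(&key2), &key1[..]);

    // Short keys pass through unchanged.
    assert_eq!(truncate_key(b"abc"), &b"abc"[..]);
}
```

One consequence, exercised by `test_long_key_truncation`, is that two distinct keys sharing their first `u16::MAX` bytes map to the same entry.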
@@ -295,6 +295,8 @@ impl SharedArenaHashMap {
/// will be in charge of returning a default value. /// will be in charge of returning a default value.
/// If the key already has an associated value, then it will be passed /// If the key already has an associated value, then it will be passed
/// `Some(previous_value)`. /// `Some(previous_value)`.
///
/// The key will be truncated to u16::MAX bytes.
#[inline] #[inline]
pub fn mutate_or_create<V>( pub fn mutate_or_create<V>(
&mut self, &mut self,
@@ -308,6 +310,8 @@ impl SharedArenaHashMap {
if self.is_saturated() { if self.is_saturated() {
self.resize(); self.resize();
} }
// Limit the key size to u16::MAX
let key = &key[..std::cmp::min(key.len(), u16::MAX as usize)];
let hash = self.get_hash(key); let hash = self.get_hash(key);
let mut probe = self.probe(hash); let mut probe = self.probe(hash);
let mut bucket = probe.next_probe(); let mut bucket = probe.next_probe();
@@ -379,6 +383,36 @@ mod tests {
} }
assert_eq!(vanilla_hash_map.len(), 2); assert_eq!(vanilla_hash_map.len(), 2);
} }
#[test]
fn test_long_key_truncation() {
// Keys longer than u16::MAX are truncated.
let mut memory_arena = MemoryArena::default();
let mut hash_map: SharedArenaHashMap = SharedArenaHashMap::default();
let key1 = (0..u16::MAX as usize).map(|i| i as u8).collect::<Vec<_>>();
hash_map.mutate_or_create(&key1, &mut memory_arena, |opt_val: Option<u32>| {
assert_eq!(opt_val, None);
4u32
});
// Due to truncation, this key is the same as key1
let key2 = (0..u16::MAX as usize + 1)
.map(|i| i as u8)
.collect::<Vec<_>>();
hash_map.mutate_or_create(&key2, &mut memory_arena, |opt_val: Option<u32>| {
assert_eq!(opt_val, Some(4));
3u32
});
let mut vanilla_hash_map = HashMap::new();
let iter_values = hash_map.iter(&memory_arena);
for (key, addr) in iter_values {
let val: u32 = memory_arena.read(addr);
vanilla_hash_map.insert(key.to_owned(), val);
assert_eq!(key.len(), key1[..].len());
assert_eq!(key, &key1[..])
}
assert_eq!(vanilla_hash_map.len(), 1); // Both map to the same key
}
#[test] #[test]
fn test_empty_hashmap() { fn test_empty_hashmap() {
let memory_arena = MemoryArena::default(); let memory_arena = MemoryArena::default();