doc(python): document the method in fts (#982)

Co-authored-by: prrao87 <prrao87@gmail.com>
Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
This commit is contained in:
Chang She
2024-03-04 16:42:24 -08:00
committed by Weston Pace
parent b5326d31e9
commit 10089481c0
4 changed files with 87 additions and 16 deletions

View File

@@ -75,21 +75,40 @@ applied on top of the full text search results. This can be invoked via the fami
table.search("puppy").limit(10).where("meta='foo'").to_list()
```
## Syntax
## Phrase queries vs. terms queries
For full-text search you can perform either a phrase query like "the old man and the sea",
or a structured search query like "(Old AND Man) AND Sea".
Double quotes are used to disambiguate.
For full-text search you can specify either a **phrase** query like `"the old man and the sea"`,
or a **terms** search query like `"(Old AND Man) AND Sea"`. For more details on the terms
query syntax, see Tantivy's [query parser rules](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html).
For example:
!!! tip "Note"
The query parser will raise an exception on queries that are ambiguous. For example, in the query `they could have been dogs OR cats`, `OR` is capitalized so it's considered a keyword query operator. But it's ambiguous how the left part should be treated. So if you submit this search query as is, you'll get `Syntax Error: they could have been dogs OR cats`.
If you intended "they could have been dogs OR cats" as a phrase query, this actually
raises a syntax error since `OR` is a recognized operator. If you make `or` lower case,
this avoids the syntax error. However, it is cumbersome to have to remember what will
conflict with the query syntax. Instead, if you search using
`table.search('"they could have been dogs OR cats"')`, then the syntax checker avoids
checking inside the quotes.
```py
# This raises a syntax error
table.search("they could have been dogs OR cats")
```
On the other hand, lowercasing `OR` to `or` will work, because there are no capitalized logical operators and
the query is treated as a phrase query.
```py
# This works!
table.search("they could have been dogs or cats")
```
It can be cumbersome to have to remember what will cause a syntax error depending on the type of
query you want to perform. To make this simpler, when you want to perform a phrase query, you can
enforce it in one of two ways:
1. Place the double-quoted query inside single quotes. For example, `table.search('"they could have been dogs OR cats"')` is treated as
a phrase query.
2. Explicitly declare the `phrase_query()` method. This is useful when you have a phrase query that
itself contains double quotes. For example, `table.search('the cats OR dogs were not really "pets" at all').phrase_query()`
is treated as a phrase query.
In general, a query that's declared as a phrase query will be wrapped in double quotes during parsing, with nested
double quotes replaced by single quotes.
## Configurations

View File

@@ -24,6 +24,12 @@ pip install lancedb
::: lancedb.query.LanceQueryBuilder
::: lancedb.query.LanceVectorQueryBuilder
::: lancedb.query.LanceFtsQueryBuilder
::: lancedb.query.LanceHybridQueryBuilder
## Embeddings
::: lancedb.embeddings.registry.EmbeddingFunctionRegistry
@@ -62,10 +68,22 @@ pip install lancedb
## Integrations
### Pydantic
## Pydantic
::: lancedb.pydantic.pydantic_to_schema
::: lancedb.pydantic.vector
::: lancedb.pydantic.LanceModel
## Reranking
::: lancedb.rerankers.linear_combination.LinearCombinationReranker
::: lancedb.rerankers.cohere.CohereReranker
::: lancedb.rerankers.colbert.ColbertReranker
::: lancedb.rerankers.cross_encoder.CrossEncoderReranker
::: lancedb.rerankers.openai.OpenaiReranker