chore(python): document phrase queries in fts (#788)

closes #769 

Add unit test and documentation on using quotes to perform a phrase
query
Chang She
2024-01-08 21:49:31 -08:00
committed by Andrew Miracle
parent a649b3b1e4
commit 615c469af2
2 changed files with 33 additions and 0 deletions


@@ -75,6 +75,22 @@ applied on top of the full text search results. This can be invoked via the fami
table.search("puppy").limit(10).where("meta='foo'").to_list()
```
## Syntax

For full-text search you can run either a phrase query like `"the old man and the sea"`
or a structured query like `(Old AND Man) AND Sea`. Wrapping the query in double quotes
marks it as a phrase query.

For example, searching for `they could have been dogs OR cats` without quotes raises a
syntax error, because `OR` is a recognized operator. Writing `or` in lower case avoids the
error, but it is cumbersome to remember every token that conflicts with the query syntax.
Instead, search with `table.search('"they could have been dogs OR cats"')`: the syntax
checker does not check inside the quotes.
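
The quoting rule can be wrapped in a small helper so callers do not have to build the quoted string by hand. This is only a sketch: `as_phrase_query` is a hypothetical name, not part of LanceDB. Replacing inner double quotes with single quotes is an assumption based on the test in this change, where nested double quotes raise a syntax error and single quotes do not.

```python
def as_phrase_query(text: str) -> str:
    """Wrap ``text`` in double quotes so it is searched as one phrase.

    Hypothetical helper for illustration; it is not a LanceDB function.
    """
    # Assumption: an inner double quote would close the phrase early and
    # trigger a syntax error, so it is swapped for a single quote.
    inner = text.replace('"', "'")
    return f'"{inner}"'


# Operators such as OR are left untouched; only the quoting changes.
query = as_phrase_query('the cats OR dogs were not really "pets" at all')
print(query)
```

The result can then be passed to the search call used above, for example `table.search(as_phrase_query(user_input)).limit(10).to_list()`, where `user_input` stands for whatever string the caller received.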
## Configurations
By default, LanceDB configures a 1GB heap size limit for creating the index. You can


@@ -162,3 +162,20 @@ def test_null_input(table):
        ]
    )
    table.create_fts_index("text")


def test_syntax(table):
    # https://github.com/lancedb/lancedb/issues/769
    table.create_fts_index("text")
    with pytest.raises(ValueError, match="Syntax Error"):
        table.search("they could have been dogs OR cats").limit(10).to_list()
    # this should work
    table.search('"they could have been dogs OR cats"').limit(10).to_list()
    # this should work too
    table.search('''"the cats OR dogs were not really 'pets' at all"''').limit(
        10
    ).to_list()
    with pytest.raises(ValueError, match="Syntax Error"):
        table.search('''"the cats OR dogs were not really "pets" at all"''').limit(
            10
        ).to_list()