Add `to_list` to return query results as list of python dict (so we're not too pandas-centric). Closes #555 Add `to_pandas` API and add deprecation warning on `to_df`. Closes #545 Co-authored-by: Chang She <chang@lancedb.com>
[EXPERIMENTAL] Full text search
LanceDB now provides experimental support for full text search. This is currently Python only. We plan to push the integration down to Rust in the future to make this available for JS as well.
Installation
To use full text search, you must install the dependency tantivy-py:
# tantivy 0.20.1
pip install tantivy==0.20.1
Quickstart
Assume:
- table is a LanceDB Table
- text is the name of the Table column that we want to index
For example,
import lancedb
uri = "data/sample-lancedb"
db = lancedb.connect(uri)
table = db.create_table(
    "my_table",
    data=[
        {"vector": [3.1, 4.1], "text": "Frodo was a happy puppy"},
        {"vector": [5.9, 26.5], "text": "There are several kittens playing"},
    ],
)
To create the index:
table.create_fts_index("text")
To search:
table.search("puppy").limit(10).select(["text"]).to_list()
This returns a list of dictionaries:
[{'text': 'Frodo was a happy puppy', 'score': 0.6931471824645996}]
When the query passed to search is a str, LanceDB automatically looks for an FTS index.
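The score in the result above is numerically very close to ln 2, which is what a standard BM25 formula gives for this two-document corpus. The sketch below reproduces that arithmetic in plain Python. It rests on assumptions that this page does not state: BM25 ranking with the common defaults k1 = 1.2 and b = 0.75, and a simple lowercase, whitespace tokenizer. It is meant only to illustrate where such a number can come from, not to describe the index's actual internals.

```python
import math

# Assumed BM25 parameters (common defaults; not stated in this page).
K1 = 1.2
B = 0.75


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(docs, term):
    """Return one BM25 score per document for a single-term query."""
    tokens = [tokenize(d) for d in docs]
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens) / n_docs
    # Number of documents that contain the term at least once.
    doc_freq = sum(1 for t in tokens if term in t)
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    scores = []
    for t in tokens:
        tf = t.count(term)
        # Length normalization: longer-than-average documents are penalized.
        norm = K1 * (1 - B + B * len(t) / avg_len)
        scores.append(idf * tf * (K1 + 1) / (tf + norm))
    return scores


docs = ["Frodo was a happy puppy", "There are several kittens playing"]
scores = bm25_scores(docs, "puppy")
print(scores)  # first score is close to ln 2; second is 0.0 (no match)
```

Both documents have five tokens, so the length normalization cancels out and the score for the matching document reduces to the inverse document frequency term, ln(1 + 1.5 / 1.5) = ln 2.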
Multiple text columns
If you have multiple columns to index, pass them all as a list to create_fts_index:
table.create_fts_index(["text1", "text2"])
Note that the search API call does not change - you can search over all indexed columns at once.
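As a minimal sketch of the multi-column flow, the helper below creates a table, indexes two text columns, and runs a single text query. The table name `multi_text`, the sample rows, and the helper name `index_and_search` are hypothetical and exist only for illustration. The LanceDB calls are the ones already shown on this page.

```python
# Hypothetical sample rows with two text columns.
ROWS = [
    {"vector": [3.1, 4.1], "text1": "Frodo was a happy puppy", "text2": "dog"},
    {"vector": [5.9, 26.5], "text1": "There are several kittens playing", "text2": "cat"},
]


def index_and_search(db, rows, query, limit=10):
    """Create a table, index both text columns, and run one text query.

    `db` is expected to be a LanceDB connection, e.g. the object
    returned by lancedb.connect(uri). "multi_text" is a placeholder name.
    """
    table = db.create_table("multi_text", data=rows)
    # One call indexes every listed column.
    table.create_fts_index(["text1", "text2"])
    # The search call is unchanged: a str query searches all indexed columns.
    return (
        table.search(query)
        .limit(limit)
        .select(["text1", "text2"])
        .to_list()
    )
```

With LanceDB and tantivy installed, this could be called as `index_and_search(lancedb.connect("data/sample-lancedb"), ROWS, "puppy")`.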
Current limitations
- Incremental writes are not yet supported. If you add data after creating the FTS index, it won't be reflected in search results until you do a full reindex.
- The FTS index currently supports only local filesystem paths.