Files
lancedb/python
Chang She c8728d4ca1 feat(python): add post filtering for full text search (#739)
Closes #721 

fts will return results as a pyarrow table. Pyarrow tables has a
`filter` method but it does not take sql filter strings (only pyarrow
compute expressions). Instead, we do one of two things to support
`tbl.search("keywords").where("foo=5").limit(10).to_arrow()`:

Default path: If duckdb is available then use duckdb to execute the sql
filter string on the pyarrow table.
Backup path: Otherwise, write the pyarrow table to a lance dataset and
then do `to_table(filter=<filter>)`

Neither is ideal. 
Default path has two issues:
1. requires installing an extra library (duckdb)
2. duckdb mangles some fields (like fixed size list => list)

Backup path incurs a latency penalty (~20ms on ssd) to write the
resultset to disk.

In the short term, once #676 is addressed, we can write the dataset to
"memory://" instead of disk, this makes the post filter evaluate much
quicker (ETA next week).

In the longer term, we'd like to be able to evaluate the filter string
on the pyarrow Table directly, one possibility being that we use
Substrait to generate pyarrow compute expressions from sql string. Or if
there's enough progress on pyarrow, it could support Substrait
expressions directly (no ETA)

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2023-12-27 09:31:04 -08:00
..
2023-10-14 12:38:43 -07:00
2023-03-22 19:46:15 -07:00

LanceDB

A Python library for LanceDB.

Installation

pip install lancedb

Usage

Basic Example

import lancedb
db = lancedb.connect('<PATH_TO_LANCEDB_DATASET>')
table = db.open_table('my_table')
results = table.search([0.1, 0.3]).limit(20).to_list()
print(results)

Development

Create a virtual environment and activate it:

python -m venv venv
. ./venv/bin/activate

Install the necessary packages:

python -m pip install .

To run the unit tests:

pytest

To run linter and automatically fix all errors:

black .
isort .

If any packages are missing, install them with:

pip install <PACKAGE_NAME>

For Windows users, there may be errors when installing packages, so these commands may be helpful:

Activate the virtual environment:

. .\venv\Scripts\activate

You may need to run the installs separately:

pip install -e .[tests]
pip install -e .[dev]

tantivy requires rust to be installed, so install it with conda, as it doesn't support windows installation:

pip install wheel
pip install cargo
conda install rust
pip install tantivy

To run the unit tests:

pytest