Integrations
Built on top of Apache Arrow, LanceDB is easy to integrate with the Python ecosystem, including Pandas, PyArrow and DuckDB.
Pandas and PyArrow
First, we need to connect to a LanceDB database.
import lancedb
db = lancedb.connect("/tmp/lancedb")
Then write a Pandas DataFrame directly to LanceDB.
import pandas as pd
data = pd.DataFrame({
"vector": [[3.1, 4.1], [5.9, 26.5]],
"item": ["foo", "bar"],
"price": [10.0, 20.0]
})
table = db.create_table("pd_table", data=data)
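# Optionally, create an IVF_PQ index. The parameters below are meant for a large
# table of high-dimensional vectors, not for this two-row, two-dimensional example.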
table.create_index(num_partitions=256, num_sub_vectors=96)
You will find detailed instructions for creating a dataset and an index in the Basic Operations and Indexing sections.
We can now perform similarity searches via LanceDB.
# Open the table previously created.
table = db.open_table("pd_table")
query_vector = [100, 100]
# Pandas DataFrame
df = table.search(query_vector).limit(1).to_df()
print(df)
vector item price score
0 [5.9, 26.5] bar 20.0 14257.05957
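The score printed above is consistent with the squared Euclidean (L2) distance between the query vector and the stored vector. The sketch below reproduces that number in plain Python, assuming L2 is the metric in use. The helper `squared_l2` is a hypothetical name for illustration and is not part of the LanceDB API.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Same vectors as in the pd_table example.
rows = {"foo": [3.1, 4.1], "bar": [5.9, 26.5]}
query_vector = [100, 100]

# Score every row against the query, then pick the smallest distance.
scores = {item: squared_l2(query_vector, vec) for item, vec in rows.items()}
nearest = min(scores, key=scores.get)

print(nearest, round(scores[nearest], 2))
```

A smaller score means a closer match, which is why "bar" is returned ahead of "foo". The value printed here can differ from the table output in the last few digits if the vectors are stored in single precision.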
If you have a simple filter, it's faster to provide a where clause to LanceDB's search query.
If you have more complex criteria, you can always apply the filter to the resulting pandas DataFrame from the search query.
# Apply the filter via LanceDB
results = table.search([100, 100]).where("price < 15").to_df()
assert len(results) == 1
assert results["item"].iloc[0] == "foo"
# Apply the filter via Pandas
df = table.search([100, 100]).to_df()
results = df[df.price < 15]
assert len(results) == 1
assert results["item"].iloc[0] == "foo"
DuckDB
LanceDB works with DuckDB via PyArrow integration.
Start by installing duckdb and lancedb.
pip install duckdb lancedb
We will reuse the table created previously:
import duckdb
import lancedb
db = lancedb.connect("/tmp/lancedb")
table = db.open_table("pd_table")
arrow_table = table.to_arrow()
DuckDB can directly query the arrow_table:
In [15]: duckdb.query("SELECT * FROM arrow_table")
Out[15]:
┌─────────────┬─────────┬────────┐
│ vector │ item │ price │
│ float[] │ varchar │ double │
├─────────────┼─────────┼────────┤
│ [3.1, 4.1] │ foo │ 10.0 │
│ [5.9, 26.5] │ bar │ 20.0 │
└─────────────┴─────────┴────────┘
In [16]: duckdb.query("SELECT mean(price) FROM arrow_table")
Out[16]:
┌─────────────┐
│ mean(price) │
│ double │
├─────────────┤
│ 15.0 │
└─────────────┘