Files
lancedb/docs/src/python/duckdb.md
Will Jones 2447372c1f docs: show DuckDB with dataset, not table (#974)
Using datasets is preferred way to allow filter and projection pushdown,
as well as aggregated larger-than-memory tables.
2024-02-16 09:18:18 -08:00

2.3 KiB

DuckDB

In Python, LanceDB tables can also be queried with DuckDB, an in-process SQL OLAP database. This means you can write complex SQL queries to analyze your data in LanceDB.

This integration is done via Apache Arrow, which provides zero-copy data sharing between LanceDB and DuckDB. DuckDB is capable of passing down column selections and basic filters to LanceDB, reducing the amount of data that needs to be scanned to perform your query. Finally, the integration allows streaming data from LanceDB tables, allowing you to aggregate tables that won't fit into memory. All of this uses the same mechanism described in DuckDB's blog post DuckDB quacks Arrow.

We can demonstrate this by first installing duckdb and lancedb.

pip install duckdb lancedb

We will re-use the dataset created previously:

import lancedb

db = lancedb.connect("data/sample-lancedb")
data = [
    {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
    {"vector": [5.9, 26.5], "item": "bar", "price": 20.0}
]
table = db.create_table("pd_table", data=data)

To query the table, first call to_lance to convert the table to a "dataset", which is an object that can be queried by DuckDB. Then all you need to do is reference that dataset by the same name in your SQL query.

import duckdb

arrow_table = table.to_lance()

duckdb.query("SELECT * FROM arrow_table")
┌─────────────┬─────────┬────────┐
│   vector    │  item   │ price  │
│   float[]   │ varchar │ double │
├─────────────┼─────────┼────────┤
│ [3.1, 4.1]  │ foo     │   10.0 │
│ [5.9, 26.5] │ bar     │   20.0 │
└─────────────┴─────────┴────────┘

You can very easily run any other DuckDB SQL queries on your data.

duckdb.query("SELECT mean(price) FROM arrow_table")
┌─────────────┐
│ mean(price) │
│   double    │
├─────────────┤
│        15.0 │
└─────────────┘