mirror of
https://github.com/lancedb/lancedb.git
synced 2026-01-06 03:42:57 +00:00
56 lines
1.4 KiB
Markdown
56 lines
1.4 KiB
Markdown
# DuckDB
|
|
|
|
`LanceDB` works with `DuckDB` via [PyArrow integration](https://duckdb.org/docs/guides/python/sql_on_arrow).
|
|
|
|
Let us start with installing `duckdb` and `lancedb`.
|
|
|
|
```shell
|
|
pip install duckdb lancedb
|
|
```
|
|
|
|
We will re-use [the dataset created previously](./arrow.md):
|
|
|
|
```python
|
|
import pandas as pd
|
|
import lancedb
|
|
|
|
db = lancedb.connect("data/sample-lancedb")
|
|
data = pd.DataFrame({
|
|
"vector": [[3.1, 4.1], [5.9, 26.5]],
|
|
"item": ["foo", "bar"],
|
|
"price": [10.0, 20.0]
|
|
})
|
|
table = db.create_table("pd_table", data=data)
|
|
arrow_table = table.to_arrow()
|
|
```
|
|
|
|
`DuckDB` can directly query the `arrow_table`:
|
|
|
|
```python
|
|
import duckdb
|
|
|
|
duckdb.query("SELECT * FROM arrow_table")
|
|
```
|
|
|
|
```
|
|
┌─────────────┬─────────┬────────┐
|
|
│ vector │ item │ price │
|
|
│ float[] │ varchar │ double │
|
|
├─────────────┼─────────┼────────┤
|
|
│ [3.1, 4.1] │ foo │ 10.0 │
|
|
│ [5.9, 26.5] │ bar │ 20.0 │
|
|
└─────────────┴─────────┴────────┘
|
|
```
|
|
|
|
```py
|
|
duckdb.query("SELECT mean(price) FROM arrow_table")
|
|
```
|
|
|
|
```
|
|
┌─────────────┐
|
|
│ mean(price) │
|
|
│ double │
|
|
├─────────────┤
|
|
│ 15.0 │
|
|
└─────────────┘
|
|
``` |