Merge pull request #31 from lancedb/lei/doc

[Doc] Pandas, PyArrow, DuckDB integration
2026-05-26 00:10:39 +00:00 · 2023-04-19 14:55:42 -07:00
parent d7fb2b1d6b 23d4e3561f
commit ec197b1855
3 changed files with 147 additions and 0 deletions
--- a/docs/src/integrations.md
+++ b/docs/src/integrations.md
@@ -0,0 +1,111 @@
+# Integrations
+
+Built on top of Apache Arrow, `LanceDB` is easy to integrate with the Python ecosystem, including Pandas, PyArrow and DuckDB.
+
+## Pandas and PyArrow
+
+First, we need to connect to a `LanceDB` database.
+
+```py
+
+import lancedb
+
+db = lancedb.connect("/tmp/lancedb")
+```
+
+Then write a pandas `DataFrame` to `LanceDB` directly.
+
+```py
+import pandas as pd
+
+data = pd.DataFrame({
+    "vector": [[3.1, 4.1], [5.9, 26.5]],
+    "item": ["foo", "bar"],
+    "price": [10.0, 20.0]
+})
+table = db.create_table("pd_table", data=data)
+
+# Optionally, create an IVF_PQ index (meant for much larger tables; the vector dimension must be divisible by num_sub_vectors)
+table.create_index(num_partitions=256, num_sub_vectors=96)
+```
+
+For detailed instructions on creating a dataset and an index, see the [Basic Operations](basic.md) and [Indexing](indexing.md)
+sections.
+
+
+We can now perform similarity searches via `LanceDB`.
+
+```py
+# Open the table previously created.
+table = db.open_table("pd_table")
+
+query_vector = [100, 100]
+# Pandas DataFrame
+df = table.search(query_vector).limit(1).to_df()
+print(df)
+```
+
+```
+        vector  item  price        score
+0  [5.9, 26.5]   bar   20.0  14257.05957
+```
+
+If you have a simple filter, it's faster to provide a where clause to `LanceDB`'s search query.
+If you have more complex criteria, you can always apply the filter to the resulting pandas `DataFrame` from the search query.
+
+```python
+
+# Apply the filter via LanceDB
+results = table.search([100, 100]).where("price < 15").to_df()
+assert len(results) == 1
+assert results["item"].iloc[0] == "foo"
+
+# Apply the filter via Pandas
+df = table.search([100, 100]).to_df()
+results = df[df.price < 15]
+assert len(results) == 1
+assert results["item"].iloc[0] == "foo"
+```
+
+## DuckDB
+
+`LanceDB` works with `DuckDB` via [PyArrow integration](https://duckdb.org/docs/guides/python/sql_on_arrow).
+
+Start by installing `duckdb` and `lancedb`.
+
+```shell
+pip install duckdb lancedb
+```
+
+We will reuse the table created previously:
+
+```python
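+import duckdb
+import lancedb
+
+db = lancedb.connect("/tmp/lancedb")
+# Name the Arrow table `t` so the SQL queries can refer to it by that name.
+t = db.open_table("pd_table").to_arrow()
+```
+`DuckDB` can query the Arrow table `t` directly, referring to it by its Python variable name: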
+
+```python
+In [15]: duckdb.query("SELECT * FROM t")
+Out[15]:
+┌─────────────┬─────────┬────────┐
+│   vector    │  item   │ price  │
+│   float[]   │ varchar │ double │
+├─────────────┼─────────┼────────┤
+│ [3.1, 4.1]  │ foo     │   10.0 │
+│ [5.9, 26.5] │ bar     │   20.0 │
+└─────────────┴─────────┴────────┘
+
+In [16]: duckdb.query("SELECT mean(price) FROM t")
+Out[16]:
+┌─────────────┐
+│ mean(price) │
+│   double    │
+├─────────────┤
+│        15.0 │
+└─────────────┘
+```