lancedb/docs/src/python/pandas_and_pyarrow.md
QianZhu 17c9e9afea docs: add async examples to doc (#1941)
- added sync and async tabs for python examples
- moved python code to tests/docs

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-01-07 15:10:25 -08:00


Pandas and PyArrow

Because Lance is built on top of Apache Arrow, LanceDB is tightly integrated with the Python data ecosystem, including Pandas and PyArrow. The sequence of steps in a typical workflow is shown below.

Create dataset

First, we need to connect to a LanceDB database.

=== "Sync API"

```python
--8<-- "python/python/tests/docs/test_python.py:import-lancedb"
--8<-- "python/python/tests/docs/test_python.py:connect_to_lancedb"
```

=== "Async API"

```python
--8<-- "python/python/tests/docs/test_python.py:import-lancedb"
--8<-- "python/python/tests/docs/test_python.py:connect_to_lancedb_async"
```

We can load a Pandas DataFrame into LanceDB directly.

=== "Sync API"

```python
--8<-- "python/python/tests/docs/test_python.py:import-pandas"
--8<-- "python/python/tests/docs/test_python.py:create_table_pandas"
```

=== "Async API"

```python
--8<-- "python/python/tests/docs/test_python.py:import-pandas"
--8<-- "python/python/tests/docs/test_python.py:create_table_pandas_async"
```

Similar to PyArrow's `pyarrow.dataset.write_dataset()` function, LanceDB's `db.create_table()` accepts data in a variety of forms.

If your dataset is larger than memory, you can pass an `Iterator[pyarrow.RecordBatch]` to `db.create_table()` so that the data is loaded lazily:

=== "Sync API"

```python
--8<-- "python/python/tests/docs/test_python.py:import-iterable"
--8<-- "python/python/tests/docs/test_python.py:import-pyarrow"
--8<-- "python/python/tests/docs/test_python.py:make_batches"
--8<-- "python/python/tests/docs/test_python.py:create_table_iterable"
```

=== "Async API"

```python
--8<-- "python/python/tests/docs/test_python.py:import-iterable"
--8<-- "python/python/tests/docs/test_python.py:import-pyarrow"
--8<-- "python/python/tests/docs/test_python.py:make_batches"
--8<-- "python/python/tests/docs/test_python.py:create_table_iterable_async"
```

You will find detailed instructions for creating a LanceDB dataset in the Getting Started and API sections.

We can now perform similarity search via the LanceDB Python API.

=== "Sync API"

```python
--8<-- "python/python/tests/docs/test_python.py:vector_search"
```

=== "Async API"

```python
--8<-- "python/python/tests/docs/test_python.py:vector_search_async"
```

            vector item  price    _distance
    0  [5.9, 26.5]  bar   20.0  14257.05957

If you have a simple filter, it's faster to provide a `where` clause to LanceDB's `search` method. For more complex filters or aggregations, you can always fall back on the underlying Pandas DataFrame methods after performing a search.

=== "Sync API"

```python
--8<-- "python/python/tests/docs/test_python.py:vector_search_with_filter"
```

=== "Async API"

```python
--8<-- "python/python/tests/docs/test_python.py:vector_search_with_filter_async"
```