docs: add the async python API to the docs (#1156)

2026-05-26 00:10:39 +00:00 · 2024-03-26 07:54:16 -05:00
parent 98c1e635b3
commit c37a28abbd
13 changed files with 624 additions and 400 deletions
--- a/docs/src/basic.md
+++ b/docs/src/basic.md
@@ -48,11 +48,20 @@

 === "Python"

-      ```python
-      import lancedb
-      uri = "data/sample-lancedb"
-      db = lancedb.connect(uri)
-      ```
+    ```python
+    --8<-- "python/python/tests/docs/test_basic.py:imports"
+    --8<-- "python/python/tests/docs/test_basic.py:connect"
+
+    --8<-- "python/python/tests/docs/test_basic.py:connect_async"
+    ```
+
+    !!! note "Asynchronous Python API"
+
+        The asynchronous Python API is new and differs slightly from the
+        synchronous API. Feel free to start using the asynchronous version.
+        Once all features have been migrated, we will start moving the synchronous API to
+        use the same syntax as the asynchronous API. To help with this migration, we
+        have created a [migration guide](migration.md) detailing the differences.

 === "Typescript"

@@ -82,15 +91,14 @@ If you need a reminder of the uri, you can call `db.uri()`.
 ### Create a table from initial data

 If you have data to insert into the table at creation time, you can simultaneously create a
-table and insert the data into it.  The schema of the data will be used as the schema of the
+table and insert the data into it. The schema of the data will be used as the schema of the
 table.

 === "Python"

    ```python
-    tbl = db.create_table("my_table",
-                    data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
-                          {"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
+    --8<-- "python/python/tests/docs/test_basic.py:create_table"
+    --8<-- "python/python/tests/docs/test_basic.py:create_table_async"
    ```

    If the table already exists, LanceDB will raise an error by default.
@@ -100,10 +108,8 @@ table.
    You can also pass in a pandas DataFrame directly:

    ```python
-    import pandas as pd
-    df = pd.DataFrame([{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
-                       {"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
-    tbl = db.create_table("table_from_df", data=df)
+    --8<-- "python/python/tests/docs/test_basic.py:create_table_pandas"
+    --8<-- "python/python/tests/docs/test_basic.py:create_table_async_pandas"
    ```

 === "Typescript"
@@ -138,15 +144,14 @@ table.

 Sometimes you may not have the data to insert into the table at creation time.
 In this case, you can create an empty table and specify the schema, so that you can add
-data to the table at a later time (as long as it conforms to the schema).  This is
+data to the table at a later time (as long as it conforms to the schema). This is
 similar to a `CREATE TABLE` statement in SQL.

 === "Python"

      ```python
-      import pyarrow as pa
-      schema = pa.schema([pa.field("vector", pa.list_(pa.float32(), list_size=2))])
-      tbl = db.create_table("empty_table", schema=schema)
+      --8<-- "python/python/tests/docs/test_basic.py:create_empty_table"
+      --8<-- "python/python/tests/docs/test_basic.py:create_empty_table_async"
      ```

 === "Typescript"
@@ -168,7 +173,8 @@ Once created, you can open a table as follows:
 === "Python"

    ```python
-    tbl = db.open_table("my_table")
+    --8<-- "python/python/tests/docs/test_basic.py:open_table"
+    --8<-- "python/python/tests/docs/test_basic.py:open_table_async"
    ```

 === "Typescript"
@@ -188,7 +194,8 @@ If you forget the name of your table, you can always get a listing of all table
 === "Python"

    ```python
-    print(db.table_names())
+    --8<-- "python/python/tests/docs/test_basic.py:table_names"
+    --8<-- "python/python/tests/docs/test_basic.py:table_names_async"
    ```

 === "Javascript"
@@ -210,15 +217,8 @@ After a table has been created, you can always add more data to it as follows:
 === "Python"

    ```python
-
-    # Option 1: Add a list of dicts to a table
-    data = [{"vector": [1.3, 1.4], "item": "fizz", "price": 100.0},
-            {"vector": [9.5, 56.2], "item": "buzz", "price": 200.0}]
-    tbl.add(data)
-
-    # Option 2: Add a pandas DataFrame to a table
-    df = pd.DataFrame(data)
-    tbl.add(data)
+    --8<-- "python/python/tests/docs/test_basic.py:add_data"
+    --8<-- "python/python/tests/docs/test_basic.py:add_data_async"
    ```

 === "Typescript"
@@ -240,7 +240,8 @@ Once you've embedded the query, you can find its nearest neighbors as follows:
 === "Python"

    ```python
-    tbl.search([100, 100]).limit(2).to_pandas()
+    --8<-- "python/python/tests/docs/test_basic.py:vector_search"
+    --8<-- "python/python/tests/docs/test_basic.py:vector_search_async"
    ```

    This returns a pandas DataFrame with the results.
@@ -274,7 +275,8 @@ LanceDB allows you to create an ANN index on a table as follows:
 === "Python"

    ```py
-    tbl.create_index()
+    --8<-- "python/python/tests/docs/test_basic.py:create_index"
+    --8<-- "python/python/tests/docs/test_basic.py:create_index_async"
    ```

 === "Typescript"
@@ -286,15 +288,15 @@ LanceDB allows you to create an ANN index on a table as follows:
 === "Rust"

    ```rust
-     --8<-- "rust/lancedb/examples/simple.rs:create_index"
+    --8<-- "rust/lancedb/examples/simple.rs:create_index"
    ```

 !!! note "Why do I need to create an index manually?"
-    LanceDB does not automatically create the ANN index for two reasons. The first is that it's optimized
-    for really fast retrievals via a disk-based index, and the second is that data and query workloads can
-    be very diverse, so there's no one-size-fits-all index configuration. LanceDB provides many parameters
-    to fine-tune index size, query latency and accuracy. See the section on
-    [ANN indexes](ann_indexes.md) for more details.
+    LanceDB does not automatically create the ANN index for two reasons. The first is that it's optimized
+    for really fast retrievals via a disk-based index, and the second is that data and query workloads can
+    be very diverse, so there's no one-size-fits-all index configuration. LanceDB provides many parameters
+    to fine-tune index size, query latency and accuracy. See the section on
+    [ANN indexes](ann_indexes.md) for more details.

 ## Delete rows from a table

@@ -305,7 +307,8 @@ This can delete any number of rows that match the filter.
 === "Python"

    ```python
-    tbl.delete('item = "fizz"')
+    --8<-- "python/python/tests/docs/test_basic.py:delete_rows"
+    --8<-- "python/python/tests/docs/test_basic.py:delete_rows_async"
    ```

 === "Typescript"
@@ -322,7 +325,7 @@ This can delete any number of rows that match the filter.

 The deletion predicate is a SQL expression that supports the same expressions
 as the `where()` clause (`only_if()` in Rust) on a search. They can be as
-simple or complex as needed.  To see what expressions are supported, see the
+simple or complex as needed. To see what expressions are supported, see the
 [SQL filters](sql.md) section.

 === "Python"
@@ -344,7 +347,8 @@ Use the `drop_table()` method on the database to remove a table.
 === "Python"

      ```python
-      db.drop_table("my_table")
+      --8<-- "python/python/tests/docs/test_basic.py:drop_table"
+      --8<-- "python/python/tests/docs/test_basic.py:drop_table_async"
      ```

      This permanently removes the table and is not recoverable, unlike deleting rows.
--- a/docs/src/migration.md
+++ b/docs/src/migration.md
@@ -0,0 +1,76 @@
+# Rust-backed Client Migration Guide
+
+In an effort to ensure all clients have the same set of capabilities, we have begun migrating the
+Python and Node clients onto a common Rust base library. In Python, this new client is part of
+the same `lancedb` package, exposed as an asynchronous client. Once the asynchronous client has
+reached full functionality, we will begin migrating the synchronous library to be a thin wrapper
+around the asynchronous client.
+
+This guide describes the differences between the two APIs and is intended to help users
+who would like to migrate to the new API.
+
+## Closeable Connections
+
+The Connection now has a `close` method. You can call this when
+you are done with the connection to eagerly free resources. Currently
+this is limited to freeing/closing the HTTP connection for remote
+connections. In the future we may add caching or other resources to
+native connections so this is probably a good practice even if you
+aren't using remote connections.
+
+In addition, the connection can be used as a context manager, which may
+be a more convenient way to ensure the connection is closed.
+
+```python
+import lancedb
+
+async def my_async_fn():
+    with await lancedb.connect_async("my_uri") as db:
+        print(await db.table_names())
+```
+
+It is not mandatory to call the `close` method. If you do not call it,
+the connection will be closed when the object is garbage collected.
+
+## Closeable Table
+
+The Table now also has a `close` method, similar to the connection. This
+can be used to eagerly free the cache used by a Table object. Similar to
+the connection, it can be used as a context manager and it is not mandatory
+to call the `close` method.
+
+### Changes to Table APIs
+
+- Previously `Table.schema` was a property. Now it is an async method.
+- The method `Table.__len__` was removed and `len(table)` will no longer
+  work. Use `Table.count_rows` instead.
+
+### Creating Indices
+
+The `Table.create_index` method is now used for creating both vector indices
+and scalar indices. It currently requires a column name to be specified (the
+column to index). Vector index defaults are now smarter and scale better with
+the size of the data.
+
+To specify index configuration details you will need to specify which kind of
+index you are using.
+
+### Querying
+
+The `Table.search` method has been renamed to `AsyncTable.vector_search` for
+clarity.
+
+## Features not yet supported
+
+The following features are not yet supported by the asynchronous API. However,
+we plan to support them soon.
+
+- You cannot specify an embedding function when creating or opening a table.
+  You must calculate embeddings yourself if using the asynchronous API.
+- The merge insert operation is not supported in the asynchronous API.
+- Cleanup, compaction, and index optimization are not supported in the asynchronous API.
+- Adding or altering columns is not supported in the asynchronous API.
+- The asynchronous API does not yet support full text search or reranking
+  search.
+- Remote connections to LanceDB Cloud are not yet supported.
+- The method `Table.head` is not yet supported.
--- a/docs/src/python/python.md
+++ b/docs/src/python/python.md
@@ -8,17 +8,20 @@ This section contains the API reference for the OSS Python API.
 pip install lancedb
 ```

-## Connection
+The following methods describe the synchronous API client. There
+is also an [asynchronous API client](#connections-asynchronous).
+
+## Connections (Synchronous)

 ::: lancedb.connect

 ::: lancedb.db.DBConnection

-## Table
+## Tables (Synchronous)

 ::: lancedb.table.Table

-## Querying
+## Querying (Synchronous)

 ::: lancedb.query.Query

@@ -86,4 +89,42 @@ pip install lancedb

 ::: lancedb.rerankers.cross_encoder.CrossEncoderReranker

-::: lancedb.rerankers.openai.OpenaiReranker
+::: lancedb.rerankers.openai.OpenaiReranker
+
+## Connections (Asynchronous)
+
+Connections represent a connection to a LanceDB database and
+can be used to create, list, or open tables.
+
+::: lancedb.connect_async
+
+::: lancedb.db.AsyncConnection
+
+## Tables (Asynchronous)
+
+Tables hold your actual data as a collection of records / rows.
+
+::: lancedb.table.AsyncTable
+
+## Indices (Asynchronous)
+
+Indices can be created on a table to speed up queries. This section
+lists the indices that LanceDB supports.
+
+::: lancedb.index.BTree
+
+::: lancedb.index.IvfPq
+
+## Querying (Asynchronous)
+
+Queries allow you to return data from your database. Basic queries can be
+created with the [AsyncTable.query][lancedb.table.AsyncTable.query] method
+to return the entire (typically filtered) table. Vector searches return the
+rows nearest to a query vector and can be created with the
+[AsyncTable.vector_search][lancedb.table.AsyncTable.vector_search] method.
+
+::: lancedb.query.AsyncQueryBase
+
+::: lancedb.query.AsyncQuery
+
+::: lancedb.query.AsyncVectorQuery