[Python] Create table with Iterator[RecordBatch] and add docs (#316)

2026-01-08 04:42:57 +00:00 · 2023-07-16 21:45:55 -07:00
parent 7a57cddb2c
commit 088e745e1d
5 changed files with 103 additions and 20 deletions
--- a/docs/src/python/arrow.md
+++ b/docs/src/python/arrow.md
@@ -5,6 +5,8 @@ Built on top of [Apache Arrow](https://arrow.apache.org/),
 `LanceDB` is easy to integrate with the Python ecosystem, including [Pandas](https://pandas.pydata.org/)
 and PyArrow.

+## Create dataset
+
 First, we need to connect to a `LanceDB` database.

 ```py
@@ -27,10 +29,42 @@ data = pd.DataFrame({
 table = db.create_table("pd_table", data=data)
 ```

-You will find detailed instructions of creating dataset and index in
-[Basic Operations](basic.md) and [Indexing](ann_indexes.md)
+Similar to [`pyarrow.dataset.write_dataset()`](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.write_dataset.html),
+[`db.create_table()`](../python/#lancedb.db.DBConnection.create_table) accepts data in a wide range of forms.
+
+For example, if your dataset is larger than memory, you can create a table from an `Iterator[pyarrow.RecordBatch]`
+that generates the data lazily:
+
+```py
+from typing import Iterator
+import pyarrow as pa
+import lancedb
+
+def make_batches() -> Iterator[pa.RecordBatch]:
+    # Type the arrays explicitly so every batch matches `schema` (float32, not the default float64).
+    for _ in range(5):
+        yield pa.RecordBatch.from_arrays(
+            [
+                pa.array([[3.1, 4.1], [5.9, 26.5]], pa.list_(pa.float32())),
+                pa.array(["foo", "bar"]),
+                pa.array([10.0, 20.0], pa.float32()),
+            ],
+            ["vector", "item", "price"])
+
+schema = pa.schema([
+    pa.field("vector", pa.list_(pa.float32())),
+    pa.field("item", pa.utf8()),
+    pa.field("price", pa.float32()),
+])
+
+table = db.create_table("iterable_table", data=make_batches(), schema=schema)
+```
+
+You can find detailed instructions for creating a dataset in the
+[Basic Operations](../basic.md) and [API](../python/#lancedb.db.DBConnection.create_table)
 sections.

+## Vector Search

 We can now perform similarity search via `LanceDB` Python API.

--- a/docs/src/python/python.md
+++ b/docs/src/python/python.md
@@ -46,10 +46,6 @@ pip install lancedb

 ## Utilities

-::: lancedb.schema.schema_to_dict
-
-::: lancedb.schema.dict_to_schema
-
 ::: lancedb.vector

 ## Integrations