feat: pare down docs to only show API refs (#2770)

This PR does the following: 
- Pare down the docs to only what's needed (Python, JS/TS API docs and a
pointer to Rust docs)
- Styling changes to be more in line with the main website theme

The relative URLs remain unchanged, so assuming CI passes, there should
be no breaking changes from the main docs site that points back here.
This commit is contained in:
Prashanth Rao
2025-11-10 12:04:57 -05:00
committed by GitHub
parent e34f51713a
commit 8e06b8bfe1
180 changed files with 61 additions and 28716 deletions

View File

@@ -1,53 +0,0 @@
# Apache Datafusion
In Python, LanceDB tables can also be queried with [Apache Datafusion](https://datafusion.apache.org/), an extensible query engine written in Rust that uses Apache Arrow as its in-memory format. This means you can write complex SQL queries to analyze your data in LanceDB.
This integration is done via [Datafusion FFI](https://docs.rs/datafusion-ffi/latest/datafusion_ffi/), which provides a native integration between LanceDB and Datafusion.
The Datafusion FFI allows to pass down column selections and basic filters to LanceDB, reducing the amount of scanned data when executing your query. Additionally, the integration allows streaming data from LanceDB tables which allows to do aggregation larger-than-memory.
We can demonstrate this by first installing `datafusion` and `lancedb`.
```shell
pip install datafusion lancedb
```
We will re-use the dataset [created previously](./pandas_and_pyarrow.md):
```python
import lancedb
from datafusion import SessionContext
from lance import FFILanceTableProvider
db = lancedb.connect("data/sample-lancedb")
data = [
{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}
]
lance_table = db.create_table("lance_table", data)
ctx = SessionContext()
ffi_lance_table = FFILanceTableProvider(
lance_table.to_lance(), with_row_id=True, with_row_addr=True
)
ctx.register_table_provider("ffi_lance_table", ffi_lance_table)
```
The `to_lance` method converts the LanceDB table to a `LanceDataset`, which is accessible to Datafusion through the Datafusion FFI integration layer.
To query the resulting Lance dataset in Datafusion, you first need to register the dataset with Datafusion and then just reference it by the same name in your SQL query.
```python
ctx.table("ffi_lance_table")
ctx.sql("SELECT * FROM ffi_lance_table")
```
```
┌─────────────┬─────────┬────────┬─────────────────┬─────────────────┐
│ vector │ item │ price │ _rowid │ _rowaddr │
│ float[] │ varchar │ double │ bigint unsigned │ bigint unsigned │
├─────────────┼─────────┼────────┼─────────────────┼─────────────────┤
│ [3.1, 4.1] │ foo │ 10.0 │ 0 │ 0 │
│ [5.9, 26.5] │ bar │ 20.0 │ 1 │ 1 │
└─────────────┴─────────┴────────┴─────────────────┴─────────────────┘
```

View File

@@ -1,61 +0,0 @@
# DuckDB
In Python, LanceDB tables can also be queried with [DuckDB](https://duckdb.org/), an in-process SQL OLAP database. This means you can write complex SQL queries to analyze your data in LanceDB.
This integration is done via [Apache Arrow](https://duckdb.org/docs/guides/python/sql_on_arrow), which provides zero-copy data sharing between LanceDB and DuckDB. DuckDB is capable of passing down column selections and basic filters to LanceDB, reducing the amount of data that needs to be scanned to perform your query. Finally, the integration allows streaming data from LanceDB tables, allowing you to aggregate tables that won't fit into memory. All of this uses the same mechanism described in DuckDB's blog post *[DuckDB quacks Arrow](https://duckdb.org/2021/12/03/duck-arrow.html)*.
We can demonstrate this by first installing `duckdb` and `lancedb`.
```shell
pip install duckdb lancedb
```
We will re-use the dataset [created previously](./pandas_and_pyarrow.md):
```python
import lancedb
db = lancedb.connect("data/sample-lancedb")
data = [
{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}
]
table = db.create_table("pd_table", data=data)
```
The `to_lance` method converts the LanceDB table to a `LanceDataset`, which is accessible to DuckDB through the Arrow compatibility layer.
To query the resulting Lance dataset in DuckDB, all you need to do is reference the dataset by the same name in your SQL query.
```python
import duckdb
arrow_table = table.to_lance()
duckdb.query("SELECT * FROM arrow_table")
```
```
┌─────────────┬─────────┬────────┐
│ vector │ item │ price │
│ float[] │ varchar │ double │
├─────────────┼─────────┼────────┤
│ [3.1, 4.1] │ foo │ 10.0 │
│ [5.9, 26.5] │ bar │ 20.0 │
└─────────────┴─────────┴────────┘
```
You can very easily run any other DuckDB SQL queries on your data.
```py
duckdb.query("SELECT mean(price) FROM arrow_table")
```
```
┌─────────────┐
│ mean(price) │
│ double │
├─────────────┤
│ 15.0 │
└─────────────┘
```

View File

@@ -1,97 +0,0 @@
# Pandas and PyArrow
Because Lance is built on top of [Apache Arrow](https://arrow.apache.org/),
LanceDB is tightly integrated with the Python data ecosystem, including [Pandas](https://pandas.pydata.org/)
and PyArrow. The sequence of steps in a typical workflow is shown below.
## Create dataset
First, we need to connect to a LanceDB database.
=== "Sync API"
```python
--8<-- "python/python/tests/docs/test_python.py:import-lancedb"
--8<-- "python/python/tests/docs/test_python.py:connect_to_lancedb"
```
=== "Async API"
```python
--8<-- "python/python/tests/docs/test_python.py:import-lancedb"
--8<-- "python/python/tests/docs/test_python.py:connect_to_lancedb_async"
```
We can load a Pandas `DataFrame` to LanceDB directly.
=== "Sync API"
```python
--8<-- "python/python/tests/docs/test_python.py:import-pandas"
--8<-- "python/python/tests/docs/test_python.py:create_table_pandas"
```
=== "Async API"
```python
--8<-- "python/python/tests/docs/test_python.py:import-pandas"
--8<-- "python/python/tests/docs/test_python.py:create_table_pandas_async"
```
Similar to the [`pyarrow.write_dataset()`](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.write_dataset.html) method, LanceDB's
[`db.create_table()`](python.md/#lancedb.db.DBConnection.create_table) accepts data in a variety of forms.
If you have a dataset that is larger than memory, you can create a table with `Iterator[pyarrow.RecordBatch]` to lazily load the data:
=== "Sync API"
```python
--8<-- "python/python/tests/docs/test_python.py:import-iterable"
--8<-- "python/python/tests/docs/test_python.py:import-pyarrow"
--8<-- "python/python/tests/docs/test_python.py:make_batches"
--8<-- "python/python/tests/docs/test_python.py:create_table_iterable"
```
=== "Async API"
```python
--8<-- "python/python/tests/docs/test_python.py:import-iterable"
--8<-- "python/python/tests/docs/test_python.py:import-pyarrow"
--8<-- "python/python/tests/docs/test_python.py:make_batches"
--8<-- "python/python/tests/docs/test_python.py:create_table_iterable_async"
```
You will find detailed instructions of creating a LanceDB dataset in
[Getting Started](../basic.md#quick-start) and [API](python.md/#lancedb.db.DBConnection.create_table)
sections.
## Vector search
We can now perform similarity search via the LanceDB Python API.
=== "Sync API"
```python
--8<-- "python/python/tests/docs/test_python.py:vector_search"
```
=== "Async API"
```python
--8<-- "python/python/tests/docs/test_python.py:vector_search_async"
```
```
vector item price _distance
0 [5.9, 26.5] bar 20.0 14257.05957
```
If you have a simple filter, it's faster to provide a `where` clause to LanceDB's `search` method.
For more complex filters or aggregations, you can always resort to using the underlying `DataFrame` methods after performing a search.
=== "Sync API"
```python
--8<-- "python/python/tests/docs/test_python.py:vector_search_with_filter"
```
=== "Async API"
```python
--8<-- "python/python/tests/docs/test_python.py:vector_search_with_filter_async"
```

View File

@@ -1,141 +0,0 @@
# Polars
LanceDB supports [Polars](https://github.com/pola-rs/polars), a blazingly fast DataFrame library for Python written in Rust. Just like in Pandas, the Polars integration is enabled by PyArrow under the hood. A deeper integration between Lance Tables and Polars DataFrames is in progress, but at the moment, you can read a Polars DataFrame into LanceDB and output the search results from a query to a Polars DataFrame.
## Create & Query LanceDB Table
### From Polars DataFrame
First, we connect to a LanceDB database.
=== "Sync API"
```py
--8<-- "python/python/tests/docs/test_python.py:import-lancedb"
--8<-- "python/python/tests/docs/test_python.py:connect_to_lancedb"
```
=== "Async API"
```py
--8<-- "python/python/tests/docs/test_python.py:import-lancedb"
--8<-- "python/python/tests/docs/test_python.py:connect_to_lancedb_async"
```
We can load a Polars `DataFrame` to LanceDB directly.
=== "Sync API"
```py
--8<-- "python/python/tests/docs/test_python.py:import-polars"
--8<-- "python/python/tests/docs/test_python.py:create_table_polars"
```
=== "Async API"
```py
--8<-- "python/python/tests/docs/test_python.py:import-polars"
--8<-- "python/python/tests/docs/test_python.py:create_table_polars_async"
```
We can now perform similarity search via the LanceDB Python API.
=== "Sync API"
```py
--8<-- "python/python/tests/docs/test_python.py:vector_search_polars"
```
=== "Async API"
```py
--8<-- "python/python/tests/docs/test_python.py:vector_search_polars_async"
```
In addition to the selected columns, LanceDB also returns a vector
and also the `_distance` column which is the distance between the query
vector and the returned vector.
```
shape: (1, 4)
┌───────────────┬──────┬───────┬───────────┐
│ vector ┆ item ┆ price ┆ _distance │
│ --- ┆ --- ┆ --- ┆ --- │
│ array[f32, 2] ┆ str ┆ f64 ┆ f32 │
╞═══════════════╪══════╪═══════╪═══════════╡
│ [3.1, 4.1] ┆ foo ┆ 10.0 ┆ 0.0 │
└───────────────┴──────┴───────┴───────────┘
<class 'polars.dataframe.frame.DataFrame'>
```
Note that the type of the result from a table search is a Polars DataFrame.
### From Pydantic Models
Alternately, we can create an empty LanceDB Table using a Pydantic schema and populate it with a Polars DataFrame.
```py
--8<-- "python/python/tests/docs/test_python.py:import-polars"
--8<-- "python/python/tests/docs/test_python.py:import-lancedb-pydantic"
--8<-- "python/python/tests/docs/test_python.py:class_Item"
--8<-- "python/python/tests/docs/test_python.py:create_table_pydantic"
```
The table can now be queried as usual.
```py
--8<-- "python/python/tests/docs/test_python.py:vector_search_polars"
```
```
shape: (1, 4)
┌───────────────┬──────┬───────┬───────────┐
│ vector ┆ item ┆ price ┆ _distance │
│ --- ┆ --- ┆ --- ┆ --- │
│ array[f32, 2] ┆ str ┆ f64 ┆ f32 │
╞═══════════════╪══════╪═══════╪═══════════╡
│ [3.1, 4.1] ┆ foo ┆ 10.0 ┆ 0.02 │
└───────────────┴──────┴───────┴───────────┘
<class 'polars.dataframe.frame.DataFrame'>
```
This result is the same as the previous one, with a DataFrame returned.
## Dump Table to LazyFrame
As you iterate on your application, you'll likely need to work with the whole table's data pretty frequently.
LanceDB tables can also be converted directly into a polars LazyFrame for further processing.
```python
--8<-- "python/python/tests/docs/test_python.py:dump_table_lazyform"
```
Unlike the search result from a query, we can see that the type of the result is a LazyFrame.
```
<class 'polars.lazyframe.frame.LazyFrame'>
```
We can now work with the LazyFrame as we would in Polars, and collect the first result.
```python
--8<-- "python/python/tests/docs/test_python.py:print_table_lazyform"
```
```
shape: (1, 3)
┌───────────────┬──────┬───────┐
│ vector ┆ item ┆ price │
│ --- ┆ --- ┆ --- │
│ array[f32, 2] ┆ str ┆ f64 │
╞═══════════════╪══════╪═══════╡
│ [3.1, 4.1] ┆ foo ┆ 10.0 │
└───────────────┴──────┴───────┘
```
The reason it's beneficial to not convert the LanceDB Table
to a DataFrame is because the table can potentially be way larger
than memory, and Polars LazyFrames allow us to work with such
larger-than-memory datasets by not loading it into memory all at once.

View File

@@ -1,47 +0,0 @@
# Pydantic
[Pydantic](https://docs.pydantic.dev/latest/) is a data validation library in Python.
LanceDB integrates with Pydantic for schema inference, data ingestion, and query result casting.
Using [LanceModel][lancedb.pydantic.LanceModel], users can seamlessly
integrate Pydantic with the rest of the LanceDB APIs.
```python
--8<-- "python/python/tests/docs/test_pydantic_integration.py:imports"
--8<-- "python/python/tests/docs/test_pydantic_integration.py:base_model"
--8<-- "python/python/tests/docs/test_pydantic_integration.py:set_url"
--8<-- "python/python/tests/docs/test_pydantic_integration.py:base_example"
```
## Vector Field
LanceDB provides a [`Vector(dim)`](python.md#lancedb.pydantic.Vector) method to define a
vector Field in a Pydantic Model.
::: lancedb.pydantic.Vector
## Type Conversion
LanceDB automatically convert Pydantic fields to
[Apache Arrow DataType](https://arrow.apache.org/docs/python/generated/pyarrow.DataType.html#pyarrow.DataType).
Current supported type conversions:
| Pydantic Field Type | PyArrow Data Type |
| ------------------- | ----------------- |
| `int` | `pyarrow.int64` |
| `float` | `pyarrow.float64` |
| `bool` | `pyarrow.bool` |
| `str` | `pyarrow.utf8()` |
| `list` | `pyarrow.List` |
| `BaseModel` | `pyarrow.Struct` |
| `Vector(n)` | `pyarrow.FixedSizeList(float32, n)` |
LanceDB supports to create Apache Arrow Schema from a
[Pydantic BaseModel][pydantic.BaseModel]
via [pydantic_to_schema()](python.md#lancedb.pydantic.pydantic_to_schema) method.
::: lancedb.pydantic.pydantic_to_schema

View File

@@ -1,7 +1,6 @@
# Python API Reference
This section contains the API reference for the Python API. There is a
synchronous and an asynchronous API client.
This section contains the API reference for the Python API of [LanceDB](https://github.com/lancedb/lancedb). Both synchronous and asynchronous APIs are available.
The general flow of using the API is:

View File

@@ -1,24 +0,0 @@
# Python API Reference (SaaS)
This section contains the API reference for the LanceDB Cloud Python API.
## Installation
```shell
pip install lancedb
```
## Connection
::: lancedb.connect
::: lancedb.remote.db.RemoteDBConnection
## Table
::: lancedb.remote.table.RemoteTable
options:
filters:
- "!cleanup_old_versions"
- "!compact_files"
- "!optimize"