docs: add links to rust SDK docs, remove references to rust SDK being unstable / experimental (#1131)

Weston Pace
2024-03-19 07:16:48 -07:00
committed by GitHub
parent 2de226220b
commit 0c7809c7a0
13 changed files with 321 additions and 104 deletions

@@ -3,7 +3,7 @@
!!! info "LanceDB can be run in a number of ways:"
* Embedded within an existing backend (like your Django, Flask, Node.js or FastAPI application)
* Connected to directly from a client application like a Jupyter notebook for analytical workloads
* Directly from a client application like a Jupyter notebook for analytical workloads
* Deployed as a remote serverless database
![](assets/lancedb_embedded_explanation.png)
@@ -24,13 +24,11 @@
=== "Rust"
!!! warning "Rust SDK is experimental, might introduce breaking changes in the near future"
```shell
cargo add vectordb
cargo add lancedb
```
!!! info "To use the vectordb crate, you first need to install protobuf."
!!! info "To use the lancedb crate, you first need to install protobuf."
=== "macOS"
@@ -44,7 +42,7 @@
sudo apt install -y protobuf-compiler libssl-dev
```
!!! info "Please also make sure you're using the same version of Arrow as in the [vectordb crate](https://github.com/lancedb/lancedb/blob/main/Cargo.toml)"
!!! info "Please also make sure you're using the same version of Arrow as in the [lancedb crate](https://github.com/lancedb/lancedb/blob/main/Cargo.toml)"
## Connect to a database
@@ -81,10 +79,11 @@ If you need a reminder of the uri, you can call `db.uri()`.
## Create a table
### Directly insert data to a new table
### Create a table from initial data
If you have data to insert into the table at creation time, you can simultaneously create a
table and insert the data to it.
If you have data to insert into the table at creation time, you can simultaneously create a
table and insert the data into it. The schema of the data will be used as the schema of the
table.
=== "Python"
@@ -120,21 +119,27 @@ table and insert the data to it.
=== "Rust"
```rust
use arrow_schema::{DataType, Schema, Field};
use arrow_array::{RecordBatch, RecordBatchIterator};
--8<-- "rust/lancedb/examples/simple.rs:create_table"
```
If the table already exists, LanceDB will raise an error by default.
If the table already exists, LanceDB will raise an error by default. See
[the mode option](https://docs.rs/lancedb/latest/lancedb/connection/struct.CreateTableBuilder.html#method.mode)
for details on how to overwrite (or open) existing tables instead.
!!! info "Under the hood, LanceDB converts the input data into an Apache Arrow table and persists it to disk using the [Lance format](https://www.github.com/lancedb/lance)."
!!! Providing table records in Rust
The Rust SDK currently expects data to be provided as an Arrow
[RecordBatchReader](https://docs.rs/arrow-array/latest/arrow_array/trait.RecordBatchReader.html).
Support for additional formats (such as serde or polars) is on the roadmap.
!!! info "Under the hood, LanceDB reads in the Apache Arrow data and persists it to disk using the [Lance format](https://www.github.com/lancedb/lance)."
### Create an empty table
Sometimes you may not have the data to insert into the table at creation time.
In this case, you can create an empty table and specify the schema, so that you can add
data to the table at a later time (such that it conforms to the schema).
data to the table at a later time (as long as it conforms to the schema). This is
similar to a `CREATE TABLE` statement in SQL.
=== "Python"
@@ -175,7 +180,7 @@ Once created, you can open a table as follows:
=== "Rust"
```rust
--8<-- "rust/lancedb/examples/simple.rs:open_with_existing_file"
--8<-- "rust/lancedb/examples/simple.rs:open_existing_tbl"
```
If you forget the name of your table, you can always get a listing of all table names:
@@ -254,6 +259,14 @@ Once you've embedded the query, you can find its nearest neighbors as follows:
--8<-- "rust/lancedb/examples/simple.rs:search"
```
!!! Query vectors in Rust
Rust does not yet support automatic execution of embedding functions. You will need to
calculate embeddings yourself. Support for this is on the roadmap and can be tracked at
https://github.com/lancedb/lancedb/issues/994
Query vectors can be provided as Arrow arrays or a Vec/slice of Rust floats.
Support for additional formats (e.g. `polars::series::Series`) is on the roadmap.
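
To make the nearest-neighbor search concrete, here is a minimal sketch of an exhaustive (brute-force) K-nearest-neighbor scan in plain Python. It is illustrative only and does not use the LanceDB API: the function name `brute_force_knn`, the sample vectors, and the choice of Euclidean (L2) distance are assumptions made for this example.

```python
import heapq
import math


def brute_force_knn(query, vectors, k):
    """Return the k (distance, index) pairs closest to `query` by L2 distance."""
    scored = []
    for index, vector in enumerate(vectors):
        if len(vector) != len(query):
            raise ValueError("all vectors must have the same dimension as the query")
        # Euclidean (L2) distance between the query and this stored vector.
        distance = math.sqrt(sum((q - v) ** 2 for q, v in zip(query, vector)))
        scored.append((distance, index))
    # Every stored vector is scored, so the cost grows linearly with table size.
    return heapq.nsmallest(k, scored)


# Hypothetical sample data: four 2-dimensional vectors.
vectors = [[3.1, 4.1], [5.9, 26.5], [0.0, 0.0], [3.0, 4.0]]
neighbors = brute_force_knn([3.0, 4.0], vectors, k=2)
```

Because every vector is compared against the query, the scan is exact but its cost scales with the number of rows, which is the motivation for an approximate index on larger tables.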
By default, LanceDB runs a brute-force scan over the dataset to find the K nearest neighbours (KNN).
For tables with more than 50K vectors, creating an ANN index is recommended to speed up search performance.
LanceDB allows you to create an ANN index on a table as follows:
@@ -277,7 +290,7 @@ LanceDB allows you to create an ANN index on a table as follows:
```
!!! note "Why do I need to create an index manually?"
LanceDB does not automatically create the ANN index, for two reasons. The first is that it's optimized
LanceDB does not automatically create the ANN index for two reasons. The first is that it's optimized
for really fast retrievals via a disk-based index, and the second is that data and query workloads can
be very diverse, so there's no one-size-fits-all index configuration. LanceDB provides many parameters
to fine-tune index size, query latency and accuracy. See the section on
@@ -308,8 +321,9 @@ This can delete any number of rows that match the filter.
```
The deletion predicate is a SQL expression that supports the same expressions
as the `where()` clause on a search. They can be as simple or complex as needed.
To see what expressions are supported, see the [SQL filters](sql.md) section.
as the `where()` clause (`only_if()` in Rust) on a search. They can be as
simple or complex as needed. To see what expressions are supported, see the
[SQL filters](sql.md) section.
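
As a rough analogy for how a SQL predicate selects the rows to delete, the sketch below uses Python's built-in `sqlite3` module rather than LanceDB. The table name `items`, its columns, and the predicate are hypothetical, and SQLite's expression dialect is not guaranteed to match the one LanceDB accepts.

```python
import sqlite3

# In-memory SQLite database standing in for a table (not a LanceDB table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item TEXT, price REAL)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("foo", 10.0), ("bar", 20.0), ("fizz", 100.0), ("buzz", 200.0)],
)

# A compound predicate: every row for which it evaluates to true is removed.
predicate = "price > 15 AND item != 'buzz'"
deleted = conn.execute(f"DELETE FROM items WHERE {predicate}").rowcount

# Rows that did not match the predicate are left untouched.
remaining = [row[0] for row in conn.execute("SELECT item FROM items ORDER BY item")]
```

Here the predicate matches two of the four rows, so a single call removes both and leaves the rest in place.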
=== "Python"
@@ -319,6 +333,10 @@ To see what expressions are supported, see the [SQL filters](sql.md) section.
Read more: [vectordb.Table.delete](javascript/interfaces/Table.md#delete)
=== "Rust"
Read more: [lancedb::Table::delete](https://docs.rs/lancedb/latest/lancedb/table/struct.Table.html#method.delete)
## Drop a table
Use the `drop_table()` method on the database to remove a table.