docs: add links to rust SDK docs, remove references to rust SDK being unstable / experimental (#1131)

2026-05-28 17:30:42 +00:00 · 2024-03-19 07:16:48 -07:00
parent abde77eafb
commit 0fe0976a0e
13 changed files with 321 additions and 104 deletions
--- a/docs/src/ann_indexes.md
+++ b/docs/src/ann_indexes.md
@@ -19,39 +19,61 @@ Lance supports `IVF_PQ` index type by default.

 === "Python"

-     Creating indexes is done via the [create_index](https://lancedb.github.io/lancedb/python/#lancedb.table.LanceTable.create_index) method.
+    Creating indexes is done via the [create_index](https://lancedb.github.io/lancedb/python/#lancedb.table.LanceTable.create_index) method.

-     ```python
-     import lancedb
-     import numpy as np
-     uri = "data/sample-lancedb"
-     db = lancedb.connect(uri)
+    ```python
+    import lancedb
+    import numpy as np
+    uri = "data/sample-lancedb"
+    db = lancedb.connect(uri)

-     # Create 10,000 sample vectors
-     data = [{"vector": row, "item": f"item {i}"}
+    # Create 10,000 sample vectors
+    data = [{"vector": row, "item": f"item {i}"}
        for i, row in enumerate(np.random.random((10_000, 1536)).astype('float32'))]

-     # Add the vectors to a table
-     tbl = db.create_table("my_vectors", data=data)
+    # Add the vectors to a table
+    tbl = db.create_table("my_vectors", data=data)

-     # Create and train the index - you need to have enough data in the table for an effective training step
-     tbl.create_index(num_partitions=256, num_sub_vectors=96)
-     ```
+    # Create and train the index - you need to have enough data in the table for an effective training step
+    tbl.create_index(num_partitions=256, num_sub_vectors=96)
+    ```

 === "Typescript"

-     ```typescript
-     --8<--- "docs/src/ann_indexes.ts:import"
+    ```typescript
+    --8<--- "docs/src/ann_indexes.ts:import"

-     --8<-- "docs/src/ann_indexes.ts:ingest"
-     ```
+    --8<-- "docs/src/ann_indexes.ts:ingest"
+    ```

- **metric** (default: "L2"): The distance metric to use. By default it uses euclidean distance "`L2`".
+=== "Rust"
+
+    ```rust
+    --8<-- "rust/lancedb/examples/ivf_pq.rs:create_index"
+    ```
+
+    IVF_PQ index parameters are more fully defined in the [crate docs](https://docs.rs/lancedb/latest/lancedb/index/vector/struct.IvfPqIndexBuilder.html).
+
+The following IVF_PQ parameters can be specified:
+
+- **distance_type**: The distance metric to use. By default it uses Euclidean distance "`L2`".
  We also support "cosine" and "dot" distance as well.
- **num_partitions** (default: 256): The number of partitions of the index.
- **num_sub_vectors** (default: 96): The number of sub-vectors (M) that will be created during Product Quantization (PQ).
-  For D dimensional vector, it will be divided into `M` of `D/M` sub-vectors, each of which is presented by
-  a single PQ code.
+- **num_partitions**: The number of partitions in the index. The default is the square root
+  of the number of rows.
+
+!!! note
+
+    In the synchronous Python SDK and Node's `vectordb`, the default is 256. This default has
+    changed in the asynchronous Python SDK and Node's `lancedb`.
+
+- **num_sub_vectors**: The number of sub-vectors (M) that will be created during Product Quantization (PQ).
+  A `D`-dimensional vector is divided into `M` subvectors of dimension `D/M`, each of which is replaced by
+  a single PQ code. The default is the dimension of the vector divided by 16.
+
+!!! note
+
+    In the synchronous Python SDK and Node's `vectordb`, the default is currently 96. This default has
+    changed in the asynchronous Python SDK and Node's `lancedb`.

 <figure markdown>
  ![IVF PQ](./assets/ivf_pq.png)
@@ -114,25 +136,33 @@ There are a couple of parameters that can be used to fine-tune the search:

 === "Python"

-     ```python
-     tbl.search(np.random.random((1536))) \
-         .limit(2) \
-         .nprobes(20) \
-         .refine_factor(10) \
-         .to_pandas()
-     ```
+    ```python
+    tbl.search(np.random.random((1536))) \
+        .limit(2) \
+        .nprobes(20) \
+        .refine_factor(10) \
+        .to_pandas()
+    ```

-     ```text
+    ```text
                                              vector       item       _distance
-     0  [0.44949695, 0.8444449, 0.06281311, 0.23338133...  item 1141  103.575333
-     1  [0.48587373, 0.269207, 0.15095535, 0.65531915,...  item 3953  108.393867
-     ```
+    0  [0.44949695, 0.8444449, 0.06281311, 0.23338133...  item 1141  103.575333
+    1  [0.48587373, 0.269207, 0.15095535, 0.65531915,...  item 3953  108.393867
+    ```

 === "Typescript"

-     ```typescript
-     --8<-- "docs/src/ann_indexes.ts:search1"
-     ```
+    ```typescript
+    --8<-- "docs/src/ann_indexes.ts:search1"
+    ```
+
+=== "Rust"
+
+    ```rust
+    --8<-- "rust/lancedb/examples/ivf_pq.rs:search1"
+    ```
+
+    Vector search options are more fully defined in the [crate docs](https://docs.rs/lancedb/latest/lancedb/query/struct.Query.html#method.nearest_to).

 The search will return the data requested in addition to the distance of each item.

@@ -181,7 +211,7 @@ You can select the columns returned by the query using a select clause.
 ### Why do I need to manually create an index?

 Currently, LanceDB does _not_ automatically create the ANN index.
-LanceDB is well-optimized for kNN (exhaustive search) via a disk-based  index. For many use-cases,
+LanceDB is well-optimized for kNN (exhaustive search) via a disk-based index. For many use-cases,
 datasets of the order of ~100K vectors don't require index creation. If you can live with up to
 100ms latency, skipping index creation is a simpler workflow while guaranteeing 100% recall.
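
To make "exhaustive search" concrete, here is a minimal pure-Python sketch of brute-force kNN under L2 distance. It is an illustration only, not LanceDB's implementation: the helper name `knn_exhaustive` and the small random dataset are hypothetical. Because every stored vector is compared against the query, the true nearest neighbors are always returned, which is why skipping the index gives 100% recall at the cost of a full scan.

```python
import math
import random


def knn_exhaustive(query, vectors, k):
    """Return the k (index, distance) pairs closest to `query` by L2 distance.

    Hypothetical helper for illustration; it scans every vector, so the
    result is exact rather than approximate.
    """
    scored = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Small synthetic dataset (assumed sizes, chosen only to keep the example fast).
random.seed(0)
dim = 8
vectors = [[random.random() for _ in range(dim)] for _ in range(1000)]

# Query with a vector that is already in the dataset: an exact search
# must return that vector first, at distance zero.
query = vectors[42]
top = knn_exhaustive(query, vectors, k=2)
```

An ANN index such as IVF_PQ trades some of this exactness for speed by searching only a subset of partitions, which is what `nprobes` and `refine_factor` tune.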