mirror of
https://github.com/lancedb/lancedb.git
synced 2026-05-28 17:30:42 +00:00
docs: add links to rust SDK docs, remove references to rust SDK being unstable / experimental (#1131)
This commit is contained in:
@@ -19,39 +19,61 @@ Lance supports `IVF_PQ` index type by default.
|
||||
|
||||
=== "Python"
|
||||
|
||||
Creating indexes is done via the [create_index](https://lancedb.github.io/lancedb/python/#lancedb.table.LanceTable.create_index) method.
|
||||
Creating indexes is done via the [create_index](https://lancedb.github.io/lancedb/python/#lancedb.table.LanceTable.create_index) method.
|
||||
|
||||
```python
|
||||
import lancedb
|
||||
import numpy as np
|
||||
uri = "data/sample-lancedb"
|
||||
db = lancedb.connect(uri)
|
||||
```python
|
||||
import lancedb
|
||||
import numpy as np
|
||||
uri = "data/sample-lancedb"
|
||||
db = lancedb.connect(uri)
|
||||
|
||||
# Create 10,000 sample vectors
|
||||
data = [{"vector": row, "item": f"item {i}"}
|
||||
# Create 10,000 sample vectors
|
||||
data = [{"vector": row, "item": f"item {i}"}
|
||||
for i, row in enumerate(np.random.random((10_000, 1536)).astype('float32'))]
|
||||
|
||||
# Add the vectors to a table
|
||||
tbl = db.create_table("my_vectors", data=data)
|
||||
# Add the vectors to a table
|
||||
tbl = db.create_table("my_vectors", data=data)
|
||||
|
||||
# Create and train the index - you need to have enough data in the table for an effective training step
|
||||
tbl.create_index(num_partitions=256, num_sub_vectors=96)
|
||||
```
|
||||
# Create and train the index - you need to have enough data in the table for an effective training step
|
||||
tbl.create_index(num_partitions=256, num_sub_vectors=96)
|
||||
```
|
||||
|
||||
=== "Typescript"
|
||||
|
||||
```typescript
|
||||
--8<--- "docs/src/ann_indexes.ts:import"
|
||||
```typescript
|
||||
--8<--- "docs/src/ann_indexes.ts:import"
|
||||
|
||||
--8<-- "docs/src/ann_indexes.ts:ingest"
|
||||
```
|
||||
--8<-- "docs/src/ann_indexes.ts:ingest"
|
||||
```
|
||||
|
||||
- **metric** (default: "L2"): The distance metric to use. By default it uses euclidean distance "`L2`".
|
||||
=== "Rust"
|
||||
|
||||
```rust
|
||||
--8<-- "rust/lancedb/examples/ivf_pq.rs:create_index"
|
||||
```
|
||||
|
||||
IVF_PQ index parameters are more fully defined in the [crate docs](https://docs.rs/lancedb/latest/lancedb/index/vector/struct.IvfPqIndexBuilder.html).
|
||||
|
||||
The following IVF_PQ paramters can be specified:
|
||||
|
||||
- **distance_type**: The distance metric to use. By default it uses euclidean distance "`L2`".
|
||||
We also support "cosine" and "dot" distance as well.
|
||||
- **num_partitions** (default: 256): The number of partitions of the index.
|
||||
- **num_sub_vectors** (default: 96): The number of sub-vectors (M) that will be created during Product Quantization (PQ).
|
||||
For D dimensional vector, it will be divided into `M` of `D/M` sub-vectors, each of which is presented by
|
||||
a single PQ code.
|
||||
- **num_partitions**: The number of partitions in the index. The default is the square root
|
||||
of the number of rows.
|
||||
|
||||
!!! note
|
||||
|
||||
In the synchronous python SDK and node's `vectordb` the default is 256. This default has
|
||||
changed in the asynchronous python SDK and node's `lancedb`.
|
||||
|
||||
- **num_sub_vectors**: The number of sub-vectors (M) that will be created during Product Quantization (PQ).
|
||||
For D dimensional vector, it will be divided into `M` subvectors with dimension `D/M`, each of which is replaced by
|
||||
a single PQ code. The default is the dimension of the vector divided by 16.
|
||||
|
||||
!!! note
|
||||
|
||||
In the synchronous python SDK and node's `vectordb` the default is currently 96. This default has
|
||||
changed in the asynchronous python SDK and node's `lancedb`.
|
||||
|
||||
<figure markdown>
|
||||

|
||||
@@ -114,25 +136,33 @@ There are a couple of parameters that can be used to fine-tune the search:
|
||||
|
||||
=== "Python"
|
||||
|
||||
```python
|
||||
tbl.search(np.random.random((1536))) \
|
||||
.limit(2) \
|
||||
.nprobes(20) \
|
||||
.refine_factor(10) \
|
||||
.to_pandas()
|
||||
```
|
||||
```python
|
||||
tbl.search(np.random.random((1536))) \
|
||||
.limit(2) \
|
||||
.nprobes(20) \
|
||||
.refine_factor(10) \
|
||||
.to_pandas()
|
||||
```
|
||||
|
||||
```text
|
||||
```text
|
||||
vector item _distance
|
||||
0 [0.44949695, 0.8444449, 0.06281311, 0.23338133... item 1141 103.575333
|
||||
1 [0.48587373, 0.269207, 0.15095535, 0.65531915,... item 3953 108.393867
|
||||
```
|
||||
0 [0.44949695, 0.8444449, 0.06281311, 0.23338133... item 1141 103.575333
|
||||
1 [0.48587373, 0.269207, 0.15095535, 0.65531915,... item 3953 108.393867
|
||||
```
|
||||
|
||||
=== "Typescript"
|
||||
|
||||
```typescript
|
||||
--8<-- "docs/src/ann_indexes.ts:search1"
|
||||
```
|
||||
```typescript
|
||||
--8<-- "docs/src/ann_indexes.ts:search1"
|
||||
```
|
||||
|
||||
=== "Rust"
|
||||
|
||||
```rust
|
||||
--8<-- "rust/lancedb/examples/ivf_pq.rs:search1"
|
||||
```
|
||||
|
||||
Vector search options are more fully defined in the [crate docs](https://docs.rs/lancedb/latest/lancedb/query/struct.Query.html#method.nearest_to).
|
||||
|
||||
The search will return the data requested in addition to the distance of each item.
|
||||
|
||||
@@ -181,7 +211,7 @@ You can select the columns returned by the query using a select clause.
|
||||
### Why do I need to manually create an index?
|
||||
|
||||
Currently, LanceDB does _not_ automatically create the ANN index.
|
||||
LanceDB is well-optimized for kNN (exhaustive search) via a disk-based index. For many use-cases,
|
||||
LanceDB is well-optimized for kNN (exhaustive search) via a disk-based index. For many use-cases,
|
||||
datasets of the order of ~100K vectors don't require index creation. If you can live with up to
|
||||
100ms latency, skipping index creation is a simpler workflow while guaranteeing 100% recall.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user