review comments

This commit is contained in:
gsilvestrin
2023-04-19 20:23:18 -07:00
parent b19ce10184
commit 3eac75e61a


@@ -1,8 +1,8 @@
# ANN (Approximate Nearest Neighbor) Indexes
You can create an index over your vector data to make search faster. Vector indexes are faster but less
accurate than exhaustive search. LanceDB provides many parameters to fine-tune the index's size, the speed of
queries, and the accuracy of results.
You can create an index over your vector data to make search faster. Vector indexes are faster but less accurate than exhaustive search. LanceDB provides many parameters to fine-tune the index's size, the speed of queries, and the accuracy of results.
Currently, LanceDB does not automatically create the ANN index. In the future we will look to improve this experience and automate index creation and configuration.
## Creating an ANN Index
@@ -28,9 +28,10 @@ tbl.create_index(num_partitions=256, num_sub_vectors=96)
Since `create_index` has a training step, it can take a few minutes to finish for large tables. You can control the index
creation by providing the following parameters:
- **num_partitions**: The number of partitions of the index. A higher number leads to faster queries, but it makes index
generation slower.
- **num_sub_vectors**: The number of subvectors (M) that will be created during Product Quantization (PQ). A larger number makes
- **num_partitions** (default: 256): The number of partitions in the index. Choose a value so that each partition holds roughly 3-5K vectors. For example, a table
with ~1M vectors should use 256 partitions. Any number of partitions is allowed, but powers of 2 are conventional.
A higher number leads to faster queries but slower index generation.
- **num_sub_vectors** (default: 96): The number of subvectors (M) that will be created during Product Quantization (PQ). A larger number makes
search more accurate, but also makes the index larger and slower to build.
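The sizing guidance for `num_partitions` can be sketched as a small calculation. The helper below is hypothetical (`suggest_num_partitions` and its `target_per_partition` default of 4,000 are assumptions made for illustration, not part of LanceDB); it only rounds to the nearest power of 2 so that each partition lands near the 3-5K range.

```python
import math


def suggest_num_partitions(num_vectors: int, target_per_partition: int = 4000) -> int:
    """Hypothetical helper: pick a power-of-2 partition count.

    Aims for roughly `target_per_partition` vectors in each partition.
    The 4,000 default is an assumed midpoint of the 3-5K guidance.
    """
    if num_vectors <= target_per_partition:
        # Too few vectors to justify more than one partition.
        return 1
    raw = num_vectors / target_per_partition
    # Round the exponent so the result is the nearest power of 2 (in log space).
    return 2 ** round(math.log2(raw))


# A table with ~1M vectors maps to 256 partitions (about 3.9K vectors each).
n = suggest_num_partitions(1_000_000)
```

The result can then be passed as `num_partitions` to `create_index`. Because of the rounding, some table sizes will fall slightly outside the 3-5K range, so treat the output as a starting point.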
## Querying an ANN Index
@@ -39,9 +40,9 @@ Querying vector indexes is done via the [search](https://lancedb.github.io/lance
There are a couple of parameters that can be used to fine-tune the search:
- **limit**: The amount of results that will be returned
- **nprobes**: The number of probes used. A higher number makes search more accurate but also slower.
- **refine_factor**: Refine the results by reading extra elements and re-ranking them in memory. A higher number makes
- **limit** (default: 10): The number of results to return.
- **nprobes** (default: 20): The number of index partitions probed per query. A higher number makes search more accurate but also slower.
- **refine_factor** (default: None): Refine the results by reading extra elements and re-ranking them in memory. A higher number makes
search more accurate but also slower.
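To make the effect of `refine_factor` concrete, here is a conceptual sketch of "read extra elements and re-rank them in memory". It is not LanceDB's implementation: the `refine` function and the candidate format (a list of `(approximate_distance, vector)` pairs already sorted by approximate distance) are assumptions made for illustration.

```python
import math


def refine(query, candidates, limit, refine_factor):
    """Conceptual sketch of refinement (hypothetical, not LanceDB code).

    `candidates` is a list of (approx_distance, vector) pairs, sorted by
    the approximate distance the index reported.
    """
    # Read more candidates than requested: limit * refine_factor.
    pool = candidates[: limit * refine_factor]
    # Re-rank the pool by exact Euclidean distance to the query.
    reranked = sorted(pool, key=lambda c: math.dist(query, c[1]))
    return [vec for _, vec in reranked[:limit]]


query = (0.0, 0.0)
# The approximate distances rank (3, 0) first, although (1, 0) is actually closest.
candidates = [(0.1, (3.0, 0.0)), (0.2, (1.0, 0.0)), (0.3, (2.0, 0.0))]

without_refine = refine(query, candidates, limit=1, refine_factor=1)
with_refine = refine(query, candidates, limit=1, refine_factor=3)
```

With `refine_factor=1` only the top approximate hit is considered, so the quantization error goes uncorrected; with `refine_factor=3` the exact re-ranking recovers the true nearest vector at the cost of reading more data.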
```python