Merge branch 'main' into changhiskhan/improve-index-docs

Chang She
2023-04-25 21:04:01 -07:00
committed by GitHub
6 changed files with 51 additions and 12 deletions


@@ -34,6 +34,7 @@ tbl.create_index(num_partitions=256, num_sub_vectors=96)
Since `create_index` has a training step, it can take a few minutes to finish for large tables. You can control the index
creation by providing the following parameters:
- **metric** (default: "L2"): The distance metric to use. By default we use Euclidean distance. We also support cosine distance.
- **num_partitions** (default: 256): The number of partitions of the index. The number of partitions should be configured so each partition has 3-5K vectors. For example, a table
with ~1M vectors should use 256 partitions. You can specify an arbitrary number of partitions, but powers of 2 are the most conventional.
A higher number leads to faster queries, but it makes index generation slower.
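
The sizing rule above can be sketched as a small calculation. This is only an illustration: `suggest_num_partitions` and its `target_per_partition` default are hypothetical names, not part of the library, and the target of 4,000 is an assumed midpoint of the 3-5K range.

```python
import math


def suggest_num_partitions(num_vectors, target_per_partition=4000):
    """Hypothetical helper: pick the power of 2 closest (in log scale)
    to num_vectors / target_per_partition."""
    # Never go below a single partition, even for tiny tables.
    raw = max(1.0, num_vectors / target_per_partition)
    # Round the exponent so the result is a power of 2.
    return 2 ** round(math.log2(raw))


num_vectors = 1_000_000
num_partitions = suggest_num_partitions(num_vectors)
print(num_partitions, num_vectors // num_partitions)
```

For a table with 1M vectors this suggests 256 partitions, or roughly 3.9K vectors per partition, which matches the example in the text. Treat the result as a starting point rather than a rule.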
@@ -56,6 +57,7 @@ There are a couple of parameters that can be used to fine-tune the search:
e.g., for 1M vectors divided into 256 partitions, if you're looking for top 20, then refine_factor=200 reranks the whole partition.<br/>
Note: refine_factor is only applicable if an ANN index is present. If specified on a table without an ANN index, it is ignored.
```python
tbl.search(np.random.random((768))) \
.limit(2) \