From c76ec48603d2a88fc90cd36348568c0f3b339674 Mon Sep 17 00:00:00 2001
From: vincent d warmerdam
Date: Fri, 15 Mar 2024 22:16:05 +0100
Subject: [PATCH] Explain vonoroi seed initalisation (#1114)

This PR fixes https://github.com/lancedb/lancedb/issues/1112. It turned out that K-means is currently used internally, so I figured adding that context to the docs would be nice.
---
 docs/src/concepts/index_ivfpq.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/concepts/index_ivfpq.md b/docs/src/concepts/index_ivfpq.md
index 044373f4..cf522557 100644
--- a/docs/src/concepts/index_ivfpq.md
+++ b/docs/src/concepts/index_ivfpq.md
@@ -31,7 +31,7 @@ As an example, consider starting with 128-dimensional vector consisting of 32-bi
 While PQ helps with reducing the size of the index, IVF primarily addresses search performance. The primary purpose of an inverted file index is to facilitate rapid and effective nearest neighbor search by narrowing down the search space.
-In IVF, the PQ vector space is divided into *Voronoi cells*, which are essentially partitions that consist of all the points in the space that are within a threshold distance of the given region's seed point. These seed points are used to create an inverted index that correlates each centroid with a list of vectors in the space, allowing a search to be restricted to just a subset of vectors in the index.
+In IVF, the PQ vector space is divided into *Voronoi cells*, which are essentially partitions that consist of all the points in the space that are within a threshold distance of the given region's seed point. These seed points are initialized by running K-means over the stored vectors. The centroids of K-means turn into the seed points which then each define a region. These regions are then used to create an inverted index that correlates each centroid with a list of vectors in the space, allowing a search to be restricted to just a subset of vectors in the index.
 ![](../assets/ivfpq_ivf_desc.webp)
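
The mechanism the added sentences describe (K-means centroids become seed points, each seed point defines a region, and the regions back an inverted index) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not LanceDB's implementation: it works on raw vectors rather than PQ codes, uses a basic Lloyd's-style K-means with squared Euclidean distance, and every name in it (`kmeans`, `build_inverted_index`, `search`, and the `nprobes` argument) is hypothetical and chosen for this example only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vector, centroids):
    """Index of the centroid (seed point) closest to `vector`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(vector, centroids[i]))


def kmeans(vectors, k, iterations=20, seed=0):
    """Basic Lloyd's K-means; the returned centroids act as seed points."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest_centroid(v, centroids)].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cell keeps its previous centroid
                dim = len(members[0])
                centroids[i] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids


def build_inverted_index(vectors, centroids):
    """Map each cell (centroid index) to the row ids of the vectors it holds."""
    index = {i: [] for i in range(len(centroids))}
    for row_id, v in enumerate(vectors):
        index[nearest_centroid(v, centroids)].append(row_id)
    return index


def search(query, vectors, centroids, index, nprobes=1):
    """Scan only the `nprobes` cells nearest the query (hypothetical parameter)."""
    cell_order = sorted(range(len(centroids)),
                        key=lambda i: squared_distance(query, centroids[i]))
    candidates = [rid for cell in cell_order[:nprobes] for rid in index[cell]]
    # Returns None if every probed cell happens to be empty.
    return min(candidates,
               key=lambda rid: squared_distance(query, vectors[rid]),
               default=None)


# Two well-separated groups of 2-D points, partitioned into two cells.
vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids = kmeans(vectors, k=2)
index = build_inverted_index(vectors, centroids)
best_row = search([9.5, 10.2], vectors, centroids, index, nprobes=1)
print(index, best_row)
```

With `nprobes=1` the query is compared against the vectors of a single cell instead of the whole table, which is the search-space reduction the paragraph refers to. Probing more cells trades speed for a lower chance of missing a neighbor that sits just across a cell boundary.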