mirror of
https://github.com/lancedb/lancedb.git
synced 2026-01-04 19:02:58 +00:00
Explain vonoroi seed initalisation (#1114)
This PR fixes https://github.com/lancedb/lancedb/issues/1112. It turned out that K-means is currently used internally, so I figured adding that context to the docs would be nice.
This commit is contained in:
committed by
Weston Pace
parent
8bb983bc3d
commit
b9afc01cfd
@@ -31,7 +31,7 @@ As an example, consider starting with 128-dimensional vector consisting of 32-bi
|
||||
|
||||
While PQ helps with reducing the size of the index, IVF primarily addresses search performance. The primary purpose of an inverted file index is to facilitate rapid and effective nearest neighbor search by narrowing down the search space.
|
||||
|
||||
In IVF, the PQ vector space is divided into *Voronoi cells*, which are essentially partitions that consist of all the points in the space that are within a threshold distance of the given region's seed point. These seed points are used to create an inverted index that correlates each centroid with a list of vectors in the space, allowing a search to be restricted to just a subset of vectors in the index.
|
||||
In IVF, the PQ vector space is divided into *Voronoi cells*, which are essentially partitions that consist of all the points in the space that are within a threshold distance of the given region's seed point. These seed points are initialized by running K-means over the stored vectors. The centroids of K-means turn into the seed points which then each define a region. These regions are then are used to create an inverted index that correlates each centroid with a list of vectors in the space, allowing a search to be restricted to just a subset of vectors in the index.
|
||||
|
||||

|
||||
|
||||
|
||||
Reference in New Issue
Block a user