From 7eb3b52297dea2ca63b2f454b3fbd19ddb1022e5 Mon Sep 17 00:00:00 2001
From: Jon X
Date: Fri, 6 Sep 2024 12:08:19 +0800
Subject: [PATCH] docs: added a blank line between a paragraph and a list block (#1604)

Although the markdown renders well on GitHub (GFM style?), it seems that a
blank line is required between a paragraph and a list block for it to render
correctly with `mkdocs`. See also the web page:
https://lancedb.github.io/lancedb/concepts/index_hnsw/
---
 docs/src/concepts/index_hnsw.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/src/concepts/index_hnsw.md b/docs/src/concepts/index_hnsw.md
index 9e8dc948..8bfaf39c 100644
--- a/docs/src/concepts/index_hnsw.md
+++ b/docs/src/concepts/index_hnsw.md
@@ -15,11 +15,13 @@ HNSW also combines this with the ideas behind a classic 1-dimensional search dat
 ## k-Nearest Neighbor Graphs and k-Approximate Nearest Neighbor Graphs
 
 The k-nearest neighbor graph actually predates its use for ANN search. Its construction is quite simple:
+
 * Each vector in the dataset is given an associated vertex.
 * Each vertex has outgoing edges to its k nearest neighbors, that is, the k closest other vertices by Euclidean distance between the corresponding vectors. This can be thought of as a "friend list" for the vertex.
 * For some applications (including nearest-neighbor search), the incoming edges are also added.
 
 Eventually, it was realized that the following greedy search method over such a graph typically results in good approximate nearest neighbors:
+
 * Given a query vector, start at some fixed "entry point" vertex (e.g. the approximate center node).
 * Look at that vertex's neighbors. If any of them is closer to the query vector than the current vertex, then move to that neighbor.
 * Repeat until a local optimum is found.
@@ -36,15 +38,18 @@ One downside of k-NN and k-ANN graphs alone is that one must typically build the
 ## HNSW: Hierarchical Navigable Small Worlds
 
 HNSW builds on k-ANN in two main ways:
+
 * Instead of getting the k-approximate nearest neighbors for a large value of k, it sparsifies the k-ANN graph using a carefully chosen "edge pruning" heuristic, allowing the number of edges per vertex to be limited to a relatively small constant.
 * The "entry point" vertex is chosen dynamically using a recursively constructed data structure on a subset of the data, similar to a skip list.
 
 This recursive structure can be thought of as separating into layers:
+
 * At the bottom-most layer, a k-ANN graph on the whole dataset is present.
 * At the second layer, a k-ANN graph on a fraction of the dataset (e.g. 10%) is present.
 * At the Lth layer, a k-ANN graph is present over a (constant) fraction (e.g. 10%) of the vectors/vertices present in the (L-1)th layer.
 
 Then the greedy search routine operates as follows:
+
 * At the top layer (using an arbitrary vertex as an entry point), use the greedy local search routine on the k-ANN graph to get an approximate nearest neighbor at that layer.
 * Using the approximate nearest neighbor found in the previous layer as an entry point, find an approximate nearest neighbor in the next layer with the same method.
 * Repeat until the bottom-most layer is reached. Then use the entry point to find multiple nearest neighbors (e.g. top 10).
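
For reference, the k-NN graph construction described in the patched section can be sketched in a few lines of Python. This is a minimal illustration, not LanceDB's implementation; the `build_knn_graph` name and the brute-force O(n²) distance pass are assumptions made only to mirror the definition in the text:

```python
import numpy as np

def build_knn_graph(vectors: np.ndarray, k: int) -> list[list[int]]:
    """For each vector, return the indices of its k nearest neighbors
    by Euclidean distance -- its outgoing "friend list"."""
    # Pairwise squared Euclidean distances (O(n^2 d); fine for a sketch,
    # a real index would use an approximate construction instead).
    diffs = vectors[:, None, :] - vectors[None, :, :]
    dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(dists, np.inf)  # a vertex is not its own neighbor
    # The k smallest distances per row give the k outgoing edges per vertex.
    return [list(np.argsort(row)[:k]) for row in dists]
```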
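Similarly, the greedy local search routine from that section (move to any neighbor closer to the query, stop at a local optimum) might look like the following hypothetical sketch:

```python
import numpy as np

def greedy_search(vectors: np.ndarray, graph, query: np.ndarray,
                  entry_point: int = 0) -> int:
    """Greedy walk over the graph: repeatedly move to a neighbor that is
    closer to the query; the vertex where no neighbor improves is a local
    optimum, i.e. an approximate nearest neighbor."""
    current = entry_point
    current_dist = float(np.linalg.norm(vectors[current] - query))
    improved = True
    while improved:
        improved = False
        for neighbor in graph[current]:
            d = float(np.linalg.norm(vectors[neighbor] - query))
            if d < current_dist:
                current, current_dist = neighbor, d
                improved = True
    return current
```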
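Finally, the layered HNSW search composes that routine top-down, with each layer's result seeding the next layer's entry point. The `layers` representation here (one adjacency dict per layer, all keyed into a shared vector array) is an assumption made for the sketch:

```python
def hnsw_search(layers, vectors, query, top_entry: int) -> int:
    """layers[0] is the top (sparsest) layer; layers[-1] is the k-ANN
    graph over the whole dataset. Each layer is a dict mapping a vertex
    id to its neighbor ids; ids index into the shared `vectors` array."""
    entry = top_entry
    for graph in layers:
        # The approximate nearest neighbor found at this layer becomes
        # the entry point for the denser layer below it.
        entry = greedy_search(vectors, graph, query, entry_point=entry)
    return entry
```

In a real implementation, the bottom layer would track a candidate set (e.g. a bounded priority queue) rather than a single vertex, so that the multiple nearest neighbors mentioned in the last bullet (e.g. top 10) can be returned.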