# Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors

This ANN (approximate nearest neighbor) extension for Postgres is based on the [HNSW](https://www.pinecone.io/learn/hnsw) implementation in [ivf-hnsw](https://github.com/dbaranchuk/ivf-hnsw.git), the code for the state-of-the-art billion-scale nearest neighbor search system presented in the paper:

[Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors](http://openaccess.thecvf.com/content_ECCV_2018/html/Dmitry_Baranchuk_Revisiting_the_Inverted_ECCV_2018_paper.html),
<br>
Dmitry Baranchuk, Artem Babenko, Yury Malkov

# Postgres extension

The HNSW index is held in memory (it is built on demand), and its maximal size is limited by the `maxelements` index parameter. The other required parameter is the number of dimensions, `dims`, unless the column type already specifies it. The optional parameter `ef` specifies the number of neighbors considered during index construction and search (corresponding to the `efConstruction` and `efSearch` parameters described in the article).

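The roles of `maxelements`, `ef` and the `m` option used in the example can be illustrated with a minimal Python sketch of the greedy search at the core of HNSW: a best-first walk over a proximity graph that keeps the `ef` closest nodes seen so far, used both when linking a new vector (the `efConstruction` role) and when answering a query (the `efSearch` role). It is a simplification, not the extension's code: `ToyGraphIndex` is hypothetical, it keeps a single graph layer (HNSW's bottom layer) with a fixed entry point and simple closest-first link pruning, and it assumes `m` bounds the links per node, as `M` does in HNSW.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ToyGraphIndex:
    """Hypothetical single-layer proximity graph (HNSW's bottom layer only)."""

    def __init__(self, maxelements, dims, m=16, ef=64):
        self.maxelements = maxelements  # fixed capacity, like the in-memory index
        self.dims = dims                # required vector length
        self.m = m                      # assumed: max links kept per node
        self.ef = ef                    # candidate-list size for build and search
        self.vectors = []
        self.links = []                 # adjacency list per node id

    def _search(self, query, ef):
        """Best-first graph walk that keeps the ef closest nodes seen so far."""
        entry = 0  # simplification: always start from the first node
        d0 = l2(query, self.vectors[entry])
        visited = {entry}
        candidates = [(d0, entry)]  # min-heap: closest unexpanded node on top
        best = [(-d0, entry)]       # max-heap via negation: worst kept node on top
        while candidates:
            d, node = heapq.heappop(candidates)
            if d > -best[0][0]:
                break  # nearest candidate is farther than every kept node: stop
            for nb in self.links[node]:
                if nb in visited:
                    continue
                visited.add(nb)
                dn = l2(query, self.vectors[nb])
                if len(best) < ef or dn < -best[0][0]:
                    heapq.heappush(candidates, (dn, nb))
                    heapq.heappush(best, (-dn, nb))
                    if len(best) > ef:
                        heapq.heappop(best)  # evict the current worst
        return sorted((-neg_d, n) for neg_d, n in best)

    def add(self, vec):
        if len(self.vectors) >= self.maxelements:
            raise RuntimeError("index is full: maxelements reached")
        if len(vec) != self.dims:
            raise ValueError("vector length does not match dims")
        new_id = len(self.vectors)
        # Construction reuses the ef-bounded search (the efConstruction role).
        nearest = self._search(vec, self.ef) if self.vectors else []
        my_links = [n for _, n in nearest[: self.m]]
        self.vectors.append(list(vec))
        self.links.append(my_links)
        for n in my_links:  # add reverse links, pruning to the m closest
            self.links[n].append(new_id)
            if len(self.links[n]) > self.m:
                self.links[n].sort(key=lambda j: l2(self.vectors[n], self.vectors[j]))
                del self.links[n][self.m:]
        return new_id

    def search(self, query, k):
        """Ids of (approximately) the k nearest stored vectors (the efSearch role)."""
        if not self.vectors:
            return []
        return [n for _, n in self._search(query, max(self.ef, k))[:k]]
```

Raising `ef` makes each walk keep and expand more candidates, which typically improves accuracy at the cost of more distance computations. HNSW adds upper layers on top of this procedure so the walk starts from a good entry point instead of a fixed one.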
# Example of usage

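```sql
-- Load the extension.
create extension hnsw;

-- Each row stores its embedding as a real[] array.
create table embeddings(id integer primary key, payload real[]);

-- In-memory HNSW index for up to 1,000,000 vectors of 100 dimensions.
create index on embeddings using hnsw(payload) with (maxelements=1000000, dims=100, m=32);

-- 100 approximate nearest neighbors of the query vector (abbreviated here with ...).
select id from embeddings order by payload <-> array[1.0, 2.0,...] limit 100;
```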
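
Because the index is approximate, the `order by payload <-> ... limit 100` query can miss some of the true nearest neighbors. On sample data, one way to check this is to compute the exact answer by brute force and measure recall, the fraction of true neighbors the index returned. The Python sketch below assumes `<->` is Euclidean (L2) distance; squared L2 would give the same ordering. The helpers `exact_knn` and `recall` are hypothetical, not part of the extension.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance; squared L2 would produce the same ranking."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(rows, query, k):
    """Exact k-NN over (id, vector) rows, i.e. what
    `select id ... order by payload <-> query limit k` means without an index."""
    best = heapq.nsmallest(k, rows, key=lambda row: l2(row[1], query))
    return [row_id for row_id, _ in best]


def recall(approx_ids, exact_ids):
    """Fraction of the true k nearest neighbors found by the approximate search."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)
```

For example, with rows `[(1, [0.0, 0.0]), (2, [1.0, 2.0]), (3, [5.0, 5.0]), (4, [1.0, 1.5])]` and query `[1.0, 2.0]`, `exact_knn(rows, query, 2)` gives `[2, 4]`; if the index returned ids 2 and 9, recall would be 0.5.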