Mirror of https://github.com/neondatabase/neon.git (synced 2026-01-16 18:02:56 +00:00)
## Problem

We want to regularly verify the performance of pgvector HNSW parallel index builds and of parallel similarity search using HNSW indexes. The first release that considerably improved index-build parallelism was pgvector 0.7.0, and we want to make sure that our Neon compute VM settings (swap, memory overcommit, Postgres configuration, etc.) do not cause regressions.

## Summary of changes

- Prepare a Neon project with 1 million OpenAI vector embeddings (vector size 1536).
- Run HNSW indexing operations in the regression test for the various distance metrics.
- Run similarity queries using pgbench with 100 concurrent clients.

I have also added the relevant metrics to the pgbench and olape Grafana dashboards.

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
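As a rough illustration of the indexing step, here is a sketch that builds one HNSW index per pgvector distance metric (L2, inner product, cosine) with parallel maintenance workers enabled. It assumes a recent pgvector with parallel HNSW builds, a table `hnsw_test_table` with `title` and `"embeddings"` columns (names taken from the benchmark query; the `vector(1536)` type is inferred from the embedding size), and illustrative memory and worker settings. The index names are hypothetical, and this is not the regression test's actual code.

```sql
-- Sketch only: index names and settings are assumptions, not the regression test's values.
CREATE EXTENSION IF NOT EXISTS vector;

-- Assumed schema; in the benchmark the table already holds 1 million rows.
CREATE TABLE IF NOT EXISTS hnsw_test_table (
    title        text,
    "embeddings" vector(1536)
);

-- Let the index builds use parallel maintenance workers.
-- Illustrative values; size them to the compute being tested.
SET maintenance_work_mem = '8GB';
SET max_parallel_maintenance_workers = 7;  -- plus the leader process

-- One index per distance metric; each operator class serves one operator:
-- vector_l2_ops (<->), vector_ip_ops (<#>), vector_cosine_ops (<=>).
CREATE INDEX IF NOT EXISTS hnsw_test_l2_idx
    ON hnsw_test_table USING hnsw ("embeddings" vector_l2_ops);
CREATE INDEX IF NOT EXISTS hnsw_test_ip_idx
    ON hnsw_test_table USING hnsw ("embeddings" vector_ip_ops);
CREATE INDEX IF NOT EXISTS hnsw_test_cosine_idx
    ON hnsw_test_table USING hnsw ("embeddings" vector_cosine_ops);
```

In psql, running `\timing on` before the `CREATE INDEX` statements reports each build's duration, which is the kind of number a regression check would track. The number of workers actually launched is also capped by `max_parallel_workers` and `max_worker_processes`, so those may need raising as well. Only the `vector_cosine_ops` index can serve the `<=>` similarity query in this file.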
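-- Run with a pooled connection (note the -pooler endpoint in the URL).
-- -T 300: run for 300 seconds; -c 100: 100 concurrent clients; -j20: 20 pgbench worker threads.
-- pgbench -T 300 -c 100 -j20 -f pgbench_hnsw_queries.sql "postgresql://neondb_owner:<secret>@ep-floral-thunder-w1gzhaxi-pooler.eu-west-1.aws.neon.build/neondb?sslmode=require"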
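-- Pick one random stored embedding as the query vector;
-- TABLESAMPLE SYSTEM (1) reads roughly 1% of the table's pages.
WITH x (x) AS (
    SELECT "embeddings" AS x
    FROM hnsw_test_table
    TABLESAMPLE SYSTEM (1)
    LIMIT 1
)
-- Return the 30 nearest neighbours by cosine distance (<=>);
-- ORDER BY 2 sorts by the distance column.
SELECT title, "embeddings" <=> (SELECT x FROM x) AS distance
FROM hnsw_test_table
ORDER BY 2
LIMIT 30;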