neon/test_runner/performance/pgvector
Peter Bendel fabeff822f Performance test for pgvector HNSW index build and queries (#7873)
## Problem

We want to regularly verify the performance of pgvector HNSW parallel
index builds and parallel similarity search using HNSW indexes.
pgvector 0.7.0 was the first release to considerably improve index-build
parallelism, and we want to make sure that our Neon compute VM settings
(swap, memory overcommit, Postgres configuration, etc.) do not cause a regression.

## Summary of changes

Prepare a Neon project with 1 million OpenAI vector embeddings (1536
dimensions each).
Run HNSW indexing operations in the regression test for the various
distance metrics.
Run similarity queries using pgbench with 100 concurrent clients.
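
The steps above can be sketched roughly as follows. This is a minimal illustration and not the test's actual code: the table name `documents`, the column name `embedding`, the worker and thread counts, and the choice of probe vector are all hypothetical placeholders. The operator classes and distance operators are the ones pgvector documents for the `vector` type (`vector_l1_ops` and `<+>` arrived in 0.7.0), and the pgbench flags are standard (`-c` clients, `-j` threads, `-T` duration in seconds, `-f` script file, `-n` skip vacuum).

```python
# Operator class and query operator per distance metric, as documented by pgvector.
METRICS = {
    "l2": ("vector_l2_ops", "<->"),
    "inner_product": ("vector_ip_ops", "<#>"),
    "cosine": ("vector_cosine_ops", "<=>"),
    "l1": ("vector_l1_ops", "<+>"),
}


def hnsw_build_sql(metric, table="documents", column="embedding", workers=7):
    """Return SQL statements for one parallel HNSW index build.

    `table`, `column`, and `workers` are hypothetical placeholders.
    """
    opclass, _ = METRICS[metric]
    index = f"{table}_{column}_hnsw_{metric}"
    return [
        # Upper bound on parallel workers Postgres may use for the build.
        f"SET max_parallel_maintenance_workers = {workers};",
        f"DROP INDEX IF EXISTS {index};",
        f"CREATE INDEX {index} ON {table} USING hnsw ({column} {opclass});",
    ]


def similarity_query_sql(metric, table="documents", column="embedding", limit=10):
    """Return a pgbench script body: one top-k nearest-neighbor query.

    The probe vector is read from an arbitrary existing row (an assumption
    made to keep the script self-contained).
    """
    _, operator = METRICS[metric]
    return (
        f"SELECT _id FROM {table} "
        f"ORDER BY {column} {operator} (SELECT {column} FROM {table} LIMIT 1) "
        f"LIMIT {limit};\n"
    )


def pgbench_command(script_path, clients=100, threads=10, seconds=60):
    """Build a pgbench argument list; connection settings come from PG* env vars."""
    return [
        "pgbench", "-n",
        "-f", script_path,
        "-c", str(clients),
        "-j", str(threads),
        "-T", str(seconds),
    ]


if __name__ == "__main__":
    for metric in METRICS:
        print("\n".join(hnsw_build_sql(metric)))
    print(similarity_query_sql("cosine"))
    print(" ".join(pgbench_command("cosine_query.sql")))
```

Building the statements as strings keeps the sketch runnable without a database; in a real test they would be executed through a Postgres client and timed.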

I have also added the relevant metrics to the Grafana dashboards pgbench
and olape.

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2024-05-28 11:05:33 +00:00

| Field | Value |
|---|---|
| dataset_info.features | `_id` (string), `title` (string), `text` (string), `text-embedding-3-large-1536-embedding` (sequence of float64) |
| dataset_info.splits | `train`: num_bytes 12679725776, num_examples 1000000 |
| dataset_info.download_size | 9551862565 |
| dataset_info.dataset_size | 12679725776 |
| configs | config_name `default`; data_files: split `train`, path `data/train-*` |
| license | mit |
| task_categories | feature-extraction |
| language | en |
| size_categories | `1M<n<10M` |

1M OpenAI Embeddings: text-embedding-3-large 1536 dimensions
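
As a rough sizing check, the metadata above can be related to what the embeddings occupy once loaded into a pgvector `vector` column. The sketch below takes the row count and `dataset_size` from the metadata and assumes the storage formula given in the pgvector documentation (4 bytes per dimension plus 8 bytes per value); it ignores Postgres row and page overhead, so treat the result as a lower bound and not a measured figure.

```python
NUM_ROWS = 1_000_000             # num_examples from the dataset metadata
DIMENSIONS = 1536                # embedding length
DATASET_BYTES = 12_679_725_776   # dataset_size from the dataset metadata

# The source dataset stores each embedding component as float64 (8 bytes).
float64_embedding_bytes = NUM_ROWS * DIMENSIONS * 8

# Whatever remains is the _id, title, and text columns plus format overhead.
other_bytes = DATASET_BYTES - float64_embedding_bytes

# pgvector's `vector` type uses single-precision floats:
# 4 * dimensions + 8 bytes per value, per the pgvector documentation.
pgvector_bytes_per_row = 4 * DIMENSIONS + 8
pgvector_embedding_bytes = NUM_ROWS * pgvector_bytes_per_row

if __name__ == "__main__":
    print(f"float64 embeddings in dataset: {float64_embedding_bytes / 1e9:.2f} GB")
    print(f"other columns and overhead:    {other_bytes / 1e9:.2f} GB")
    print(f"pgvector vector column:        {pgvector_embedding_bytes / 1e9:.2f} GB")
```

Under these assumptions the raw float64 embeddings account for about 12.29 GB of the 12.68 GB dataset, while the same vectors need about 6.15 GB in a `vector(1536)` column before any HNSW index is built.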