mirror of https://github.com/neondatabase/neon.git synced 2026-01-03 11:32:56 +00:00

Peter Bendel 46210035c5 add halfvec indexing and queries to periodic pgvector performance tests (#8057)

## Problem

The halfvec data type was introduced in pgvector 0.7.0 and is popular
because it allows smaller vectors, smaller indexes, and potentially
better performance.

So far we have not tested halfvec in our periodic performance tests.
This PR adds halfvec indexing and halfvec queries to the test.
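
To make the "smaller vectors" claim concrete, the sketch below estimates raw storage for the 1536-dimension embeddings used in these tests. It assumes the per-value sizes given in the pgvector documentation (4 bytes per dimension for `vector`, 2 bytes per dimension for `halfvec`, plus an 8-byte header each) and ignores row, TOAST, and index overhead, so treat the totals as rough lower bounds rather than measured figures. The function names are illustrative only.

```python
# Rough storage estimate for vector vs. halfvec columns.
# Assumption: per-value sizes follow the pgvector docs
# (4 * dims + 8 bytes for vector, 2 * dims + 8 bytes for halfvec).

DIMENSIONS = 1536      # embedding width of the dbpedia dataset
ROWS = 1_000_000       # number of rows in the dataset
HEADER_BYTES = 8       # per-value header assumed for both types


def vector_bytes(dims: int) -> int:
    """Bytes for one single-precision `vector` value."""
    return 4 * dims + HEADER_BYTES


def halfvec_bytes(dims: int) -> int:
    """Bytes for one half-precision `halfvec` value."""
    return 2 * dims + HEADER_BYTES


def total_gb(bytes_per_value: int, rows: int) -> float:
    """Total size in decimal gigabytes, excluding row and index overhead."""
    return bytes_per_value * rows / 1e9


if __name__ == "__main__":
    for name, size in (("vector", vector_bytes(DIMENSIONS)),
                       ("halfvec", halfvec_bytes(DIMENSIONS))):
        print(f"{name}: {size} bytes/value, {total_gb(size, ROWS):.2f} GB total")
```

Under these assumptions a halfvec value is roughly half the size of the corresponding vector value, which is where the smaller indexes come from.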

2024-06-14 18:36:50 +02:00

| File | Last commit | Date |
| --- | --- | --- |
| halfvec_build.sql | add halfvec indexing and queries to periodic pgvector performance tests (#8057) | 2024-06-14 18:36:50 +02:00 |
| HNSW_build.sql | Performance test for pgvector HNSW index build and queries (#7873) | 2024-05-28 11:05:33 +00:00 |
| IVFFLAT_build.sql | Performance test for pgvector HNSW index build and queries (#7873) | 2024-05-28 11:05:33 +00:00 |
| loaddata.py | Performance test for pgvector HNSW index build and queries (#7873) | 2024-05-28 11:05:33 +00:00 |
| pgbench_custom_script_pgvector_halfvec_queries.sql | add halfvec indexing and queries to periodic pgvector performance tests (#8057) | 2024-06-14 18:36:50 +02:00 |
| pgbench_custom_script_pgvector_hsnw_queries.sql | Performance test for pgvector HNSW index build and queries (#7873) | 2024-05-28 11:05:33 +00:00 |
| README.md | clarify how to load the dbpedia vector embeddings into a postgres database (#7894) | 2024-05-28 17:21:09 +03:00 |

README.md

Source of the dataset for pgvector tests

This readme was copied from https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-1536-1M

Download the parquet files

brew install git-lfs
git-lfs clone https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-1536-1M

Load into Postgres:

See loaddata.py in this directory.
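
The sketch below illustrates one way the loading step could prepare rows for insertion; it is not a copy of loaddata.py, which may work differently. It converts each record into parameters for a parameterized `INSERT`, formatting the embedding in pgvector's text input form (`[v1,v2,...]`). The table name `documents` and its column `embeddings` are hypothetical; the source field names come from the dataset card. Reading the parquet files (for example with pandas or pyarrow) and opening the database connection are left out.

```python
from typing import Iterable, Iterator, Mapping, Sequence, Tuple

# Name of the embedding field, as listed in the dataset card.
EMBEDDING_COLUMN = "text-embedding-3-large-1536-embedding"

# Hypothetical target table and columns; adjust to the real schema.
# The %s placeholders follow the psycopg parameter style.
INSERT_SQL = (
    "INSERT INTO documents (_id, title, text, embeddings) "
    "VALUES (%s, %s, %s, %s::vector)"
)


def to_pgvector_literal(embedding: Sequence[float]) -> str:
    """Format floats as pgvector text input, e.g. '[1.0,2.5]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def rows_to_params(rows: Iterable[Mapping]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield one parameter tuple per dataset record, in INSERT_SQL order."""
    for row in rows:
        yield (
            row["_id"],
            row["title"],
            row["text"],
            to_pgvector_literal(row[EMBEDDING_COLUMN]),
        )
```

With a driver such as psycopg, the tuples could then be passed in batches to something like `cursor.executemany(INSERT_SQL, list(rows_to_params(batch)))`, assuming the table already exists and the pgvector extension is installed.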

Rest of the dataset card, as on Hugging Face

dataset_info:
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  - name: text-embedding-3-large-1536-embedding
    sequence: float64
  splits:
  - name: train
    num_bytes: 12679725776
    num_examples: 1000000
  download_size: 9551862565
  dataset_size: 12679725776
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: mit
task_categories:
- feature-extraction
language:
- en
size_categories:
- 1M<n<10M

1M OpenAI Embeddings: text-embedding-3-large 1536 dimensions

Created: February 2024.
Text used for Embedding: title (string) + text (string)
Embedding Model: OpenAI text-embedding-3-large
This dataset was generated from the first 1M entries of https://huggingface.co/datasets/BeIR/dbpedia-entity, extracted by @KShivendu_