From f9f69a2ee7fb11b9d713bd3f2c50c5be516253c9 Mon Sep 17 00:00:00 2001
From: Peter Bendel
Date: Tue, 28 May 2024 16:21:09 +0200
Subject: [PATCH] clarify how to load the dbpedia vector embeddings into a postgres database (#7894)

## Problem

Improve the readme for the data load step in the pgvector performance test.
---
 test_runner/performance/pgvector/README.md | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/test_runner/performance/pgvector/README.md b/test_runner/performance/pgvector/README.md
index c55db12e74..83495d270a 100644
--- a/test_runner/performance/pgvector/README.md
+++ b/test_runner/performance/pgvector/README.md
@@ -1,3 +1,20 @@
+# Source of the dataset for pgvector tests
+
+This readme was copied from https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-1536-1M
+
+## Download the parquet files
+
+```bash
+brew install git-lfs
+git-lfs clone https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-1536-1M
+```
+
+## Load into postgres:
+
+see loaddata.py in this directory
+
+## Rest of dataset card as on huggingface
+
 ---
 dataset_info:
   features:
@@ -35,4 +52,4 @@ size_categories:
 - Created: February 2024.
 - Text used for Embedding: title (string) + text (string)
 - Embedding Model: OpenAI text-embedding-3-large
-- This dataset was generated from the first 1M entries of https://huggingface.co/datasets/BeIR/dbpedia-entity, extracted by @KShivendu_ [here](https://huggingface.co/datasets/KShivendu/dbpedia-entities-openai-1M)
\ No newline at end of file
+- This dataset was generated from the first 1M entries of https://huggingface.co/datasets/BeIR/dbpedia-entity, extracted by @KShivendu_
\ No newline at end of file