Files

Source of the dataset for pgvector tests

This README was copied from https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-1536-1M

Download the parquet files:

brew install git-lfs
git-lfs clone https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-1536-1M
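
If git-lfs is not convenient, the same parquet files can be fetched from Python with the huggingface_hub library. This is a minimal sketch; the local_dir destination is just an example:

from huggingface_hub import snapshot_download

# Download the dataset repo (the parquet shards live under data/) to a local folder
snapshot_download(
    repo_id="Qdrant/dbpedia-entities-openai3-text-embedding-3-large-1536-1M",
    repo_type="dataset",
    local_dir="dbpedia-1536",  # example destination, pick any path
)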

Load into Postgres:

See loaddata.py in this directory.
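
For reference, here is a minimal sketch of the kind of loading such a script performs. It assumes pgvector is installed, a connection string in a DATABASE_URL environment variable, and a table named dbpedia; all of these are assumptions, and the actual loaddata.py may differ:

# Hypothetical loader sketch; see loaddata.py for the real script.
import glob
import os

import psycopg2
from psycopg2.extras import execute_values
import pyarrow.parquet as pq

conn = psycopg2.connect(os.environ["DATABASE_URL"])  # assumed env var
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS dbpedia (
        _id text PRIMARY KEY,
        title text,
        text text,
        embedding vector(1536)
    )
""")

# Parquet shards sit under data/ in the cloned dataset repo
for path in sorted(glob.glob("*/data/train-*.parquet")):
    tbl = pq.read_table(path)
    rows = [
        # pgvector accepts a '[x,y,...]' string literal for vector columns
        (i, t, x, "[" + ",".join(map(str, e)) + "]")
        for i, t, x, e in zip(
            tbl.column("_id").to_pylist(),
            tbl.column("title").to_pylist(),
            tbl.column("text").to_pylist(),
            tbl.column("text-embedding-3-large-1536-embedding").to_pylist(),
        )
    ]
    execute_values(cur, "INSERT INTO dbpedia (_id, title, text, embedding) VALUES %s", rows)
    conn.commit()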

Rest of the dataset card, as published on Hugging Face:


dataset_info:
  features:
    - name: _id
      dtype: string
    - name: title
      dtype: string
    - name: text
      dtype: string
    - name: text-embedding-3-large-1536-embedding
      sequence: float64
  splits:
    - name: train
      num_bytes: 12679725776
      num_examples: 1000000
  download_size: 9551862565
  dataset_size: 12679725776
configs:
  - config_name: default
    data_files:
      - split: train
        path: data/train-*
license: mit
task_categories:
  - feature-extraction
language:
  - en
size_categories:
  - 1M<n<10M
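
As a quick sanity check that a downloaded shard matches the features above, the parquet schema can be inspected without reading the rows (the data/train-* pattern comes from the config above):

import glob
import pyarrow.parquet as pq

# Read only the file metadata of the first shard, not the 1M rows
path = sorted(glob.glob("data/train-*.parquet"))[0]
print(pq.read_schema(path))  # expect _id, title, text, and a float64 embedding list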

1M OpenAI Embeddings: text-embedding-3-large 1536 dimensions

  • Created: February 2024.
  • Text used for Embedding: title (string) + text (string)
  • Embedding Model: OpenAI text-embedding-3-large
  • This dataset was generated from the first 1M entries of https://huggingface.co/datasets/BeIR/dbpedia-entity, extracted by @KShivendu_
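
To search the loaded table, query vectors have to come from the same model at the same dimensionality. Below is a sketch using the OpenAI Python client; the model name and dimensions come from the card above, while the query text, table name, and SQL are examples:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="What is DBpedia?",  # example query
    dimensions=1536,  # must match the stored vectors
)
query_vec = resp.data[0].embedding  # list of 1536 floats

# Example nearest-neighbour query with pgvector's cosine-distance operator:
#   SELECT title FROM dbpedia ORDER BY embedding <=> '[...]' LIMIT 5;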