Mirror of https://github.com/lancedb/lancedb.git, synced 2026-01-13 15:22:57 +00:00
### Summary

This PR adds **SigLIP** (Sigmoid Loss for Language-Image Pre-training) as a new embedding model in the LanceDB embedding registry. SigLIP trains image-text alignment with a pairwise sigmoid loss in place of the softmax-based contrastive loss used by CLIP, and it generalizes well to zero-shot tasks.

Fixes #2498

### What’s Implemented

#### 1. `SigLipEmbeddings` Embedding Class

* Added SigLIP support in `python/lancedb/embeddings/siglip.py`
* Implements:
  * `compute_source_embeddings`
  * `_batch_generate_embeddings`
  * Normalization logic
  * Batch-wise progress logging for image embedding

#### 2. Registry Integration

* Imported `SigLipEmbeddings` in `embeddings/__init__.py` so that it is registered in the embedding function registry
* SigLIP can now be created by name, e.g. `get_registry().get("siglip").create()`

#### 3. Evaluation Benchmark Support

* Added SigLIP to `test_embeddings_slow.py` for side-by-side benchmarking against OpenCLIP and ImageBind

### New Test Methods

#### `test_siglip`

* End-to-end test that verifies table creation with SigLIP embeddings and checks the vector shape

#### `test_siglip_vs_openclip_vs_imagebind_benchmark_full`

* Benchmarks:
  * **Recall@1 / 5 / 10**
  * **mAP (Mean Average Precision)**
  * **Embedding and search latency**
  * Embedding dimensionality reporting

### Notes

* With the model used in this PR, SigLIP outputs 768-dimensional embeddings (vs. 512 for OpenCLIP)
* The benchmark shows SigLIP performing competitively with OpenCLIP and ImageBind despite its higher dimensionality
* I'm still new to contributing to open source and learning as I go, so please feel free to suggest any improvements. I'm happy to make changes!
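The class description lists normalization and batch-wise progress logging as part of `_batch_generate_embeddings`. The following is a minimal, hypothetical sketch of that pattern, not the code in `siglip.py`: `batch_embed`, `l2_normalize`, and the `embed_batch` callable are illustrative names, and the encoder is assumed to return one vector per input. Normalizing to unit length means a plain dot product between two stored vectors equals their cosine similarity.

```python
import logging
import math
from typing import Callable, List, Sequence

logger = logging.getLogger(__name__)

Vector = List[float]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (dot product then equals cosine similarity)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        # Leave an all-zero vector unchanged instead of dividing by zero.
        return list(vec)
    return [x / norm for x in vec]


def batch_embed(
    items: Sequence[object],
    embed_batch: Callable[[Sequence[object]], List[Vector]],
    batch_size: int = 32,
) -> List[Vector]:
    """Embed `items` in fixed-size batches and L2-normalize every output vector.

    `embed_batch` is a hypothetical stand-in for a model call (for example an
    image or text encoder); it must return exactly one vector per input.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    results: List[Vector] = []
    num_batches = math.ceil(len(items) / batch_size)
    for batch_idx in range(num_batches):
        start = batch_idx * batch_size
        batch = items[start : start + batch_size]
        vectors = embed_batch(batch)
        if len(vectors) != len(batch):
            raise ValueError("embed_batch must return one vector per input")
        results.extend(l2_normalize(v) for v in vectors)
        # Batch-wise progress logging, e.g. "embedded batch 2/3".
        logger.info("embedded batch %d/%d", batch_idx + 1, num_batches)
    return results
```

For example, calling `batch_embed` on five inputs with `batch_size=2` makes three encoder calls and, when logging is configured at INFO level, logs `embedded batch 1/3` through `embedded batch 3/3`.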
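The benchmark test reports Recall@1/5/10 and mAP. The sketch below shows one common way to compute those metrics from a query-by-item score matrix (for instance, dot products between normalized text and image embeddings). It is not taken from `test_embeddings_slow.py`, and all function names are hypothetical. It assumes Recall@K is the fraction of a query's relevant items found in the top K, which coincides with the usual hit-rate definition when each query has exactly one match.

```python
from typing import Dict, List, Sequence, Set


def rank_items(scores: Sequence[float]) -> List[int]:
    """Return item indices ordered from most to least similar."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    """Fraction of the query's relevant items that appear in the top k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def average_precision(ranked: Sequence[int], relevant: Set[int]) -> float:
    """Average of precision@r over every rank r that holds a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def evaluate(
    score_matrix: Sequence[Sequence[float]],
    relevant_sets: Sequence[Set[int]],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[str, float]:
    """Mean Recall@K for each k, plus mAP (mean of per-query average precision)."""
    if len(score_matrix) != len(relevant_sets):
        raise ValueError("need one relevant set per query")
    if any(not rel for rel in relevant_sets):
        raise ValueError("every query needs at least one relevant item")
    n = len(score_matrix)
    metrics = {f"recall@{k}": 0.0 for k in ks}
    metrics["mAP"] = 0.0
    for scores, relevant in zip(score_matrix, relevant_sets):
        ranked = rank_items(scores)
        for k in ks:
            metrics[f"recall@{k}"] += recall_at_k(ranked, relevant, k) / n
        metrics["mAP"] += average_precision(ranked, relevant) / n
    return metrics


# Two text queries scored against three images: query 0 matches image 0,
# query 1 matches image 2 (which it ranks second, behind image 0).
scores = [[0.9, 0.1, 0.3], [0.8, 0.2, 0.5]]
print(evaluate(scores, [{0}, {2}], ks=(1, 2)))
# {'recall@1': 0.5, 'recall@2': 1.0, 'mAP': 0.75}
```

With a single relevant item per query, average precision reduces to the reciprocal rank of that item, so mAP equals mean reciprocal rank in that setting.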
24 lines
964 B
Python
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright The LanceDB Authors

# ruff: noqa: F401
# Each import re-exports a module's public classes and runs any registration
# decorators the module defines, making those embedding functions available by name.
from .base import EmbeddingFunction, EmbeddingFunctionConfig, TextEmbeddingFunction
from .bedrock import BedRockText
from .cohere import CohereEmbeddingFunction
from .gemini_text import GeminiText
from .instructor import InstructorEmbeddingFunction
from .ollama import OllamaEmbeddings
from .open_clip import OpenClipEmbeddings
from .openai import OpenAIEmbeddings
from .registry import EmbeddingFunctionRegistry, get_registry, register
from .sentence_transformers import SentenceTransformerEmbeddings
from .gte import GteEmbeddings
from .transformers import TransformersEmbeddingFunction, ColbertEmbeddings
from .imagebind import ImageBindEmbeddings
from .jinaai import JinaEmbeddings
from .watsonx import WatsonxEmbeddings
from .voyageai import VoyageAIEmbeddingFunction
from .colpali import ColPaliEmbeddings
from .siglip import SigLipEmbeddings
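Apart from re-exporting names, each import line above runs its module's top-level code. Assuming `siglip.py` decorates `SigLipEmbeddings` with `register("siglip")`, the way the other embedding modules register their own classes, that is why one new import line is enough to make `"siglip"` resolvable through the registry. The decorator-registry pattern itself fits in a few lines; `ToyRegistry`, `toy_registry`, and `DemoEmbeddings` are hypothetical names for illustration and do not reproduce LanceDB's `EmbeddingFunctionRegistry`.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T", bound=type)


class ToyRegistry:
    """Hypothetical name -> class mapping; a simplified stand-in, not LanceDB's registry."""

    def __init__(self) -> None:
        self._classes: Dict[str, type] = {}

    def register(self, name: str) -> Callable[[T], T]:
        """Return a class decorator that records the decorated class under `name`."""

        def decorator(cls: T) -> T:
            if name in self._classes:
                raise ValueError(f"{name!r} is already registered")
            self._classes[name] = cls
            return cls  # the class itself is returned unchanged

        return decorator

    def get(self, name: str) -> type:
        """Look up a previously registered class by name (KeyError if unknown)."""
        return self._classes[name]


toy_registry = ToyRegistry()


# In a package this class would live in its own module (like siglip.py); the
# decorator runs when that module is first imported, e.g. from __init__.py.
@toy_registry.register("demo-siglip")
class DemoEmbeddings:
    def __init__(self, dim: int = 768) -> None:
        self.dim = dim

    def ndims(self) -> int:
        return self.dim


embedding_cls = toy_registry.get("demo-siglip")
print(embedding_cls is DemoEmbeddings, embedding_cls().ndims())  # True 768
```

Keeping registration in a decorator means the registry never needs a hard-coded list of models: adding a model is a new module plus one import line, as this PR does for SigLIP.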