feat(embeddings): add siglip embedding support to lancedb (#2499)

###  Summary

This PR adds **SigLIP** (Sigmoid Loss Image Pretraining) as a new
embedding model in the LanceDB embedding registry. SigLIP improves
image-text alignment performance using sigmoid-based contrastive loss
and offers robust zero-shot generalization.

Fixes #2498 

### What’s Implemented

#### 1. `SigLIP` Embedding Class

* Added `SigLIP` support under `python/lancedb/embeddings/siglip.py`
* Implements:

  * `compute_source_embeddings`
  * `_batch_generate_embeddings`
  * Normalization logic
  * Batch-wise progress logging for image embedding

#### 2. Registry Integration

* Registered `SigLIP` in `embeddings/__init__.py`
* `SigLIP` now usable via `connect(..., embedding="siglip")`

#### 3. Evaluation Benchmark Support

* Added SigLIP to `test_embeddings_slow.py` for side-by-side
benchmarking with OpenCLIP and ImageBind


###  New Test Methods

####  `test_siglip`

* End-to-end test to verify embeddings table creation and vector shape
for SigLIP
![WhatsApp Image 2025-07-10 at 18 00
27_a3368163](https://github.com/user-attachments/assets/e5582ee1-80a3-43d7-a7a1-26ceecce9f4d)


####  `test_siglip_vs_openclip_vs_imagebind_benchmark_full`

* Benchmarks:

  * **Recall\@1 / 5 / 10**
  * **mAP (Mean Average Precision)**
  * **Embedding & Search Latency**
  * Dimensionality reporting
![WhatsApp Image 2025-07-10 at 18 12
13_22c67a84](https://github.com/user-attachments/assets/455bf30f-62b7-4684-a3f3-ad52e2a1ffe5)


###  Notes

* SigLIP outputs 768D embeddings (vs 512D for OpenCLIP)
* Benchmark shows competitive performance despite higher dimensionality
* I'm still new to contributing to open-source and learning as I go.
Please feel free to suggest any improvements — I'm happy to make
changes!
This commit is contained in:
Poornachandra.A.N
2025-08-05 00:12:39 +05:30
committed by GitHub
parent 02595dc475
commit 7d0127b376
6 changed files with 257 additions and 22 deletions

View File

@@ -68,8 +68,9 @@ dev = [
"pyright",
'typing-extensions>=4.0.0; python_version < "3.11"',
]
docs = ["mkdocs", "mkdocs-jupyter", "mkdocs-material", "mkdocstrings[python]"]
docs = ["mkdocs", "mkdocs-jupyter", "mkdocs-material", "mkdocstrings-python"]
clip = ["torch", "pillow", "open-clip-torch"]
siglip = ["torch", "pillow", "transformers>=4.41.0","sentencepiece"]
embeddings = [
"requests>=2.31.0",
"openai>=1.6.1",
@@ -87,6 +88,7 @@ embeddings = [
"botocore>=1.31.57",
'ibm-watsonx-ai>=1.1.2; python_version >= "3.10"',
"ollama>=0.3.0",
"sentencepiece"
]
azure = ["adlfs>=2024.2.0"]