feat(embeddings): add siglip embedding support to lancedb (#2499)

###  Summary

This PR adds **SigLIP** (Sigmoid Loss Image Pretraining) as a new
embedding model in the LanceDB embedding registry. SigLIP improves
image-text alignment performance using sigmoid-based contrastive loss
and offers robust zero-shot generalization.

Fixes #2498 

### What’s Implemented

#### 1. `SigLIP` Embedding Class

* Added `SigLIP` support under `python/lancedb/embeddings/siglip.py`
* Implements:

  * `compute_source_embeddings`
  * `_batch_generate_embeddings`
  * Normalization logic
  * Batch-wise progress logging for image embedding

#### 2. Registry Integration

* Registered `SigLIP` in `embeddings/__init__.py`
* `SigLIP` now usable via `connect(..., embedding="siglip")`

#### 3. Evaluation Benchmark Support

* Added SigLIP to `test_embeddings_slow.py` for side-by-side
benchmarking with OpenCLIP and ImageBind


###  New Test Methods

####  `test_siglip`

* End-to-end test to verify embeddings table creation and vector shape
for SigLIP
![WhatsApp Image 2025-07-10 at 18 00
27_a3368163](https://github.com/user-attachments/assets/e5582ee1-80a3-43d7-a7a1-26ceecce9f4d)


####  `test_siglip_vs_openclip_vs_imagebind_benchmark_full`

* Benchmarks:

  * **Recall\@1 / 5 / 10**
  * **mAP (Mean Average Precision)**
  * **Embedding & Search Latency**
  * Dimensionality reporting
![WhatsApp Image 2025-07-10 at 18 12
13_22c67a84](https://github.com/user-attachments/assets/455bf30f-62b7-4684-a3f3-ad52e2a1ffe5)


###  Notes

* SigLIP outputs 768D embeddings (vs 512D for OpenCLIP)
* Benchmark shows competitive performance despite higher dimensionality
* I'm still new to contributing to open-source and learning as I go.
Please feel free to suggest any improvements — I'm happy to make
changes!
This commit is contained in:
Poornachandra.A.N
2025-08-05 00:12:39 +05:30
committed by GitHub
parent 02595dc475
commit 7d0127b376
6 changed files with 257 additions and 22 deletions

3
python/.gitignore vendored
View File

@@ -1,2 +1,3 @@
# Test data created by some example tests
data/
data/
_lancedb.pyd