docs: add changes to Embeddings-> Available models-> overview page (#1596)

adding features and improvements to - Manage Embeddings page Before: ![Screenshot 2024-09-04 223743](https://github.com/user-attachments/assets/f1e116b5-6ebb-4d59-9d29-b20084998cd0) After: ![Screenshot 2024-09-05 214214](https://github.com/user-attachments/assets/8c94318e-68af-447e-97e1-8153860a2914) ![Screenshot 2024-09-05 213623](https://github.com/user-attachments/assets/55c82770-6df9-4bab-9c5c-1ea1552138de) ![Screenshot 2024-09-05 215931](https://github.com/user-attachments/assets/9bfac7d4-16a6-454e-801e-50789ff75261)
2025-12-26 22:59:57 +00:00 · 2024-09-05 22:19:08 +05:30
parent b24810a011
commit 2bc7dca3ca
3 changed files with 91 additions and 29 deletions
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@@ -26,6 +26,7 @@ theme:
    - content.code.copy
    - content.tabs.link
    - content.action.edit
+    - content.tooltips
    - toc.follow
    - navigation.top
    - navigation.tabs
@@ -35,6 +36,7 @@ theme:
    - navigation.instant
  icon:
    repo: fontawesome/brands/github
+    annotation: material/arrow-right-circle
  custom_dir: overrides

 plugins:
@@ -76,7 +78,12 @@ markdown_extensions:
  - pymdownx.tabbed:
      alternate_style: true
  - md_in_html
+  - abbr
  - attr_list
+  - pymdownx.snippets
+  - pymdownx.emoji:
+      emoji_index: !!python/name:material.extensions.emoji.twemoji
+      emoji_generator: !!python/name:material.extensions.emoji.to_svg

 nav:
  - Home:
--- a/docs/src/embeddings/available_embedding_models/text_embedding_functions/cohere_embedding.md
+++ b/docs/src/embeddings/available_embedding_models/text_embedding_functions/cohere_embedding.md
@@ -4,13 +4,14 @@ Using cohere API requires cohere package, which can be installed using `pip inst
 You also need to set the `COHERE_API_KEY` environment variable to use the Cohere API.

 Supported models are:
-* embed-english-v3.0
-* embed-multilingual-v3.0
-* embed-english-light-v3.0
-* embed-multilingual-light-v3.0
-* embed-english-v2.0
-* embed-english-light-v2.0
-* embed-multilingual-v2.0
+
+- embed-english-v3.0
+- embed-multilingual-v3.0
+- embed-english-light-v3.0
+- embed-multilingual-light-v3.0
+- embed-english-v2.0
+- embed-english-light-v2.0
+- embed-multilingual-v2.0


 Supported parameters (to be passed in `create` method) are:
--- a/docs/src/embeddings/default_embedding_functions.md
+++ b/docs/src/embeddings/default_embedding_functions.md
@@ -1,30 +1,84 @@
-There are various embedding functions available out of the box with LanceDB to manage your embeddings implicitly. We're actively working on adding other popular embedding APIs and models.
+# 📚 Available Embedding Models

-## Text embedding functions
-Contains the text embedding functions registered by default.
+There are various embedding functions available out of the box with LanceDB to manage your embeddings implicitly. We're actively working on adding other popular embedding APIs and models. 🚀

-* Embedding functions have an inbuilt rate limit handler wrapper for source and query embedding function calls that retry with exponential backoff. 
-* Each `EmbeddingFunction` implementation automatically takes `max_retries` as an argument which has the default value of 7.
+Before jumping on the list of available models, let's understand how to get an embedding model initialized and configured to use in our code: 

-**Available Text Embeddings**:
+!!! example "Example usage"
+    ```python
+    model = get_registry()
+              .get("openai")
+              .create(name="text-embedding-ada-002")
+    ```

- [Sentence Transformers](available_embedding_models/text_embedding_functions/sentence_transformers.md)
- [Huggingface Embedding Models](available_embedding_models/text_embedding_functions/huggingface_embedding.md)
- [Ollama Embeddings](available_embedding_models/text_embedding_functions/ollama_embedding.md)
- [OpenAI Embeddings](available_embedding_models/text_embedding_functions/openai_embedding.md)
- [Instructor Embeddings](available_embedding_models/text_embedding_functions/instructor_embedding.md)
- [Gemini Embeddings](available_embedding_models/text_embedding_functions/gemini_embedding.md)
- [Cohere Embeddings](available_embedding_models/text_embedding_functions/cohere_embedding.md)
- [Jina Embeddings](available_embedding_models/text_embedding_functions/jina_embedding.md)
- [AWS Bedrock Text Embedding Functions](available_embedding_models/text_embedding_functions/aws_bedrock_embedding.md)
- [IBM Watsonx.ai Embeddings](available_embedding_models/text_embedding_functions/ibm_watsonx_ai_embedding.md)
+Now let's understand the above syntax: 
+```python
+model = get_registry().get("model_id").create(...params)
+```
+**This👆 line effectively creates a configured instance of an `embedding function` with `model` of choice that is ready for use.**
+
+- `get_registry()` :  This function call returns an instance of a `EmbeddingFunctionRegistry` object. This registry manages the registration and retrieval of embedding functions.
+
+- `.get("model_id")` : This method call on the registry object and retrieves the **embedding models functions** associated with the `"model_id"` (1) .
+    { .annotate }
+
+    1.  Hover over the names in table below to find out the `model_id` of different embedding functions.
+
+- `.create(...params)` : This method call is on the object returned by the `get` method. It instantiates an embedding model function using the **specified parameters**. 
+
+??? question "What parameters does the `.create(...params)` method accepts?"
+    **Checkout the documentation of specific embedding models (links in the table below👇) to know what parameters it takes**.
+
+!!! tip "Moving on"
+    Now that we know how to get the **desired embedding model** and use it in our code, let's explore the comprehensive **list** of embedding models **supported by LanceDB**, in the tables below.
+
+## Text Embedding Functions 📝 
+These functions are registered by default to handle text embeddings.
+
+- 🔄 **Embedding functions** have an inbuilt rate limit handler wrapper for source and query embedding function calls that retry with **exponential backoff**. 
+
+- 🌕 Each `EmbeddingFunction` implementation automatically takes `max_retries` as an argument which has the default value of 7. 
+
+🌟 **Available Text Embeddings**
+
+| **Embedding** :material-information-outline:{ title="Hover over the name to find out the model_id" } | **Description** | **Documentation** |
+|-----------|-------------|---------------|
+| [**Sentence Transformers**](available_embedding_models/text_embedding_functions/sentence_transformers.md "sentence-transformers")  | 🧠 **SentenceTransformers** is a Python framework for state-of-the-art sentence, text, and image embeddings. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/sbert_2.png" alt="Sentence Transformers Icon" width="90" height="35">](available_embedding_models/text_embedding_functions/sentence_transformers.md)|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
+| [**Huggingface Models**](available_embedding_models/text_embedding_functions/huggingface_embedding.md "huggingface") |🤗 We offer support for all **Huggingface** models. The default model is `colbert-ir/colbertv2.0`. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/hugging_face.png" alt="Huggingface Icon" width="130" height="35">](available_embedding_models/text_embedding_functions/huggingface_embedding.md) |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
+| [**Ollama Embeddings**](available_embedding_models/text_embedding_functions/ollama_embedding.md "ollama") | 🔍 Generate embeddings via the **Ollama** python library. Ollama supports embedding models, making it possible to build RAG apps. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/Ollama.png" alt="Ollama Icon" width="110" height="35">](available_embedding_models/text_embedding_functions/ollama_embedding.md)|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
+| [**OpenAI Embeddings**](available_embedding_models/text_embedding_functions/openai_embedding.md "openai")| 🔑 **OpenAI’s** text embeddings measure the relatedness of text strings. **LanceDB** supports state-of-the-art embeddings from OpenAI. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/openai.png" alt="OpenAI Icon" width="100" height="35">](available_embedding_models/text_embedding_functions/openai_embedding.md)|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
+| [**Instructor Embeddings**](available_embedding_models/text_embedding_functions/instructor_embedding.md "instructor") | 📚 **Instructor**: An instruction-finetuned text embedding model that can generate text embeddings tailored to any task and domains by simply providing the task instruction, without any finetuning. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/instructor_embedding.png" alt="Instructor Embedding Icon" width="140" height="35">](available_embedding_models/text_embedding_functions/instructor_embedding.md) |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
+| [**Gemini Embeddings**](available_embedding_models/text_embedding_functions/gemini_embedding.md "gemini-text") | 🌌 Google’s Gemini API generates state-of-the-art embeddings for words, phrases, and sentences. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/gemini.png" alt="Gemini Icon" width="95" height="35">](available_embedding_models/text_embedding_functions/gemini_embedding.md) |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
+| [**Cohere Embeddings**](available_embedding_models/text_embedding_functions/cohere_embedding.md "cohere") | 💬 This will help you get started with **Cohere** embedding models using LanceDB. Using cohere API requires cohere package. Install it via `pip`. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/cohere.png" alt="Cohere Icon" width="140" height="35">](available_embedding_models/text_embedding_functions/cohere_embedding.md) |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
+| [**Jina Embeddings**](available_embedding_models/text_embedding_functions/jina_embedding.md "jina") | 🔗 World-class embedding models to improve your search and RAG systems. You will need **jina api key**. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/jina.png" alt="Jina Icon" width="90" height="35">](available_embedding_models/text_embedding_functions/jina_embedding.md) |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
+| [ **AWS Bedrock Functions**](available_embedding_models/text_embedding_functions/aws_bedrock_embedding.md "bedrock-text") | ☁️ AWS Bedrock supports multiple base models for generating text embeddings. You need to setup the AWS credentials to use this embedding function. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/aws_bedrock.png" alt="AWS Bedrock Icon" width="120" height="35">](available_embedding_models/text_embedding_functions/aws_bedrock_embedding.md) |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
+| [**IBM Watsonx.ai**](available_embedding_models/text_embedding_functions/ibm_watsonx_ai_embedding.md "watsonx") | 💡 Generate text embeddings using IBM's watsonx.ai platform. **Note**: watsonx.ai library is an optional dependency. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/watsonx.png" alt="Watsonx Icon" width="140" height="35">](available_embedding_models/text_embedding_functions/ibm_watsonx_ai_embedding.md) |


-## Multi-modal embedding functions
-Multi-modal embedding functions allow you to query your table using both images and text.

-**Available Multi-modal Embeddings** :
+[st-key]: "sentence-transformers"
+[hf-key]: "huggingface"
+[ollama-key]: "ollama"
+[openai-key]: "openai"
+[instructor-key]: "instructor"
+[gemini-key]: "gemini-text"
+[cohere-key]: "cohere"
+[jina-key]: "jina"
+[aws-key]: "bedrock-text"
+[watsonx-key]: "watsonx"

- [OpenClip Embeddings](available_embedding_models/multimodal_embedding_functions/openclip_embedding.md)
- [Imagebind Embeddings](available_embedding_models/multimodal_embedding_functions/imagebind_embedding.md)
- [Jina Embeddings](available_embedding_models/multimodal_embedding_functions/jina_multimodal_embedding.md)
+
+## Multi-modal Embedding Functions🖼️ 
+
+Multi-modal embedding functions allow you to query your table using both images and text. 💬🖼️
+
+🌐 **Available Multi-modal Embeddings**
+
+| Embedding :material-information-outline:{ title="Hover over the name to find out the model_id" }  | Description | Documentation  |
+|-----------|-------------|---------------|
+| [**OpenClip Embeddings**](available_embedding_models/multimodal_embedding_functions/openclip_embedding.md "open-clip") | 🎨 We support CLIP model embeddings using the open source alternative, **open-clip** which supports various customizations. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/openclip_github.png" alt="openclip Icon" width="150" height="35">](available_embedding_models/multimodal_embedding_functions/openclip_embedding.md) |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
+| [**Imagebind Embeddings**](available_embedding_models/multimodal_embedding_functions/imagebind_embedding.md "imageind") | 🌌  We have support for **imagebind model embeddings**. You can download our version of the packaged model via - `pip install imagebind-packaged==0.1.2`. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/imagebind_meta.png" alt="imagebind Icon" width="150" height="35">](available_embedding_models/multimodal_embedding_functions/imagebind_embedding.md)|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
+| [**Jina Multi-modal Embeddings**](available_embedding_models/multimodal_embedding_functions/jina_multimodal_embedding.md "jina") | 🔗 **Jina embeddings** can also be used to embed both **text** and **image** data, only some of the models support image data and you can check the detailed documentation. 👉 | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/jina.png" alt="jina Icon" width="90" height="35">](available_embedding_models/multimodal_embedding_functions/jina_multimodal_embedding.md) |
+
+!!! note
+    If you'd like to request support for additional **embedding functions**, please feel free to open an issue on our LanceDB [GitHub issue page](https://github.com/lancedb/lancedb/issues).