docs: add changes to Embeddings-> Available models-> overview page (#1596)

adding features and improvements to - Manage Embeddings page

Before:
![Screenshot 2024-09-04
223743](https://github.com/user-attachments/assets/f1e116b5-6ebb-4d59-9d29-b20084998cd0)

After:



![Screenshot 2024-09-05
214214](https://github.com/user-attachments/assets/8c94318e-68af-447e-97e1-8153860a2914)

![Screenshot 2024-09-05
213623](https://github.com/user-attachments/assets/55c82770-6df9-4bab-9c5c-1ea1552138de)

![Screenshot 2024-09-05
215931](https://github.com/user-attachments/assets/9bfac7d4-16a6-454e-801e-50789ff75261)
Rithik Kumar
2024-09-05 22:19:08 +05:30
committed by GitHub
parent b24810a011
commit 2bc7dca3ca
3 changed files with 91 additions and 29 deletions

View File

@@ -26,6 +26,7 @@ theme:
- content.code.copy
- content.tabs.link
- content.action.edit
- content.tooltips
- toc.follow
- navigation.top
- navigation.tabs
@@ -35,6 +36,7 @@ theme:
- navigation.instant
icon:
repo: fontawesome/brands/github
annotation: material/arrow-right-circle
custom_dir: overrides
plugins:
@@ -76,7 +78,12 @@ markdown_extensions:
- pymdownx.tabbed:
alternate_style: true
- md_in_html
- abbr
- attr_list
- pymdownx.snippets
- pymdownx.emoji:
emoji_index: !!python/name:material.extensions.emoji.twemoji
emoji_generator: !!python/name:material.extensions.emoji.to_svg
nav:
- Home:

View File

@@ -4,13 +4,14 @@ Using cohere API requires cohere package, which can be installed using `pip inst
You also need to set the `COHERE_API_KEY` environment variable to use the Cohere API.
Supported models are:
* embed-english-v3.0
* embed-multilingual-v3.0
* embed-english-light-v3.0
* embed-multilingual-light-v3.0
* embed-english-v2.0
* embed-english-light-v2.0
* embed-multilingual-v2.0
- embed-english-v3.0
- embed-multilingual-v3.0
- embed-english-light-v3.0
- embed-multilingual-light-v3.0
- embed-english-v2.0
- embed-english-light-v2.0
- embed-multilingual-v2.0
Supported parameters (to be passed in `create` method) are:

View File

@@ -1,30 +1,84 @@
There are various embedding functions available out of the box with LanceDB to manage your embeddings implicitly. We're actively working on adding other popular embedding APIs and models.
# 📚 Available Embedding Models
## Text embedding functions
Contains the text embedding functions registered by default.
There are various embedding functions available out of the box with LanceDB to manage your embeddings implicitly. We're actively working on adding other popular embedding APIs and models. 🚀
* Embedding functions have an inbuilt rate limit handler wrapper for source and query embedding function calls that retry with exponential backoff.
* Each `EmbeddingFunction` implementation automatically takes `max_retries` as an argument which has the default value of 7.
Before jumping to the list of available models, let's see how to initialize and configure an embedding model in code:
**Available Text Embeddings**:
!!! example "Example usage"
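```python
from lancedb.embeddings import get_registry

# Look up the "openai" embedding function and create a configured instance
model = get_registry().get("openai").create(name="text-embedding-ada-002")
```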
- [Sentence Transformers](available_embedding_models/text_embedding_functions/sentence_transformers.md)
- [Huggingface Embedding Models](available_embedding_models/text_embedding_functions/huggingface_embedding.md)
- [Ollama Embeddings](available_embedding_models/text_embedding_functions/ollama_embedding.md)
- [OpenAI Embeddings](available_embedding_models/text_embedding_functions/openai_embedding.md)
- [Instructor Embeddings](available_embedding_models/text_embedding_functions/instructor_embedding.md)
- [Gemini Embeddings](available_embedding_models/text_embedding_functions/gemini_embedding.md)
- [Cohere Embeddings](available_embedding_models/text_embedding_functions/cohere_embedding.md)
- [Jina Embeddings](available_embedding_models/text_embedding_functions/jina_embedding.md)
- [AWS Bedrock Text Embedding Functions](available_embedding_models/text_embedding_functions/aws_bedrock_embedding.md)
- [IBM Watsonx.ai Embeddings](available_embedding_models/text_embedding_functions/ibm_watsonx_ai_embedding.md)
Now let's understand the above syntax:
```python
model = get_registry().get("model_id").create(...params)
```
**This👆 line effectively creates a configured instance of an `embedding function`, using the `model` of your choice, that is ready for use.**
- `get_registry()` : This function call returns an instance of an `EmbeddingFunctionRegistry` object. This registry manages the registration and retrieval of embedding functions.
- `.get("model_id")` : This method call on the registry object retrieves the **embedding function** associated with the `"model_id"` (1) .
{ .annotate }
1. Hover over the names in table below to find out the `model_id` of different embedding functions.
- `.create(...params)` : This method call is on the object returned by the `get` method. It instantiates an embedding model function using the **specified parameters**.
??? question "What parameters does the `.create(...params)` method accept?"
**Check out the documentation of the specific embedding model (links in the table below👇) to see which parameters it takes**.
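To make the `get_registry().get(...).create(...)` chain concrete, here is a toy, self-contained model of the registry pattern described above. It is only a sketch: `ToyRegistry`, `ToyEmbeddingFunction`, `get_toy_registry`, and the `"toy"` model id are hypothetical names invented for illustration, and this is not LanceDB's actual implementation.

```python
class ToyEmbeddingFunction:
    """Hypothetical stand-in for an embedding function; it only stores config."""

    def __init__(self, name="toy-default", max_retries=7):
        self.name = name
        self.max_retries = max_retries

    @classmethod
    def create(cls, **params):
        # Build a configured instance from keyword parameters
        return cls(**params)


class ToyRegistry:
    """Hypothetical registry mapping a model_id string to a function class."""

    def __init__(self):
        self._functions = {}

    def register(self, model_id, cls):
        self._functions[model_id] = cls

    def get(self, model_id):
        # Returns the registered class; .create(...) then builds the instance
        return self._functions[model_id]


_REGISTRY = ToyRegistry()
_REGISTRY.register("toy", ToyEmbeddingFunction)


def get_toy_registry():
    # Always hands back the same shared registry object
    return _REGISTRY


model = get_toy_registry().get("toy").create(name="toy-small")
print(model.name, model.max_retries)  # prints: toy-small 7
```

The point of the split is that `get` only selects which embedding function to use, while `create` is where model-specific parameters are applied.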
!!! tip "Moving on"
Now that we know how to get the **desired embedding model** and use it in our code, let's explore the comprehensive **list** of embedding models **supported by LanceDB**, in the tables below.
## Text Embedding Functions 📝
These functions are registered by default to handle text embeddings.
- 🔄 **Embedding functions** have an inbuilt rate limit handler wrapper for source and query embedding function calls that retry with **exponential backoff**.
- 🌕 Each `EmbeddingFunction` implementation automatically takes `max_retries` as an argument which has the default value of 7.
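As a rough illustration of what "retry with exponential backoff" means, the sketch below retries a failing call and doubles the wait after each failure. It assumes a generic callable; `retry_with_backoff`, `base_delay`, and `sleep` are hypothetical names, and only the default of 7 retries is taken from the text above. LanceDB's built-in handler may differ in its details.

```python
import random
import time


def retry_with_backoff(fn, max_retries=7, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on any exception with exponentially growing waits.

    Illustrative only: this is not LanceDB's internal rate limit handler.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            # Wait base_delay * 2^attempt seconds, plus up to 1s of jitter
            sleep(base_delay * (2 ** attempt) + random.random())
```

With the defaults, the waits grow roughly as 1s, 2s, 4s, and so on, which gives a rate-limited API time to recover instead of being hit again immediately.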
🌟 **Available Text Embeddings**
| **Embedding** :material-information-outline:{ title="Hover over the name to find out the model_id" } | **Description** | **Documentation** |
|-----------|-------------|---------------|
| [**Sentence Transformers**](available_embedding_models/text_embedding_functions/sentence_transformers.md "sentence-transformers") | 🧠 **SentenceTransformers** is a Python framework for state-of-the-art sentence, text, and image embeddings. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/sbert_2.png" alt="Sentence Transformers Icon" width="90" height="35">](available_embedding_models/text_embedding_functions/sentence_transformers.md)|
| [**Huggingface Models**](available_embedding_models/text_embedding_functions/huggingface_embedding.md "huggingface") |🤗 We offer support for all **Huggingface** models. The default model is `colbert-ir/colbertv2.0`. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/hugging_face.png" alt="Huggingface Icon" width="130" height="35">](available_embedding_models/text_embedding_functions/huggingface_embedding.md) |
| [**Ollama Embeddings**](available_embedding_models/text_embedding_functions/ollama_embedding.md "ollama") | 🔍 Generate embeddings via the **Ollama** python library. Ollama supports embedding models, making it possible to build RAG apps. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/Ollama.png" alt="Ollama Icon" width="110" height="35">](available_embedding_models/text_embedding_functions/ollama_embedding.md)|
| [**OpenAI Embeddings**](available_embedding_models/text_embedding_functions/openai_embedding.md "openai")| 🔑 **OpenAI's** text embeddings measure the relatedness of text strings. **LanceDB** supports state-of-the-art embeddings from OpenAI. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/openai.png" alt="OpenAI Icon" width="100" height="35">](available_embedding_models/text_embedding_functions/openai_embedding.md)|
| [**Instructor Embeddings**](available_embedding_models/text_embedding_functions/instructor_embedding.md "instructor") | 📚 **Instructor**: An instruction-finetuned text embedding model that can generate text embeddings tailored to any task and domains by simply providing the task instruction, without any finetuning. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/instructor_embedding.png" alt="Instructor Embedding Icon" width="140" height="35">](available_embedding_models/text_embedding_functions/instructor_embedding.md) |
| [**Gemini Embeddings**](available_embedding_models/text_embedding_functions/gemini_embedding.md "gemini-text") | 🌌 Google's Gemini API generates state-of-the-art embeddings for words, phrases, and sentences. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/gemini.png" alt="Gemini Icon" width="95" height="35">](available_embedding_models/text_embedding_functions/gemini_embedding.md) |
| [**Cohere Embeddings**](available_embedding_models/text_embedding_functions/cohere_embedding.md "cohere") | 💬 This will help you get started with **Cohere** embedding models using LanceDB. Using the Cohere API requires the `cohere` package. Install it via `pip`. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/cohere.png" alt="Cohere Icon" width="140" height="35">](available_embedding_models/text_embedding_functions/cohere_embedding.md) |
| [**Jina Embeddings**](available_embedding_models/text_embedding_functions/jina_embedding.md "jina") | 🔗 World-class embedding models to improve your search and RAG systems. You will need a **Jina API key**. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/jina.png" alt="Jina Icon" width="90" height="35">](available_embedding_models/text_embedding_functions/jina_embedding.md) |
| [**AWS Bedrock Functions**](available_embedding_models/text_embedding_functions/aws_bedrock_embedding.md "bedrock-text") | ☁️ AWS Bedrock supports multiple base models for generating text embeddings. You need to set up AWS credentials to use this embedding function. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/aws_bedrock.png" alt="AWS Bedrock Icon" width="120" height="35">](available_embedding_models/text_embedding_functions/aws_bedrock_embedding.md) |
| [**IBM Watsonx.ai**](available_embedding_models/text_embedding_functions/ibm_watsonx_ai_embedding.md "watsonx") | 💡 Generate text embeddings using IBM's watsonx.ai platform. **Note**: watsonx.ai library is an optional dependency. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/watsonx.png" alt="Watsonx Icon" width="140" height="35">](available_embedding_models/text_embedding_functions/ibm_watsonx_ai_embedding.md) |
## Multi-modal embedding functions
Multi-modal embedding functions allow you to query your table using both images and text.
**Available Multi-modal Embeddings** :
[st-key]: "sentence-transformers"
[hf-key]: "huggingface"
[ollama-key]: "ollama"
[openai-key]: "openai"
[instructor-key]: "instructor"
[gemini-key]: "gemini-text"
[cohere-key]: "cohere"
[jina-key]: "jina"
[aws-key]: "bedrock-text"
[watsonx-key]: "watsonx"
- [OpenClip Embeddings](available_embedding_models/multimodal_embedding_functions/openclip_embedding.md)
- [Imagebind Embeddings](available_embedding_models/multimodal_embedding_functions/imagebind_embedding.md)
- [Jina Embeddings](available_embedding_models/multimodal_embedding_functions/jina_multimodal_embedding.md)
## Multi-modal Embedding Functions 🖼️
Multi-modal embedding functions allow you to query your table using both images and text. 💬🖼️
🌐 **Available Multi-modal Embeddings**
| Embedding :material-information-outline:{ title="Hover over the name to find out the model_id" } | Description | Documentation |
|-----------|-------------|---------------|
| [**OpenClip Embeddings**](available_embedding_models/multimodal_embedding_functions/openclip_embedding.md "open-clip") | 🎨 We support CLIP model embeddings using the open source alternative, **open-clip** which supports various customizations. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/openclip_github.png" alt="openclip Icon" width="150" height="35">](available_embedding_models/multimodal_embedding_functions/openclip_embedding.md) |
| [**Imagebind Embeddings**](available_embedding_models/multimodal_embedding_functions/imagebind_embedding.md "imagebind") | 🌌 We have support for **imagebind model embeddings**. You can download our version of the packaged model via `pip install imagebind-packaged==0.1.2`. | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/imagebind_meta.png" alt="imagebind Icon" width="150" height="35">](available_embedding_models/multimodal_embedding_functions/imagebind_embedding.md)|
| [**Jina Multi-modal Embeddings**](available_embedding_models/multimodal_embedding_functions/jina_multimodal_embedding.md "jina") | 🔗 **Jina embeddings** can also be used to embed both **text** and **image** data. Only some of the models support image data; check the detailed documentation for which ones. 👉 | [<img src="https://raw.githubusercontent.com/lancedb/assets/main/docs/assets/logos/jina.png" alt="jina Icon" width="90" height="35">](available_embedding_models/multimodal_embedding_functions/jina_multimodal_embedding.md) |
!!! note
If you'd like to request support for additional **embedding functions**, please feel free to open an issue on our LanceDB [GitHub issue page](https://github.com/lancedb/lancedb/issues).