From 2bc7dca3ca5ba0010e324d2306aea7e64ec42049 Mon Sep 17 00:00:00 2001
From: Rithik Kumar <46047011+rithikJha@users.noreply.github.com>
Date: Thu, 5 Sep 2024 22:19:08 +0530
Subject: [PATCH] docs: add changes to Embeddings-> Available models-> overview
page (#1596)
Adding features and improvements to the Manage Embeddings page.


---
docs/mkdocs.yml | 7 ++
.../cohere_embedding.md | 15 +--
.../embeddings/default_embedding_functions.md | 98 ++++++++++++++-----
3 files changed, 91 insertions(+), 29 deletions(-)
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
index 0230caef..bb0c456c 100644
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@@ -26,6 +26,7 @@ theme:
- content.code.copy
- content.tabs.link
- content.action.edit
+ - content.tooltips
- toc.follow
- navigation.top
- navigation.tabs
@@ -35,6 +36,7 @@ theme:
- navigation.instant
icon:
repo: fontawesome/brands/github
+ annotation: material/arrow-right-circle
custom_dir: overrides
plugins:
@@ -76,7 +78,12 @@ markdown_extensions:
- pymdownx.tabbed:
alternate_style: true
- md_in_html
+ - abbr
- attr_list
+ - pymdownx.snippets
+ - pymdownx.emoji:
+ emoji_index: !!python/name:material.extensions.emoji.twemoji
+ emoji_generator: !!python/name:material.extensions.emoji.to_svg
nav:
- Home:
diff --git a/docs/src/embeddings/available_embedding_models/text_embedding_functions/cohere_embedding.md b/docs/src/embeddings/available_embedding_models/text_embedding_functions/cohere_embedding.md
index 39eba18c..fd99f2ca 100644
--- a/docs/src/embeddings/available_embedding_models/text_embedding_functions/cohere_embedding.md
+++ b/docs/src/embeddings/available_embedding_models/text_embedding_functions/cohere_embedding.md
@@ -4,13 +4,14 @@ Using cohere API requires cohere package, which can be installed using `pip inst
You also need to set the `COHERE_API_KEY` environment variable to use the Cohere API.
Supported models are:
-* embed-english-v3.0
-* embed-multilingual-v3.0
-* embed-english-light-v3.0
-* embed-multilingual-light-v3.0
-* embed-english-v2.0
-* embed-english-light-v2.0
-* embed-multilingual-v2.0
+
+- embed-english-v3.0
+- embed-multilingual-v3.0
+- embed-english-light-v3.0
+- embed-multilingual-light-v3.0
+- embed-english-v2.0
+- embed-english-light-v2.0
+- embed-multilingual-v2.0
Supported parameters (to be passed in `create` method) are:
diff --git a/docs/src/embeddings/default_embedding_functions.md b/docs/src/embeddings/default_embedding_functions.md
index ced97048..5457dc9f 100644
--- a/docs/src/embeddings/default_embedding_functions.md
+++ b/docs/src/embeddings/default_embedding_functions.md
@@ -1,30 +1,84 @@
-There are various embedding functions available out of the box with LanceDB to manage your embeddings implicitly. We're actively working on adding other popular embedding APIs and models.
+# Available Embedding Models
-## Text embedding functions
-Contains the text embedding functions registered by default.
+There are various embedding functions available out of the box with LanceDB to manage your embeddings implicitly. We're actively working on adding other popular embedding APIs and models.
-* Embedding functions have an inbuilt rate limit handler wrapper for source and query embedding function calls that retry with exponential backoff.
-* Each `EmbeddingFunction` implementation automatically takes `max_retries` as an argument which has the default value of 7.
+Before jumping into the list of available models, let's look at how to initialize and configure an embedding model for use in your code:
-**Available Text Embeddings**:
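+!!! example "Example usage"
+    ```python
+    from lancedb.embeddings import get_registry
+
+    # Fetch the "openai" embedding function and configure it with a model name
+    model = get_registry().get("openai").create(name="text-embedding-ada-002")
+    ```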
-- [Sentence Transformers](available_embedding_models/text_embedding_functions/sentence_transformers.md)
-- [Huggingface Embedding Models](available_embedding_models/text_embedding_functions/huggingface_embedding.md)
-- [Ollama Embeddings](available_embedding_models/text_embedding_functions/ollama_embedding.md)
-- [OpenAI Embeddings](available_embedding_models/text_embedding_functions/openai_embedding.md)
-- [Instructor Embeddings](available_embedding_models/text_embedding_functions/instructor_embedding.md)
-- [Gemini Embeddings](available_embedding_models/text_embedding_functions/gemini_embedding.md)
-- [Cohere Embeddings](available_embedding_models/text_embedding_functions/cohere_embedding.md)
-- [Jina Embeddings](available_embedding_models/text_embedding_functions/jina_embedding.md)
-- [AWS Bedrock Text Embedding Functions](available_embedding_models/text_embedding_functions/aws_bedrock_embedding.md)
-- [IBM Watsonx.ai Embeddings](available_embedding_models/text_embedding_functions/ibm_watsonx_ai_embedding.md)
+Now let's break down the general form of this call:
+```python
+model = get_registry().get("model_id").create(...params)
+```
+**This line creates a configured, ready-to-use instance of an embedding function for the model of your choice.**
+
+- `get_registry()`: This function returns an `EmbeddingFunctionRegistry` object, which manages the registration and retrieval of embedding functions.
+
+- `.get("model_id")`: This method is called on the registry object and retrieves the **embedding function** class registered under `"model_id"` (1).
+    { .annotate }
+
+    1. Hover over the names in the tables below to find the `model_id` of each embedding function.
+
+- `.create(...params)`: This method is called on the class returned by `get`. It instantiates the embedding function with the **specified parameters**.
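
To make the `get` → `create` flow concrete, here is a minimal, self-contained sketch. `ToyRegistry` and `ToyEmbedding` are hypothetical classes written only for illustration: they mimic the pattern described above (look up a class by its `model_id`, then instantiate it with parameters) and are not LanceDB's actual implementation.

```python
from dataclasses import dataclass


class ToyRegistry:
    """Hypothetical registry mapping a model_id string to an embedding class."""

    def __init__(self):
        self._functions = {}

    def register(self, model_id):
        # Decorator that records a class under the given model_id
        def decorator(cls):
            self._functions[model_id] = cls
            return cls

        return decorator

    def get(self, model_id):
        # Return the registered class itself, not an instance
        return self._functions[model_id]


registry = ToyRegistry()


@registry.register("toy")
@dataclass
class ToyEmbedding:
    """Hypothetical embedding function with one configurable parameter."""

    dim: int = 4

    @classmethod
    def create(cls, **params):
        # Instantiate the class with the caller's parameters
        return cls(**params)

    def embed(self, text):
        # Toy vector: character codes summed into `dim` buckets (not a real model)
        vec = [0.0] * self.dim
        for i, ch in enumerate(text):
            vec[i % self.dim] += ord(ch)
        return vec


model = registry.get("toy").create(dim=3)
print(model.embed("abc"))  # [97.0, 98.0, 99.0]
```

In this sketch `get` returns a class rather than an instance, so each `create` call can produce a differently configured instance from the same registry entry.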
+
+??? question "What parameters does the `.create(...params)` method accept?"
+    **Check out the documentation for each embedding model (linked in the tables below) to see which parameters it accepts.**
+
+!!! tip "Moving on"
+    Now that we know how to get the **desired embedding model** and use it in our code, let's explore the comprehensive **list** of embedding models **supported by LanceDB** in the tables below.
+
+## Text Embedding Functions
+These functions are registered by default to handle text embeddings.
+
+- **Embedding functions** have a built-in rate-limit handler that wraps source and query embedding calls and retries them with **exponential backoff**.
+
+- Each `EmbeddingFunction` implementation automatically accepts a `max_retries` argument, which defaults to 7.
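
The retry behavior described above follows the general exponential-backoff pattern. The sketch below is a generic illustration, not LanceDB's internal wrapper: the helper name `retry_with_backoff`, the `base_delay` parameter, and the doubling schedule are assumptions made for this example; only the default of 7 retries comes from this page.

```python
import time


def retry_with_backoff(func, max_retries=7, base_delay=1.0, sleep=time.sleep):
    """Call func(); after each failure wait base_delay * 2**attempt seconds.

    Hypothetical helper for illustration, not LanceDB's implementation.
    """
    for attempt in range(max_retries + 1):  # 1 initial call + max_retries retries
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2**attempt)  # waits 1s, 2s, 4s, ...


# Usage: a flaky call that is "rate limited" twice before succeeding.
# Recording delays instead of sleeping keeps the example fast.
calls = {"n": 0}
delays = []


def flaky_embed():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return [0.1, 0.2, 0.3]


result = retry_with_backoff(flaky_embed, sleep=delays.append)
print(result, delays)  # [0.1, 0.2, 0.3] [1.0, 2.0]
```

Injecting `sleep` as a parameter is a testing convenience of this sketch: production code would keep the default `time.sleep`.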
+
+**Available Text Embeddings**
+
+| **Embedding** :material-information-outline:{ title="Hover over the name to find out the model_id" } | **Description** | **Documentation** |
+|-----------|-------------|---------------|
+| [**Sentence Transformers**](available_embedding_models/text_embedding_functions/sentence_transformers.md "sentence-transformers") | **SentenceTransformers** is a Python framework for state-of-the-art sentence, text, and image embeddings. | [Docs](available_embedding_models/text_embedding_functions/sentence_transformers.md) |
+| [**Huggingface Models**](available_embedding_models/text_embedding_functions/huggingface_embedding.md "huggingface") | We offer support for all **Huggingface** models. The default model is `colbert-ir/colbertv2.0`. | [Docs](available_embedding_models/text_embedding_functions/huggingface_embedding.md) |
+| [**Ollama Embeddings**](available_embedding_models/text_embedding_functions/ollama_embedding.md "ollama") | Generate embeddings via the **Ollama** Python library. Ollama supports embedding models, making it possible to build RAG apps. | [Docs](available_embedding_models/text_embedding_functions/ollama_embedding.md) |
+| [**OpenAI Embeddings**](available_embedding_models/text_embedding_functions/openai_embedding.md "openai") | **OpenAI's** text embeddings measure the relatedness of text strings. **LanceDB** supports state-of-the-art embeddings from OpenAI. | [Docs](available_embedding_models/text_embedding_functions/openai_embedding.md) |
+| [**Instructor Embeddings**](available_embedding_models/text_embedding_functions/instructor_embedding.md "instructor") | **Instructor** is an instruction-finetuned text embedding model that can generate embeddings tailored to any task and domain simply by providing the task instruction, without any finetuning. | [Docs](available_embedding_models/text_embedding_functions/instructor_embedding.md) |
+| [**Gemini Embeddings**](available_embedding_models/text_embedding_functions/gemini_embedding.md "gemini-text") | Google's Gemini API generates state-of-the-art embeddings for words, phrases, and sentences. | [Docs](available_embedding_models/text_embedding_functions/gemini_embedding.md) |
+| [**Cohere Embeddings**](available_embedding_models/text_embedding_functions/cohere_embedding.md "cohere") | Get started with **Cohere** embedding models in LanceDB. Using the Cohere API requires the `cohere` package, which you can install via `pip`. | [Docs](available_embedding_models/text_embedding_functions/cohere_embedding.md) |
+| [**Jina Embeddings**](available_embedding_models/text_embedding_functions/jina_embedding.md "jina") | World-class embedding models to improve your search and RAG systems. You will need a **Jina API key**. | [Docs](available_embedding_models/text_embedding_functions/jina_embedding.md) |
+| [**AWS Bedrock Functions**](available_embedding_models/text_embedding_functions/aws_bedrock_embedding.md "bedrock-text") | AWS Bedrock supports multiple base models for generating text embeddings. You need to set up AWS credentials to use this embedding function. | [Docs](available_embedding_models/text_embedding_functions/aws_bedrock_embedding.md) |
+| [**IBM Watsonx.ai**](available_embedding_models/text_embedding_functions/ibm_watsonx_ai_embedding.md "watsonx") | Generate text embeddings using IBM's watsonx.ai platform. **Note**: the watsonx.ai library is an optional dependency. | [Docs](available_embedding_models/text_embedding_functions/ibm_watsonx_ai_embedding.md) |
-## Multi-modal embedding functions
-Multi-modal embedding functions allow you to query your table using both images and text.
-**Available Multi-modal Embeddings** :
+[st-key]: "sentence-transformers"
+[hf-key]: "huggingface"
+[ollama-key]: "ollama"
+[openai-key]: "openai"
+[instructor-key]: "instructor"
+[gemini-key]: "gemini-text"
+[cohere-key]: "cohere"
+[jina-key]: "jina"
+[aws-key]: "bedrock-text"
+[watsonx-key]: "watsonx"
-- [OpenClip Embeddings](available_embedding_models/multimodal_embedding_functions/openclip_embedding.md)
-- [Imagebind Embeddings](available_embedding_models/multimodal_embedding_functions/imagebind_embedding.md)
-- [Jina Embeddings](available_embedding_models/multimodal_embedding_functions/jina_multimodal_embedding.md)
\ No newline at end of file
+
+## Multi-modal Embedding Functions
+
+Multi-modal embedding functions allow you to query your table using both images and text.
+
+**Available Multi-modal Embeddings**
+
+| **Embedding** :material-information-outline:{ title="Hover over the name to find out the model_id" } | **Description** | **Documentation** |
+|-----------|-------------|---------------|
+| [**OpenClip Embeddings**](available_embedding_models/multimodal_embedding_functions/openclip_embedding.md "open-clip") | We support CLIP model embeddings using the open-source alternative **open-clip**, which supports various customizations. | [Docs](available_embedding_models/multimodal_embedding_functions/openclip_embedding.md) |
+| [**Imagebind Embeddings**](available_embedding_models/multimodal_embedding_functions/imagebind_embedding.md "imagebind") | We support **ImageBind** model embeddings. You can install our packaged version of the model with `pip install imagebind-packaged==0.1.2`. | [Docs](available_embedding_models/multimodal_embedding_functions/imagebind_embedding.md) |
+| [**Jina Multi-modal Embeddings**](available_embedding_models/multimodal_embedding_functions/jina_multimodal_embedding.md "jina") | **Jina embeddings** can also embed both **text** and **image** data. Only some of the models support image data, so check the detailed documentation. | [Docs](available_embedding_models/multimodal_embedding_functions/jina_multimodal_embedding.md) |
+
+!!! note
+    If you'd like to request support for additional **embedding functions**, please feel free to open an issue on our LanceDB [GitHub issue page](https://github.com/lancedb/lancedb/issues).
\ No newline at end of file