From 2bc7dca3ca5ba0010e324d2306aea7e64ec42049 Mon Sep 17 00:00:00 2001
From: Rithik Kumar <46047011+rithikJha@users.noreply.github.com>
Date: Thu, 5 Sep 2024 22:19:08 +0530
Subject: [PATCH] docs: add changes to Embeddings-> Available models-> overview page (#1596)

adding features and improvements to - Manage Embeddings page

Before:
![Screenshot 2024-09-04 223743](https://github.com/user-attachments/assets/f1e116b5-6ebb-4d59-9d29-b20084998cd0)

After:
![Screenshot 2024-09-05 214214](https://github.com/user-attachments/assets/8c94318e-68af-447e-97e1-8153860a2914)
![Screenshot 2024-09-05 213623](https://github.com/user-attachments/assets/55c82770-6df9-4bab-9c5c-1ea1552138de)
![Screenshot 2024-09-05 215931](https://github.com/user-attachments/assets/9bfac7d4-16a6-454e-801e-50789ff75261)
---
 docs/mkdocs.yml                               |  7 ++
 .../cohere_embedding.md                       | 15 +--
 .../embeddings/default_embedding_functions.md | 98 ++++++++++++++-----
 3 files changed, 91 insertions(+), 29 deletions(-)

diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
index 0230caef..bb0c456c 100644
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@@ -26,6 +26,7 @@ theme:
     - content.code.copy
     - content.tabs.link
     - content.action.edit
+    - content.tooltips
     - toc.follow
     - navigation.top
     - navigation.tabs
@@ -35,6 +36,7 @@ theme:
     - navigation.instant
   icon:
     repo: fontawesome/brands/github
+    annotation: material/arrow-right-circle
   custom_dir: overrides

 plugins:
@@ -76,7 +78,12 @@ markdown_extensions:
   - pymdownx.tabbed:
       alternate_style: true
   - md_in_html
+  - abbr
   - attr_list
+  - pymdownx.snippets
+  - pymdownx.emoji:
+      emoji_index: !!python/name:material.extensions.emoji.twemoji
+      emoji_generator: !!python/name:material.extensions.emoji.to_svg

 nav:
   - Home:
diff --git a/docs/src/embeddings/available_embedding_models/text_embedding_functions/cohere_embedding.md b/docs/src/embeddings/available_embedding_models/text_embedding_functions/cohere_embedding.md
index 39eba18c..fd99f2ca 100644
--- a/docs/src/embeddings/available_embedding_models/text_embedding_functions/cohere_embedding.md
+++ b/docs/src/embeddings/available_embedding_models/text_embedding_functions/cohere_embedding.md
@@ -4,13 +4,14 @@ Using cohere API requires cohere package, which can be installed using `pip inst
 You also need to set the `COHERE_API_KEY` environment variable to use the Cohere API.

 Supported models are:
-* embed-english-v3.0
-* embed-multilingual-v3.0
-* embed-english-light-v3.0
-* embed-multilingual-light-v3.0
-* embed-english-v2.0
-* embed-english-light-v2.0
-* embed-multilingual-v2.0
+
+- embed-english-v3.0
+- embed-multilingual-v3.0
+- embed-english-light-v3.0
+- embed-multilingual-light-v3.0
+- embed-english-v2.0
+- embed-english-light-v2.0
+- embed-multilingual-v2.0

 Supported parameters (to be passed in `create` method) are:

diff --git a/docs/src/embeddings/default_embedding_functions.md b/docs/src/embeddings/default_embedding_functions.md
index ced97048..5457dc9f 100644
--- a/docs/src/embeddings/default_embedding_functions.md
+++ b/docs/src/embeddings/default_embedding_functions.md
@@ -1,30 +1,84 @@
-There are various embedding functions available out of the box with LanceDB to manage your embeddings implicitly. We're actively working on adding other popular embedding APIs and models.
+# 📚 Available Embedding Models

-## Text embedding functions
-Contains the text embedding functions registered by default.
+There are various embedding functions available out of the box with LanceDB to manage your embeddings implicitly. We're actively working on adding other popular embedding APIs and models. 🚀

-* Embedding functions have an inbuilt rate limit handler wrapper for source and query embedding function calls that retry with exponential backoff.
-* Each `EmbeddingFunction` implementation automatically takes `max_retries` as an argument which has the default value of 7.
+Before jumping to the list of available models, let's see how to initialize and configure an embedding model in code:

-**Available Text Embeddings**:
+!!! example "Example usage"
+    ```python
+    model = (get_registry()
+             .get("openai")
+             .create(name="text-embedding-ada-002"))
+    ```

-- [Sentence Transformers](available_embedding_models/text_embedding_functions/sentence_transformers.md)
-- [Huggingface Embedding Models](available_embedding_models/text_embedding_functions/huggingface_embedding.md)
-- [Ollama Embeddings](available_embedding_models/text_embedding_functions/ollama_embedding.md)
-- [OpenAI Embeddings](available_embedding_models/text_embedding_functions/openai_embedding.md)
-- [Instructor Embeddings](available_embedding_models/text_embedding_functions/instructor_embedding.md)
-- [Gemini Embeddings](available_embedding_models/text_embedding_functions/gemini_embedding.md)
-- [Cohere Embeddings](available_embedding_models/text_embedding_functions/cohere_embedding.md)
-- [Jina Embeddings](available_embedding_models/text_embedding_functions/jina_embedding.md)
-- [AWS Bedrock Text Embedding Functions](available_embedding_models/text_embedding_functions/aws_bedrock_embedding.md)
-- [IBM Watsonx.ai Embeddings](available_embedding_models/text_embedding_functions/ibm_watsonx_ai_embedding.md)
+Now let's understand the above syntax:
+```python
+model = get_registry().get("model_id").create(...params)
+```
+**This👆 line creates a configured, ready-to-use instance of an embedding function for the model of your choice.**
+
+- `get_registry()` : This function call returns an instance of an `EmbeddingFunctionRegistry` object. This registry manages the registration and retrieval of embedding functions.
+
+- `.get("model_id")` : This method is called on the registry object and retrieves the **embedding function** associated with the `"model_id"` (1).
+  { .annotate }
+
+    1. Hover over the names in the table below to find out the `model_id` of different embedding functions.
+
+- `.create(...params)` : This method is called on the object returned by the `get` method. It instantiates an embedding function using the **specified parameters**.
+
+??? question "What parameters does the `.create(...params)` method accept?"
+    **Check out the documentation of the specific embedding model (links in the table below👇) to see which parameters it takes.**
+
+!!! tip "Moving on"
+    Now that we know how to get the **desired embedding model** and use it in our code, let's explore the comprehensive **list** of embedding models **supported by LanceDB**, in the tables below.
+
+## Text Embedding Functions 📝
+These functions are registered by default to handle text embeddings.
+
+- 🔄 **Embedding functions** have a built-in rate-limit handler that wraps source and query embedding calls and retries them with **exponential backoff**.
+
+- 🌕 Each `EmbeddingFunction` implementation accepts a `max_retries` argument, which defaults to 7.
+
+🌟 **Available Text Embeddings**
+
+| **Embedding** :material-information-outline:{ title="Hover over the name to find out the model_id" } | **Description** | **Documentation** |
+|-----------|-------------|---------------|
+| [**Sentence Transformers**](available_embedding_models/text_embedding_functions/sentence_transformers.md "sentence-transformers") | 🧠 **SentenceTransformers** is a Python framework for state-of-the-art sentence, text, and image embeddings. | [Sentence Transformers Icon](available_embedding_models/text_embedding_functions/sentence_transformers.md)|
+| [**Huggingface Models**](available_embedding_models/text_embedding_functions/huggingface_embedding.md "huggingface") |🤗 We offer support for all **Huggingface** models. The default model is `colbert-ir/colbertv2.0`. | [Huggingface Icon](available_embedding_models/text_embedding_functions/huggingface_embedding.md) |
+| [**Ollama Embeddings**](available_embedding_models/text_embedding_functions/ollama_embedding.md "ollama") | 🔍 Generate embeddings via the **Ollama** Python library. Ollama supports embedding models, making it possible to build RAG apps. | [Ollama Icon](available_embedding_models/text_embedding_functions/ollama_embedding.md)|
+| [**OpenAI Embeddings**](available_embedding_models/text_embedding_functions/openai_embedding.md "openai")| 🔑 **OpenAI’s** text embeddings measure the relatedness of text strings. **LanceDB** supports state-of-the-art embeddings from OpenAI. | [OpenAI Icon](available_embedding_models/text_embedding_functions/openai_embedding.md)|
+| [**Instructor Embeddings**](available_embedding_models/text_embedding_functions/instructor_embedding.md "instructor") | 📚 **Instructor**: An instruction-finetuned text embedding model that can generate text embeddings tailored to any task and domain by simply providing the task instruction, without any finetuning. | [Instructor Embedding Icon](available_embedding_models/text_embedding_functions/instructor_embedding.md) |
+| [**Gemini Embeddings**](available_embedding_models/text_embedding_functions/gemini_embedding.md "gemini-text") | 🌌 Google’s Gemini API generates state-of-the-art embeddings for words, phrases, and sentences. | [Gemini Icon](available_embedding_models/text_embedding_functions/gemini_embedding.md) |
+| [**Cohere Embeddings**](available_embedding_models/text_embedding_functions/cohere_embedding.md "cohere") | 💬 This will help you get started with **Cohere** embedding models using LanceDB. Using the Cohere API requires the `cohere` package; install it via `pip`. | [Cohere Icon](available_embedding_models/text_embedding_functions/cohere_embedding.md) |
+| [**Jina Embeddings**](available_embedding_models/text_embedding_functions/jina_embedding.md "jina") | 🔗 World-class embedding models to improve your search and RAG systems. You will need a **Jina API key**. | [Jina Icon](available_embedding_models/text_embedding_functions/jina_embedding.md) |
+| [**AWS Bedrock Functions**](available_embedding_models/text_embedding_functions/aws_bedrock_embedding.md "bedrock-text") | ☁️ AWS Bedrock supports multiple base models for generating text embeddings. You need to set up AWS credentials to use this embedding function. | [AWS Bedrock Icon](available_embedding_models/text_embedding_functions/aws_bedrock_embedding.md) |
+| [**IBM Watsonx.ai**](available_embedding_models/text_embedding_functions/ibm_watsonx_ai_embedding.md "watsonx") | 💡 Generate text embeddings using IBM's watsonx.ai platform. **Note**: the watsonx.ai library is an optional dependency. | [Watsonx Icon](available_embedding_models/text_embedding_functions/ibm_watsonx_ai_embedding.md) |


-## Multi-modal embedding functions
-Multi-modal embedding functions allow you to query your table using both images and text.

-**Available Multi-modal Embeddings** :
+[st-key]: "sentence-transformers"
+[hf-key]: "huggingface"
+[ollama-key]: "ollama"
+[openai-key]: "openai"
+[instructor-key]: "instructor"
+[gemini-key]: "gemini-text"
+[cohere-key]: "cohere"
+[jina-key]: "jina"
+[aws-key]: "bedrock-text"
+[watsonx-key]: "watsonx"

-- [OpenClip Embeddings](available_embedding_models/multimodal_embedding_functions/openclip_embedding.md)
-- [Imagebind Embeddings](available_embedding_models/multimodal_embedding_functions/imagebind_embedding.md)
-- [Jina Embeddings](available_embedding_models/multimodal_embedding_functions/jina_multimodal_embedding.md)
\ No newline at end of file
+
+## Multi-modal Embedding Functions 🖼️
+
+Multi-modal embedding functions allow you to query your table using both images and text. 💬🖼️
+
+🌐 **Available Multi-modal Embeddings**
+
+| Embedding :material-information-outline:{ title="Hover over the name to find out the model_id" } | Description | Documentation |
+|-----------|-------------|---------------|
+| [**OpenClip Embeddings**](available_embedding_models/multimodal_embedding_functions/openclip_embedding.md "open-clip") | 🎨 We support CLIP model embeddings using the open source alternative, **open-clip**, which supports various customizations. | [openclip Icon](available_embedding_models/multimodal_embedding_functions/openclip_embedding.md) |
+| [**Imagebind Embeddings**](available_embedding_models/multimodal_embedding_functions/imagebind_embedding.md "imagebind") | 🌌 We have support for **imagebind model embeddings**. You can download our version of the packaged model via `pip install imagebind-packaged==0.1.2`. | [imagebind Icon](available_embedding_models/multimodal_embedding_functions/imagebind_embedding.md)|
+| [**Jina Multi-modal Embeddings**](available_embedding_models/multimodal_embedding_functions/jina_multimodal_embedding.md "jina") | 🔗 **Jina embeddings** can also be used to embed both **text** and **image** data. Only some of the models support image data; check the detailed documentation. 👉 | [jina Icon](available_embedding_models/multimodal_embedding_functions/jina_multimodal_embedding.md) |
+
+!!! note
+    If you'd like to request support for additional **embedding functions**, please feel free to open an issue on our LanceDB [GitHub issue page](https://github.com/lancedb/lancedb/issues).
\ No newline at end of file
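
The overview page in this patch describes two mechanisms: a registry lookup followed by `create(...)`, and retries with exponential backoff governed by `max_retries` (default 7). The stand-alone sketch below illustrates both ideas using only the standard library. It is not LanceDB's implementation: `ToyRegistry`, `ToyLengthEmbedding`, `retry_with_backoff`, the `"toy-length"` id, and the `base_delay` parameter are hypothetical names invented for illustration. Only the `get(...).create(...)` call shape and the default of 7 retries come from the documentation above.

```python
import time
from typing import Callable, Dict, List


class ToyRegistry:
    """Hypothetical stand-in for an embedding-function registry."""

    def __init__(self) -> None:
        self._functions: Dict[str, type] = {}

    def register(self, model_id: str) -> Callable[[type], type]:
        # Decorator that stores a class under the given model_id.
        def decorator(cls: type) -> type:
            self._functions[model_id] = cls
            return cls

        return decorator

    def get(self, model_id: str) -> type:
        # Return the class registered under model_id (KeyError if unknown).
        return self._functions[model_id]


registry = ToyRegistry()


@registry.register("toy-length")
class ToyLengthEmbedding:
    """Toy embedding function: maps text to [length, number of spaces]."""

    def __init__(self, name: str, max_retries: int = 7) -> None:
        self.name = name
        self.max_retries = max_retries  # default mirrors the documented value

    @classmethod
    def create(cls, **params) -> "ToyLengthEmbedding":
        # Build a configured instance from keyword parameters.
        return cls(**params)

    def embed(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(t)), float(t.count(" "))] for t in texts]


def retry_with_backoff(call, max_retries=7, base_delay=0.01, sleep=time.sleep):
    """Run call(); on RuntimeError wait base_delay * 2**attempt and retry."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RuntimeError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay * (2 ** attempt))


# Same call shape as in the docs: registry -> get(model_id) -> create(params).
model = registry.get("toy-length").create(name="demo")
vectors = retry_with_backoff(lambda: model.embed(["hello world"]),
                             max_retries=model.max_retries)
```

Passing `sleep` as a parameter keeps the backoff logic testable without real waiting; a real rate-limit handler would typically catch a provider-specific exception rather than `RuntimeError`.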