{ "cells": [ { "cell_type": "markdown", "id": "88c1af18", "metadata": {}, "source": [ "# Example - MultiModal CLIP Embeddings" ] }, { "cell_type": "markdown", "id": "c6b5d346-2c2a-4341-a132-00e53543f8d1", "metadata": { "id": "c6b5d346-2c2a-4341-a132-00e53543f8d1" }, "source": [ "# The Disappearing Embedding Function\n", "\n", "Previously, to use vector databases, you had to do the embedding process yourself and interact with the system using vectors directly.\n", "With this new release of LanceDB, we make it much more convenient so you don't need to worry about that at all.\n", "\n", "1. We present you with sentence-transformer, openai, and openclip embedding functions that can be saved directly as table metadata\n", "2. You no longer have to generate the vectors directly either during query time or ingestion time\n", "3. The embedding function interface is extensible so you can create your own\n", "4. The function is persisted as table metadata so you can use it across sessions" ] }, { "cell_type": "code", "execution_count": 1, "id": "4c25eb9d-9e05-4133-927e-747516cb9310", "metadata": { "id": "4c25eb9d-9e05-4133-927e-747516cb9310" }, "outputs": [], "source": [ "import lancedb" ] }, { "cell_type": "markdown", "id": "db4bd459-9bab-4803-bbe8-20201b445245", "metadata": { "id": "db4bd459-9bab-4803-bbe8-20201b445245" }, "source": [ "## Multi-modal search made easy\n", "\n", "In this example we'll go over multi-modal image search using:\n", "- Oxford Pet dataset\n", "- OpenClip model\n", "- LanceDB" ] }, { "cell_type": "markdown", "id": "4ddd2873-0aa7-4869-bb20-21c85477ba29", "metadata": { "id": "4ddd2873-0aa7-4869-bb20-21c85477ba29" }, "source": [ "### Data" ] }, { "cell_type": "markdown", "id": "b36f56d3-0794-4018-a397-6a8f3e1b0050", "metadata": { "id": "b36f56d3-0794-4018-a397-6a8f3e1b0050" }, "source": [ "First, download the dataset from https://www.robots.ox.ac.uk/~vgg/data/pets/\n", "Specifically, download the 
[images.tar.gz](https://thor.robots.ox.ac.uk/~vgg/data/pets/images.tar.gz)\n", "\n", "This notebook assumes you've downloaded it into your ~/Downloads directory.\n", "When you extract the tarball, it will create an `images` directory." ] }, { "cell_type": "markdown", "id": "c5dae94b-ad78-41d4-aa45-06cb96a0fff1", "metadata": { "id": "c5dae94b-ad78-41d4-aa45-06cb96a0fff1" }, "source": [ "### Define embedding function\n", "\n", "We'll use the OpenClipEmbeddingFunction here for multi-modal image search." ] }, { "cell_type": "code", "execution_count": 7, "id": "d4bcd5f5-29a2-4b81-9262-852ef456db9f", "metadata": { "id": "d4bcd5f5-29a2-4b81-9262-852ef456db9f" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/saksham/Documents/lancedb/env/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n", "Downloading (…)ip_pytorch_model.bin: 100%|██████████| 605M/605M [00:41<00:00, 14.6MB/s] \n" ] } ], "source": [ "from lancedb.embeddings import EmbeddingFunctionRegistry\n", "\n", "registry = EmbeddingFunctionRegistry.get_instance()\n", "clip = registry.get(\"open-clip\").create()" ] }, { "cell_type": "code", "execution_count": 6, "id": "de72bf3c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting open_clip_torch\n",
"Successfully installed filelock-3.12.4 fsspec-2023.9.2 ftfy-6.1.1 huggingface-hub-0.17.3 mpmath-1.3.0 networkx-3.1 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.18.1 nvidia-nvjitlink-cu12-12.2.140 nvidia-nvtx-cu12-12.1.105 open-clip-torch-2.20.0 pillow-10.0.1 protobuf-3.20.3 safetensors-0.3.3 sentencepiece-0.1.99 sympy-1.12 timm-0.9.7 torch-2.1.0 torchvision-0.16.0 triton-2.1.0\n" ] } ], "source": [ "!pip install open_clip_torch" ] }, { "cell_type": "code", "execution_count": 8, "id": "be5844c4-a7a7-49cb-b2bd-0253de814161", "metadata": { "id": "be5844c4-a7a7-49cb-b2bd-0253de814161", "outputId": "7ea0aefa-74d4-447b-f14c-b8c6e389068c" }, "outputs": [ { "data": { "text/plain": [ "OpenClipEmbeddings(name='ViT-B-32', pretrained='laion2b_s34b_b79k', device='cpu', batch_size=64, normalize=True)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clip" ] }, { "cell_type": "markdown", "id": "ab96cbca-3dbf-4934-81b7-f8836e18509f", "metadata": { "id": "ab96cbca-3dbf-4934-81b7-f8836e18509f" }, "source": [ "### The data model\n", "\n", "We'll declare a new
model that subclasses LanceModel (a special pydantic model) to represent the table.\n", "This table has two columns: `image_uri` and the `vector` generated from each image.\n", "The embedding function defines the number of dimensions in its vectors so you don't need to\n", "look it up.\n", "\n", "We use the `VectorField` method from the embedding function to annotate the model\n", "so that LanceDB knows to use the open-clip embedding function to generate query embeddings that\n", "correspond to the `vector` column.\n", "\n", "We also use the `SourceField` so that when adding data, LanceDB knows to automatically use\n", "open-clip to encode the input images.\n", "\n", "Finally, because we're working with images, we add a convenience property `image` that opens the image and\n", "returns a PIL Image so it can be displayed in a Jupyter notebook." ] }, { "cell_type": "code", "execution_count": null, "id": "4e2e7f4b-788a-4396-9c79-7e5ede47e6a1", "metadata": { "id": "4e2e7f4b-788a-4396-9c79-7e5ede47e6a1" }, "outputs": [], "source": [ "from PIL import Image\n", "from lancedb.pydantic import LanceModel, Vector\n", "\n", "class Pets(LanceModel):\n", "    # vector column, filled in by the open-clip embedding function\n", "    vector: Vector(clip.ndims()) = clip.VectorField()\n", "    # source column: the image path that gets embedded\n", "    image_uri: str = clip.SourceField()\n", "\n", "    @property\n", "    def image(self):\n", "        return Image.open(self.image_uri)" ] }, { "cell_type": "markdown", "id": "74b54c67-0ee0-47e7-8b72-97fec7ce4140", "metadata": { "id": "74b54c67-0ee0-47e7-8b72-97fec7ce4140" }, "source": [ "### Create the table" ] }, { "cell_type": "markdown", "id": "89f4b38a-4636-4ad7-b3ca-0aa46aba6afc", "metadata": { "id": "89f4b38a-4636-4ad7-b3ca-0aa46aba6afc" }, "source": [ "First, we connect to a local LanceDB directory." ] }, { "cell_type": "code", "execution_count": null, "id": "9f68cff2-0fdb-4748-ba4d-5e65e5a2b4f4", "metadata": { "id": "9f68cff2-0fdb-4748-ba4d-5e65e5a2b4f4" }, "outputs": [], "source": [ "db = lancedb.connect(\"~/.lancedb\")" ] }, { "cell_type": "markdown", "id":
"cbdf5a63-8217-4110-8aa9-42946c1a0026", "metadata": { "id": "cbdf5a63-8217-4110-8aa9-42946c1a0026" }, "source": [ "Next, we collect the paths of the images we downloaded and create a table.\n", "Notice that we don't have to generate the image embeddings ourselves." ] }, { "cell_type": "code", "execution_count": null, "id": "196dfc99-aa6e-48a2-a7ca-c1e2db1ac674", "metadata": { "id": "196dfc99-aa6e-48a2-a7ca-c1e2db1ac674" }, "outputs": [], "source": [ "import pandas as pd\n", "from pathlib import Path\n", "from random import sample\n", "\n", "if \"pets\" in db:\n", "    table = db[\"pets\"]\n", "else:\n", "    table = db.create_table(\"pets\", schema=Pets)\n", "    # use a random sample of up to 1000 images\n", "    p = Path(\"~/Downloads/images\").expanduser()\n", "    uris = [str(f) for f in p.glob(\"*.jpg\")]\n", "    uris = sample(uris, min(1000, len(uris)))\n", "    table.add(pd.DataFrame({\"image_uri\": uris}))" ] }, { "cell_type": "code", "execution_count": null, "id": "0ac08735-602f-4dfc-be47-486265851ad1", "metadata": { "id": "0ac08735-602f-4dfc-be47-486265851ad1", "outputId": "8c027412-1a1a-449c-c641-6b2db4c2cb92" }, "outputs": [ { "data": { "text/html": [ "
|  | vector | image_uri |\n", "
|---|---|---|\n", "
| 0 | [0.018789755, 0.11621179, -0.09760579, -0.0268... | /Users/changshe/Downloads/images/leonberger_14... |\n", "
| 1 | [0.021960497, 0.06073219, -0.1625527, 0.021481... | /Users/changshe/Downloads/images/havanese_63.jpg |\n", "
| 2 | [0.0074375155, 0.084355146, -0.027461205, -0.0... | /Users/changshe/Downloads/images/english_cocke... |\n", "
| 3 | [-0.01220356, 0.020815236, -0.08587208, -0.027... | /Users/changshe/Downloads/images/shiba_inu_143... |\n", "
| 4 | [-0.010112503, 0.14021927, -0.14588796, -0.046... | /Users/changshe/Downloads/images/saint_bernard... |\n", "