lancedb/docs/src/notebooks/tables_guide.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d24eb4c6-e246-44ca-ba7c-6eae7923bd4c",
   "metadata": {},
   "source": [
    "## LanceDB Tables\n",
    "A Table is a collection of Records in a LanceDB Database.\n",
    "\n",
    "![illustration](../assets/ecosystem-illustration.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "c1b4e34b-a49c-471d-a343-a5940bb5138a",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install lancedb -qq"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "4e5a8d07-d9a1-48c1-913a-8e0629289579",
   "metadata": {},
   "outputs": [],
   "source": [
    "import lancedb\n",
    "db = lancedb.connect(\"./.lancedb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66fb93d5-3551-406b-99b2-488442d61d06",
   "metadata": {},
   "source": [
    "LanceDB allows ingesting data from various sources - `dict`, `list[dict]`, `pd.DataFrame`, `pa.Table` or a `Iterator[pa.RecordBatch]`. Let's take a look at some of the these.\n",
    "\n",
    " ### From list of tuples or dictionaries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "5df12f66-8d99-43ad-8d0b-22189ec0a6b9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pyarrow.Table\n",
       "vector: fixed_size_list<item: float>[2]\n",
       "  child 0, item: float\n",
       "lat: double\n",
       "long: double\n",
       "----\n",
       "vector: [[[1.1,1.2],[0.2,1.8]]]\n",
       "lat: [[45.5,40.1]]\n",
       "long: [[-122.7,-74.1]]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import lancedb\n",
    "\n",
    "db = lancedb.connect(\"./.lancedb\")\n",
    "\n",
    "data = [{\"vector\": [1.1, 1.2], \"lat\": 45.5, \"long\": -122.7},\n",
    "        {\"vector\": [0.2, 1.8], \"lat\": 40.1, \"long\": -74.1}]\n",
    "\n",
    "db.create_table(\"my_table\", data)\n",
    "\n",
    "db[\"my_table\"].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10ce802f-1a10-49ee-8ee3-a9bfb302d86c",
   "metadata": {},
   "source": [
    "## From pandas DataFrame\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "f4d87ae9-0ccb-48eb-b31d-bb8f2370e47e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pyarrow.Table\n",
       "vector: fixed_size_list<item: float>[2]\n",
       "  child 0, item: float\n",
       "lat: double\n",
       "long: double\n",
       "----\n",
       "vector: [[[1.1,1.2],[0.2,1.8]]]\n",
       "lat: [[45.5,40.1]]\n",
       "long: [[-122.7,-74.1]]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "data = pd.DataFrame(\n",
    "    {\n",
    "        \"vector\": [[1.1, 1.2, 1.3, 1.4], [0.2, 1.8, 0.4, 3.6]],\n",
    "        \"lat\": [45.5, 40.1],\n",
    "        \"long\": [-122.7, -74.1],\n",
    "    }\n",
    ")\n",
    "db.create_table(\"my_table_pandas\", data)\n",
    "db[\"my_table_pandas\"].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4be81469-5b57-4f78-9c72-3938c0378d9d",
   "metadata": {},
   "source": [
    "Data is converted to Arrow before being written to disk. For maximum control over how data is saved, either provide the PyArrow schema to convert to or else provide a PyArrow Table directly.\n",
    "  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "25f34bcf-fca0-4431-8601-eac95d1bd347",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[2024-01-31T18:59:33Z WARN  lance::dataset] No existing dataset at /Users/qian/Work/LanceDB/lancedb/docs/src/notebooks/.lancedb/table3.lance, it will be created\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "vector: fixed_size_list<item: float>[2]\n",
       "  child 0, item: float\n",
       "lat: float\n",
       "long: float"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pyarrow as pa\n",
    "\n",
    "custom_schema = pa.schema([\n",
    "pa.field(\"vector\", pa.list_(pa.float32(), 4)),\n",
    "pa.field(\"lat\", pa.float32()),\n",
    "pa.field(\"long\", pa.float32())\n",
    "])\n",
    "\n",
    "table = db.create_table(\"table3\", data, schema=custom_schema, mode=\"overwrite\")\n",
    "table.schema"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4df51925-7ca2-4005-9c72-38b3d26240c6",
   "metadata": {},
   "source": [
    "### From an Arrow Table\n",
    "\n",
    "You can also create LanceDB tables directly from pyarrow tables. LanceDB supports float16 type."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "90a880f6-be43-4c9d-ba65-0b05197c0f6f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "vector: fixed_size_list<item: halffloat>[16]\n",
       "  child 0, item: halffloat\n",
       "text: string"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "dim = 16\n",
    "total = 2\n",
    "schema = pa.schema(\n",
    "    [\n",
    "        pa.field(\"vector\", pa.list_(pa.float16(), dim)),\n",
    "        pa.field(\"text\", pa.string())\n",
    "    ]\n",
    ")\n",
    "data = pa.Table.from_arrays(\n",
    "    [\n",
    "        pa.array([np.random.randn(dim).astype(np.float16) for _ in range(total)],\n",
    "                pa.list_(pa.float16(), dim)),\n",
    "        pa.array([\"foo\", \"bar\"])\n",
    "    ],\n",
    "    [\"vector\", \"text\"],\n",
    ")\n",
    "\n",
    "tbl = db.create_table(\"f16_tbl\", data, schema=schema)\n",
    "tbl.schema"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f36c51c-d902-449d-8292-700e53990c32",
   "metadata": {},
   "source": [
    "### From Pydantic Models\n",
    "\n",
    "LanceDB supports to create Apache Arrow Schema from a Pydantic BaseModel."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "d81121d7-e4b7-447c-a48c-974b6ebb464a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "movie_id: int64 not null\n",
       "vector: fixed_size_list<item: float>[128] not null\n",
       "  child 0, item: float\n",
       "genres: string not null\n",
       "title: string not null\n",
       "imdb_id: int64 not null"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from lancedb.pydantic import Vector, LanceModel\n",
    "\n",
    "class Content(LanceModel):\n",
    "    movie_id: int\n",
    "    vector: Vector(128)\n",
    "    genres: str\n",
    "    title: str\n",
    "    imdb_id: int\n",
    "        \n",
    "    @property\n",
    "    def imdb_url(self) -> str:\n",
    "        return f\"https://www.imdb.com/title/tt{self.imdb_id}\"\n",
    "\n",
    "import pyarrow as pa\n",
    "db = lancedb.connect(\"~/.lancedb\")\n",
    "table_name = \"movielens_small\"\n",
    "table = db.create_table(table_name, schema=Content)\n",
    "table.schema"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "860e1f77-e860-46a9-98b7-b2979092ccd6",
   "metadata": {},
   "source": [
    "### Using Iterators / Writing Large Datasets\n",
    "\n",
    "It is recommended to use itertators to add large datasets in batches when creating your table in one go. This does not create multiple versions of your dataset unlike manually adding batches using `table.add()`\n",
    "\n",
    "LanceDB additionally supports pyarrow's `RecordBatch` Iterators or other generators producing supported data types.\n",
    "\n",
    "## Here's an example using using `RecordBatch` iterator for creating tables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "bc247142-4e3c-41a2-b94c-8e00d2c2a508",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LanceTable(table4)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pyarrow as pa\n",
    "\n",
    "def make_batches():\n",
    "    for i in range(5):\n",
    "        yield pa.RecordBatch.from_arrays(\n",
    "            [\n",
    "                pa.array([[3.1, 4.1], [5.9, 26.5]],\n",
    "                        pa.list_(pa.float32(), 2)),\n",
    "                pa.array([\"foo\", \"bar\"]),\n",
    "                pa.array([10.0, 20.0]),\n",
    "            ],\n",
    "            [\"vector\", \"item\", \"price\"],\n",
    "        )\n",
    "\n",
    "schema = pa.schema([\n",
    "    pa.field(\"vector\", pa.list_(pa.float32(), 2)),\n",
    "    pa.field(\"item\", pa.utf8()),\n",
    "    pa.field(\"price\", pa.float32()),\n",
    "])\n",
    "\n",
    "db.create_table(\"table4\", make_batches(), schema=schema)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "94f7dd2b-bae4-4bdf-8534-201437c31027",
   "metadata": {},
   "source": [
    "### Using pandas `DataFrame` Iterator and Pydantic Schema\n",
    "\n",
    "You can set the schema via pyarrow schema object or using Pydantic object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "25ad3523-e0c9-4c28-b3df-38189c4e0e5f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "vector: fixed_size_list<item: float>[2] not null\n",
       "  child 0, item: float\n",
       "item: string not null\n",
       "price: double not null"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pyarrow as pa\n",
    "import pandas as pd\n",
    "\n",
    "class PydanticSchema(LanceModel):\n",
    "    vector: Vector(2)\n",
    "    item: str\n",
    "    price: float\n",
    "\n",
    "def make_batches():\n",
    "    for i in range(5):\n",
    "        yield pd.DataFrame(\n",
    "                {\n",
    "                    \"vector\": [[3.1, 4.1], [1, 1]],\n",
    "                    \"item\": [\"foo\", \"bar\"],\n",
    "                    \"price\": [10.0, 20.0],\n",
    "                })\n",
    "\n",
    "tbl = db.create_table(\"table5\", make_batches(), schema=PydanticSchema)\n",
    "tbl.schema"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4aa955e9-fcd0-4c99-b644-f218f3bb3f1a",
   "metadata": {},
   "source": [
    "## Creating Empty Table\n",
    "\n",
    "You can create an empty table by just passing the schema and later add to it using `table.add()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "2814173a-eacc-4dd8-a64d-6312b44582cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "import lancedb\n",
    "from lancedb.pydantic import LanceModel, Vector\n",
    "\n",
    "class Model(LanceModel):\n",
    "      vector: Vector(2)\n",
    "\n",
    "tbl = db.create_table(\"table6\", schema=Model.to_arrow_schema())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d1b0f5c-a1d9-459f-8614-8376b6f577e1",
   "metadata": {},
   "source": [
    "## Open Existing Tables\n",
    "\n",
    "If you forget the name of your table, you can always get a listing of all table names:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "df9e13c0-41f6-437f-9dfa-2fd71d3d9c45",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['table6', 'table4', 'table5', 'movielens_small']"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db.table_names()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "9343f5ad-6024-42ee-ac2f-6c1471df8679",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>vector</th>\n",
       "      <th>item</th>\n",
       "      <th>price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[3.1, 4.1]</td>\n",
       "      <td>foo</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[5.9, 26.5]</td>\n",
       "      <td>bar</td>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[3.1, 4.1]</td>\n",
       "      <td>foo</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>[5.9, 26.5]</td>\n",
       "      <td>bar</td>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>[3.1, 4.1]</td>\n",
       "      <td>foo</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>[5.9, 26.5]</td>\n",
       "      <td>bar</td>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>[3.1, 4.1]</td>\n",
       "      <td>foo</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>[5.9, 26.5]</td>\n",
       "      <td>bar</td>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>[3.1, 4.1]</td>\n",
       "      <td>foo</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>[5.9, 26.5]</td>\n",
       "      <td>bar</td>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        vector item  price\n",
       "0   [3.1, 4.1]  foo   10.0\n",
       "1  [5.9, 26.5]  bar   20.0\n",
       "2   [3.1, 4.1]  foo   10.0\n",
       "3  [5.9, 26.5]  bar   20.0\n",
       "4   [3.1, 4.1]  foo   10.0\n",
       "5  [5.9, 26.5]  bar   20.0\n",
       "6   [3.1, 4.1]  foo   10.0\n",
       "7  [5.9, 26.5]  bar   20.0\n",
       "8   [3.1, 4.1]  foo   10.0\n",
       "9  [5.9, 26.5]  bar   20.0"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tbl = db.open_table(\"table4\")\n",
    "tbl.to_pandas()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5019246f-12e3-4f78-88a8-9f4939802c76",
   "metadata": {},
   "source": [
    "## Adding to table\n",
    "After a table has been created, you can always add more data to it using\n",
    "\n",
    "You can add any of the valid data structures accepted by LanceDB table, i.e, `dict`, `list[dict]`, `pd.DataFrame`, or a `Iterator[pa.RecordBatch]`. Here are some examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "8a56250f-73a1-4c26-a6ad-5c7a0ce3a9ab",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = [\n",
    "        {\"vector\": [1.3, 1.4], \"item\": \"fizz\", \"price\": 100.0},\n",
    "        {\"vector\": [9.5, 56.2], \"item\": \"buzz\", \"price\": 200.0}\n",
    "]\n",
    "tbl.add(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9985f6ee-67e1-45a9-b233-94e3d121ecbf",
   "metadata": {},
   "source": [
    "You can also add a large dataset batch in one go using Iterator of supported data types\n",
    "\n",
    "### Adding via Iterator\n",
    "\n",
    "here, we'll use pandas DataFrame Iterator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "030c7057-b98e-4e2f-be14-b8c1f927f83c",
   "metadata": {},
   "outputs": [],
   "source": [
    "def make_batches():\n",
    "    for i in range(5):\n",
    "        yield [\n",
    "                  {\"vector\": [3.1, 4.1], \"item\": \"foo\", \"price\": 10.0},\n",
    "                  {\"vector\": [1, 1], \"item\": \"bar\", \"price\": 20.0},\n",
    "              ]\n",
    "tbl.add(make_batches())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8316d5d-0a23-4675-b0ee-178711db873a",
   "metadata": {},
   "source": [
    "## Deleting from a Table\n",
    "\n",
    "Use the `delete()` method on tables to delete rows from a table. To choose which rows to delete, provide a filter that matches on the metadata columns. This can delete any number of rows that match the filter, like:\n",
    "\n",
    "\n",
    "```python\n",
    "tbl.delete('item = \"fizz\"')\n",
    "```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "e7a17de2-08d2-41b7-bd05-f63d1045ab1f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "22\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "12"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(len(tbl))\n",
    "      \n",
    "tbl.delete(\"price = 20.0\")\n",
    "      \n",
    "len(tbl)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "74ac180b-5432-4c14-b1a8-22c35ac83af8",
   "metadata": {},
   "source": [
    "### Delete from a list of values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "fe3310bd-08f4-4a22-a63b-b3127d22f9f7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "         vector  item  price\n",
      "0    [3.1, 4.1]   foo   10.0\n",
      "1    [3.1, 4.1]   foo   10.0\n",
      "2    [3.1, 4.1]   foo   10.0\n",
      "3    [3.1, 4.1]   foo   10.0\n",
      "4    [3.1, 4.1]   foo   10.0\n",
      "5    [1.3, 1.4]  fizz  100.0\n",
      "6   [9.5, 56.2]  buzz  200.0\n",
      "7    [3.1, 4.1]   foo   10.0\n",
      "8    [3.1, 4.1]   foo   10.0\n",
      "9    [3.1, 4.1]   foo   10.0\n",
      "10   [3.1, 4.1]   foo   10.0\n",
      "11   [3.1, 4.1]   foo   10.0\n"
     ]
    },
    {
     "ename": "OSError",
     "evalue": "LanceError(IO): Error during planning: column foo does not exist, /Users/runner/work/lance/lance/rust/lance-core/src/error.rs:212:23",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mOSError\u001b[0m                                   Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[17], line 4\u001b[0m\n\u001b[1;32m      2\u001b[0m to_remove \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m, \u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;241m.\u001b[39mjoin(\u001b[38;5;28mstr\u001b[39m(v) \u001b[38;5;28;01mfor\u001b[39;00m v \u001b[38;5;129;01min\u001b[39;00m to_remove)\n\u001b[1;32m      3\u001b[0m \u001b[38;5;28mprint\u001b[39m(tbl\u001b[38;5;241m.\u001b[39mto_pandas())\n\u001b[0;32m----> 4\u001b[0m \u001b[43mtbl\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdelete\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43mf\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mitem IN (\u001b[39;49m\u001b[38;5;132;43;01m{\u001b[39;49;00m\u001b[43mto_remove\u001b[49m\u001b[38;5;132;43;01m}\u001b[39;49;00m\u001b[38;5;124;43m)\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/Work/LanceDB/lancedb/docs/doc-venv/lib/python3.11/site-packages/lancedb/table.py:872\u001b[0m, in \u001b[0;36mLanceTable.delete\u001b[0;34m(self, where)\u001b[0m\n\u001b[1;32m    871\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mdelete\u001b[39m(\u001b[38;5;28mself\u001b[39m, where: \u001b[38;5;28mstr\u001b[39m):\n\u001b[0;32m--> 872\u001b[0m     \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_dataset\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdelete\u001b[49m\u001b[43m(\u001b[49m\u001b[43mwhere\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/Work/LanceDB/lancedb/docs/doc-venv/lib/python3.11/site-packages/lance/dataset.py:596\u001b[0m, in \u001b[0;36mLanceDataset.delete\u001b[0;34m(self, predicate)\u001b[0m\n\u001b[1;32m    594\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(predicate, pa\u001b[38;5;241m.\u001b[39mcompute\u001b[38;5;241m.\u001b[39mExpression):\n\u001b[1;32m    595\u001b[0m     predicate \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mstr\u001b[39m(predicate)\n\u001b[0;32m--> 596\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_ds\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdelete\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpredicate\u001b[49m\u001b[43m)\u001b[49m\n",
      "\u001b[0;31mOSError\u001b[0m: LanceError(IO): Error during planning: column foo does not exist, /Users/runner/work/lance/lance/rust/lance-core/src/error.rs:212:23"
     ]
    }
   ],
   "source": [
    "to_remove = [\"foo\", \"buzz\"]\n",
    "to_remove = \", \".join(str(v) for v in to_remove)\n",
    "print(tbl.to_pandas())\n",
    "tbl.delete(f\"item IN ({to_remove})\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87d5bc21-847f-4c81-b56e-f6dbe5d05aac",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(\n",
    "                    {\n",
    "                        \"vector\": [[3.1, 4.1], [1, 1]],\n",
    "                        \"item\": [\"foo\", \"bar\"],\n",
    "                        \"price\": [10.0, 20.0],\n",
    "                    })\n",
    "\n",
    "tbl = db.create_table(\"table7\", data=df, mode=\"overwrite\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9cba4519-eb3a-4941-ab7e-873d762e750f",
   "metadata": {},
   "outputs": [],
   "source": [
    "to_remove = [10.0, 20.0]\n",
    "to_remove = \", \".join(str(v) for v in to_remove)\n",
    "\n",
    "tbl.delete(f\"price IN ({to_remove})\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5bdc9801-d5ed-4871-92d0-88b27108e788",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>vector</th>\n",
       "      <th>item</th>\n",
       "      <th>price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [vector, item, price]\n",
       "Index: []"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tbl.to_pandas()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "752d33d4-ce1c-48e5-90d2-c85f0982182d",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}