{ "cells": [ { "cell_type": "markdown", "id": "c0de1e6a-61f7-4f99-a2fd-1461902ab36a", "metadata": {}, "source": [ "# Async API\n", "\n", "We demonstrate the following features supported by LanceDB using our asynchronous APIs:\n", "- Automatic versioning\n", "- Instant rollback\n", "- Appends, updates, deletions\n", "- Schema evolution" ] }, { "cell_type": "markdown", "id": "6d810f29", "metadata": {}, "source": [ "Let's first prepare the data. We will be using a CSV file with a bunch of quotes from Rick and Morty." ] }, { "cell_type": "code", "execution_count": 50, "id": "d00ed8e6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2024-12-17 15:58:31-- http://vectordb-recipes.s3.us-west-2.amazonaws.com/rick_and_morty_quotes.csv\n", "Resolving vectordb-recipes.s3.us-west-2.amazonaws.com (vectordb-recipes.s3.us-west-2.amazonaws.com)... 3.5.84.162, 3.5.76.76, 52.92.228.138, ...\n", "Connecting to vectordb-recipes.s3.us-west-2.amazonaws.com (vectordb-recipes.s3.us-west-2.amazonaws.com)|3.5.84.162|:80... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 8236 (8.0K) [text/csv]\n", "Saving to: ‘rick_and_morty_quotes.csv.3’\n", "\n", "rick_and_morty_quot 100%[===================>] 8.04K --.-KB/s in 0s \n", "\n", "2024-12-17 15:58:31 (160 MB/s) - ‘rick_and_morty_quotes.csv.3’ saved [8236/8236]\n", "\n", "id,author,quote\n", "1,Rick,\" Morty, you got to come on. You got to come with me.\"\n", "2,Morty,\" Rick, what’s going on?\"\n", "3,Rick,\" I got a surprise for you, Morty.\"\n", "4,Morty,\" It’s the middle of the night. What are you talking about?\"\n", "5,Rick,\" I got a surprise for you.\"\n", "6,Morty,\" Ow! Ow! You’re tugging me too hard.\"\n", "7,Rick,\" I got a surprise for you, Morty.\"\n", "8,Rick,\" What do you think of this flying vehicle, Morty? I built it out of stuff I found in the garage.\"\n", "9,Morty,\" Yeah, Rick, it’s great. 
Is this the surprise?\"\n" ] } ], "source": [ "!wget http://vectordb-recipes.s3.us-west-2.amazonaws.com/rick_and_morty_quotes.csv\n", "!head rick_and_morty_quotes.csv" ] }, { "cell_type": "markdown", "id": "a5fcdcda-b0fe-4ac4-90b4-6b42cf2ef34d", "metadata": {}, "source": [ "Let's load this into a pandas DataFrame.\n", "\n", "It has three columns: a quote id, the first name of the quote's author, and the quote string:" ] }, { "cell_type": "code", "execution_count": 51, "id": "def3ae59-77d9-43f0-ba6d-415a1503856b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id author quote\n", "0 1 Rick Morty, you got to come on. You got to come wi...\n", "1 2 Morty Rick, what’s going on?\n", "2 3 Rick I got a surprise for you, Morty.\n", "3 4 Morty It’s the middle of the night. What are you ta...\n", "4 5 Rick I got a surprise for you." ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv(\"rick_and_morty_quotes.csv\")\n", "df.head()" ] }, { "cell_type": "markdown", "id": "4ba9ffac-c779-49e3-91a7-f1c00f3fda41", "metadata": {}, "source": [ "Creating a LanceDB table from a pandas dataframe is straightforward using `create_table`" ] }, { "cell_type": "markdown", "id": "392cf0ee", "metadata": {}, "source": [ "We'll start with a local LanceDB connection" ] }, { "cell_type": "code", "execution_count": 35, "id": "91a322dd", "metadata": {}, "outputs": [], "source": [ "!pip install lancedb -q" ] }, { "cell_type": "code", "execution_count": 52, "id": "10715e72", "metadata": {}, "outputs": [], "source": [ "import lancedb\n", "async_db = await lancedb.connect_async(\"~/.lancedb\")" ] }, { "cell_type": "code", "execution_count": 53, "id": "bd981f6d-b921-4b1d-b63a-6c1d59f3a51d", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[2024-12-17T23:58:46Z WARN lance::dataset::write::insert] No existing dataset at ~/.lancedb/rick_and_morty.lance, it will be created\n" ] }, { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " id author quote\n", "0 1 Rick Morty, you got to come on. You got to come wi...\n", "1 2 Morty Rick, what’s going on?\n", "2 3 Rick I got a surprise for you, Morty.\n", "3 4 Morty It’s the middle of the night. What are you ta...\n", "4 5 Rick I got a surprise for you.\n", "5 6 Morty Ow! Ow! You’re tugging me too hard.\n", "6 7 Rick I got a surprise for you, Morty.\n", "7 8 Rick What do you think of this flying vehicle, Mor...\n", "8 9 Morty Yeah, Rick, it’s great. Is this the surprise?\n", "9 10 Rick Morty, I had to I had to I had to I had to ma..." ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "await async_db.drop_table(\"rick_and_morty\")\n", "async_table = await async_db.create_table(\"rick_and_morty\", df, mode=\"overwrite\")\n", "await async_table.to_pandas()" ] }, { "cell_type": "markdown", "id": "38d055be-ae3e-4190-b1cf-abf14cdf8975", "metadata": {}, "source": [ "## Updates" ] }, { "cell_type": "markdown", "id": "842550fb-da81-44ea-9e98-d5dbaa6916c7", "metadata": {}, "source": [ "Now, since Rick is the smartest man in the multiverse, he deserves to have his quotes attributed to his full name: Richard Daniel Sanchez.\n", "\n", "This can be done via `AsyncTable.update`. It needs two arguments:\n", "\n", "1. A `where` string filter (SQL syntax) to determine the rows to update\n", "2. A dict of `updates` where the keys are the column names to update and the values are the new values" ] }, { "cell_type": "code", "execution_count": 54, "id": "9eac4708-a8c4-49aa-bc13-8e60c5bf34a0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " id author quote\n", "0 1 Rick Morty, you got to come on. You got to come wi...\n", "1 3 Rick I got a surprise for you, Morty.\n", "2 5 Rick I got a surprise for you.\n", "3 7 Rick I got a surprise for you, Morty.\n", "4 8 Rick What do you think of this flying vehicle, Mor...\n", "5 10 Rick Morty, I had to I had to I had to I had to ma...\n", "6 12 Rick We’re gonna drop it down there just get a who...\n", "7 14 Rick Come on, Morty. Just take it easy, Morty. It’...\n", "8 16 Rick When I drop the bomb you know, I want you to ...\n", "9 18 Rick And Jessica’s gonna be Eve,…" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "await async_table.update(where=\"author='Rick'\", updates={\"author\": \"Richard Daniel Sanchez\"})\n", "await async_table.to_pandas()" ] }, { "cell_type": "markdown", "id": "ac6499ce-af6d-4934-9051-be5f159ce623", "metadata": {}, "source": [ "## Schema evolution" ] }, { "cell_type": "markdown", "id": "022f1334", "metadata": {}, "source": [ "Let's add a `new_id` column to the table, where each value is the original `id` plus 1." ] }, { "cell_type": "code", "execution_count": 55, "id": "a4326a70-9863-47e8-8f3f-565e35d558cf", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id author quote new_id\n", "0 1 Rick Morty, you got to come on. You got to come wi... 2\n", "1 3 Rick I got a surprise for you, Morty. 4\n", "2 5 Rick I got a surprise for you. 6\n", "3 7 Rick I got a surprise for you, Morty. 8\n", "4 8 Rick What do you think of this flying vehicle, Mor... 9\n", "5 10 Rick Morty, I had to I had to I had to I had to ma... 11\n", "6 12 Rick We’re gonna drop it down there just get a who... 13\n", "7 14 Rick Come on, Morty. Just take it easy, Morty. It’... 15\n", "8 16 Rick When I drop the bomb you know, I want you to ... 17\n", "9 18 Rick And Jessica’s gonna be Eve,… 19" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "await async_table.add_columns({\"new_id\": \"id + 1\"})\n", "await async_table.to_pandas()" ] }, { "cell_type": "markdown", "id": "f590fec8-0ed0-4148-b940-c81abe7b421c", "metadata": {}, "source": [ "If we look at the schema, we see that a new int64 column was added" ] }, { "cell_type": "code", "execution_count": 56, "id": "ca9596a0-b4a0-4a5e-8d9e-967cd13b1eae", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "id: int64\n", "author: string\n", "quote: string\n", "new_id: int64" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "await async_table.schema()" ] }, { "cell_type": "markdown", "id": "f046002c-872c-4c39-ab85-e03c3b45b477", "metadata": {}, "source": [ "## Rollback\n", "\n" ] }, { "cell_type": "markdown", "id": "dbfc298c-ada2-411b-925f-e53dc9d35f3c", "metadata": {}, "source": [ "Suppose we used the table and found that the new column should be a different value. How do we use another new column without losing the change history?" 
] }, { "cell_type": "markdown", "id": "dfb116e4-b3b2-4b7e-bbf8-d3e63ca2aa14", "metadata": {}, "source": [ "First, major operations are automatically versioned in LanceDB:\n", "\n", "- Version 1 is the table creation, with the initial insertion of data.\n", "- Version 2 is the update.\n", "- Version 3 is the addition of the new column." ] }, { "cell_type": "code", "execution_count": 57, "id": "a411902b-43d0-4889-8e34-bc5f3c409726", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'version': 1,\n", " 'timestamp': datetime.datetime(2024, 12, 17, 15, 58, 46, 983259),\n", " 'metadata': {}},\n", " {'version': 2,\n", " 'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 0, 291948),\n", " 'metadata': {}},\n", " {'version': 3,\n", " 'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 8, 381165),\n", " 'metadata': {}}]" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "await async_table.checkout_latest()\n", "await async_table.list_versions()" ] }, { "cell_type": "markdown", "id": "7bd5e954-ac0f-4973-81c6-ad6120412d40", "metadata": {}, "source": [ "We can restore version 2, from before we added the `new_id` column" ] }, { "cell_type": "code", "execution_count": 58, "id": "ad0682cc-7599-459c-bbd8-1cd1f296c845", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " id author quote\n", "0 1 Rick Morty, you got to come on. You got to come wi...\n", "1 3 Rick I got a surprise for you, Morty.\n", "2 5 Rick I got a surprise for you.\n", "3 7 Rick I got a surprise for you, Morty.\n", "4 8 Rick What do you think of this flying vehicle, Mor...\n", "5 10 Rick Morty, I had to I had to I had to I had to ma...\n", "6 12 Rick We’re gonna drop it down there just get a who...\n", "7 14 Rick Come on, Morty. Just take it easy, Morty. It’...\n", "8 16 Rick When I drop the bomb you know, I want you to ...\n", "9 18 Rick And Jessica’s gonna be Eve,…" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "await async_table.checkout(2)\n", "await async_table.restore()\n", "await async_table.to_pandas()" ] }, { "cell_type": "markdown", "id": "b0a51146-40d0-4f16-9555-5ce68c2c9eee", "metadata": {}, "source": [ "Notice that we now have one more version, not one fewer. When we restore an old version, we're not deleting the version history; we're creating a new version whose schema and data are equivalent to those of the restored version. This way, we can keep track of all the changes and always roll back to a previous state." 
] }, { "cell_type": "code", "execution_count": 59, "id": "d5bfb448-20b9-45e9-90ba-8a73abb86668", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'version': 1,\n", " 'timestamp': datetime.datetime(2024, 12, 17, 15, 58, 46, 983259),\n", " 'metadata': {}},\n", " {'version': 2,\n", " 'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 0, 291948),\n", " 'metadata': {}},\n", " {'version': 3,\n", " 'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 8, 381165),\n", " 'metadata': {}},\n", " {'version': 4,\n", " 'timestamp': datetime.datetime(2024, 12, 17, 15, 59, 22, 800694),\n", " 'metadata': {}}]" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "await async_table.list_versions()" ] }, { "cell_type": "markdown", "id": "6713cb53-8cb9-4235-9c55-337c311f0af6", "metadata": {}, "source": [ "### Add another new column\n", "\n", "Now we'll change the value of the `new_id` column and add it to the restored dataset again" ] }, { "cell_type": "code", "execution_count": 60, "id": "cdabeb56", "metadata": {}, "outputs": [], "source": [ "await async_table.add_columns({\"new_id\": \"id + 10\"})" ] }, { "cell_type": "code", "execution_count": 61, "id": "694c46e0-a1c3-4869-a1eb-562f14606ad4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "id: int64\n", "author: string\n", "quote: string\n", "new_id: int64" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "await async_table.schema()" ] }, { "cell_type": "markdown", "id": "5e4085a5-a2e7-4520-acfc-eabaae2caa7d", "metadata": {}, "source": [ "## Deletion\n", "\n", "What if the whole show was just Rick-isms? 
\n", "Let's delete any quote not said by Rick" ] }, { "cell_type": "code", "execution_count": 62, "id": "9d11ddf1-b352-496c-91d7-99c70cbf304b", "metadata": {}, "outputs": [], "source": [ "await async_table.delete(\"author != 'Richard Daniel Sanchez'\")" ] }, { "cell_type": "markdown", "id": "77d2f591-e492-423e-b995-2a18ae8cb831", "metadata": {}, "source": [ "We can see that the number of rows has been reduced to 34" ] }, { "cell_type": "code", "execution_count": 63, "id": "20bcce48-a5df-43c7-9ab9-7d59a83055e9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "34" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "await async_table.count_rows()" ] }, { "cell_type": "markdown", "id": "ef8457b2-1228-4a25-824e-477a07681b48", "metadata": {}, "source": [ "OK, we've had our fun. Let's get back to the full quote set" ] }, { "cell_type": "code", "execution_count": 67, "id": "6e279635-75b0-400c-8b43-4aa069282ccd", "metadata": {}, "outputs": [], "source": [ "await async_table.checkout(5)\n", "await async_table.restore()" ] }, { "cell_type": "code", "execution_count": 68, "id": "6a65b627-57a2-43b2-8acc-3805591845ad", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "99" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "await async_table.count_rows()" ] }, { "cell_type": "markdown", "id": "ae1a6ee8-8868-49de-82ab-17a0f61f3a47", "metadata": {}, "source": [ "## History\n", "\n", "We now have 7 versions in the data. 
We can review the operations that correspond to each version below:" ] }, { "cell_type": "code", "execution_count": 32, "id": "f595c9b8-91ec-48c1-9790-c40e1bd24b60", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "await async_table.version()" ] }, { "cell_type": "markdown", "id": "774f4eb0-03d4-4fda-a825-6217bf096619", "metadata": {}, "source": [ "\n", "Versions:\n", "- 1 - Create\n", "- 2 - Update\n", "- 3 - Add a new column\n", "- 4 - Restore (2)\n", "- 5 - Add a new column\n", "- 6 - Delete\n", "- 7 - Restore (5)" ] }, { "cell_type": "markdown", "id": "fb0131e6-2b73-442a-b4c6-6976a9cf4c7e", "metadata": {}, "source": [ "## Summary" ] }, { "cell_type": "markdown", "id": "97a1cf79-b46b-40cd-ada0-54edef358627", "metadata": {}, "source": [ "We never had to explicitly manage the versioning. And we never had to create expensive and slow snapshots. LanceDB automatically tracks the full history of the operations we performed and supports fast rollbacks. In production, this is critical for debugging issues and minimizing downtime by rolling back to a previously successful state in seconds." ] } ], "metadata": { "kernelspec": { "display_name": "doc-venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.10" } }, "nbformat": 4, "nbformat_minor": 5 }