lancedb/docs/src/notebooks/hybrid_search.ipynb
Prashanth Rao b014c24e66 [docs]: Fix typos and clarity in hybrid search docs (#966)
- Fixed typos and added some clarity to the hybrid search docs
- Changed "Airbnb" case to be as per the [official company
name](https://en.wikipedia.org/wiki/Airbnb) (the "bnb" shouldn't be
capitalized), and the text in the document aligns with this.
- Fixed headers in nav bar
2024-04-05 16:30:30 -07:00


{
"cells": [
{
"cell_type": "markdown",
"id": "0daef1cd-9130-46b8-8eb8-1b721860e239",
"metadata": {},
"source": [
"# Example - Airbnb financial data search\n",
"\n",
"<a href=\"https://colab.research.google.com/github/lancedb/lancedb/blob/main/docs/src/notebooks/hybrid_search.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a> \n",
"\n",
"The code below is an example of hybrid search, a search algorithm that combines FTS and vector search in LanceDB.\n",
"\n",
"Let's get stared with an example. In this notebook we'll use Airbnb financial data documents to search for \"the specific reasons for higher operating costs\" in a particular year."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "819fa612",
"metadata": {},
"outputs": [],
"source": [
"# Setup\n",
"!pip install lancedb pandas langchain langchain-community pypdf openai cohere tiktoken sentence_transformers tantivy==0.20.1"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b6864d97-7f85-4d9c-bf05-e9cf9db29e81",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"import os\n",
"import getpass\n",
"\n",
"# Set your OpenAI API key\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "cfce9804-cd1c-48c3-acd2-e74eb4e290c7",
"metadata": {},
"outputs": [],
"source": [
"def pretty_print(docs):\n",
" for doc in docs:\n",
" print(doc + \"\\n\\n\") "
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "efb22cec-5a06-46ac-91c3-53f9b9090109",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import PyPDFLoader\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"# Load $ABNB's financial report. This may take 1-2 minutes since the PDF is large\n",
"sec_filing_pdf = \"https://d18rn0p25nwr6d.cloudfront.net/CIK-0001559720/8a9ebed0-815a-469a-87eb-1767d21d8cec.pdf\"\n",
"\n",
"# Create your PDF loader\n",
"loader = PyPDFLoader(sec_filing_pdf)\n",
"\n",
"# Load the PDF document\n",
"documents = loader.load()\n",
"\n",
"# Chunk the financial report\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d3c5ce69-0f75-44cb-9e49-9be665fc156e",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[2024-02-12T20:00:04Z WARN lance::dataset] No existing dataset at /Users/ayushchaurasia/langchain/airbnb.lance, it will be created\n"
]
}
],
"source": [
"from langchain_community.vectorstores import LanceDB\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"import lancedb\n",
"from lancedb.embeddings import get_registry\n",
"from lancedb.pydantic import Vector, LanceModel\n",
"\n",
"openai = get_registry().get(\"openai\").create()\n",
"\n",
"class Schema(LanceModel):\n",
" text: str = openai.SourceField()\n",
" vector: Vector(openai.ndims()) = openai.VectorField()\n",
"\n",
"embedding_function = OpenAIEmbeddings()\n",
"\n",
"db = lancedb.connect(\"~/langchain\")\n",
"table = db.create_table(\n",
" \"airbnb\",\n",
" schema=Schema,\n",
" mode=\"overwrite\",\n",
")\n",
"\n",
"# Load the document into LanceDB\n",
"db = LanceDB.from_documents(docs, embedding_function, connection=table)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "4284e67e-3a39-4486-a060-11a18f7c0e1f",
"metadata": {},
"outputs": [],
"source": [
"table.create_fts_index(\"text\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "d959a80f-d568-48f4-9d14-7367bcc1ce8d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>text</th>\n",
" <th>vector</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Table of Contents\\nUNITED STATES\\nSECURITIES A...</td>\n",
" <td>[-0.003405824, -0.03212391, 0.012812538, -0.02...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Class A common stock, par value $0.0001 per sh...</td>\n",
" <td>[-0.019193485, -0.02273649, 0.009623382, -0.02...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>this chapter) during the preceding 12 months (...</td>\n",
" <td>[-0.020692078, -0.016187502, -0.008877442, -0....</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Indicate by check mark whether the registrant ...</td>\n",
" <td>[-0.019304628, -0.0034501317, -0.011525051, -0...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>As of June 30, 2022, the aggregate market valu...</td>\n",
" <td>[-0.014594535, -0.011274607, -0.007967828, -0....</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" text \\\n",
"0 Table of Contents\\nUNITED STATES\\nSECURITIES A... \n",
"1 Class A common stock, par value $0.0001 per sh... \n",
"2 this chapter) during the preceding 12 months (... \n",
"3 Indicate by check mark whether the registrant ... \n",
"4 As of June 30, 2022, the aggregate market valu... \n",
"\n",
" vector \n",
"0 [-0.003405824, -0.03212391, 0.012812538, -0.02... \n",
"1 [-0.019193485, -0.02273649, 0.009623382, -0.02... \n",
"2 [-0.020692078, -0.016187502, -0.008877442, -0.... \n",
"3 [-0.019304628, -0.0034501317, -0.011525051, -0... \n",
"4 [-0.014594535, -0.011274607, -0.007967828, -0.... "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table.to_pandas().head()"
]
},
{
"cell_type": "markdown",
"id": "667f4e4a-6ff1-4f1c-ad57-4a2a8b036670",
"metadata": {},
"source": [
"## Vector Search\n",
"\n",
"avg latency - `3.48 ms ± 71.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)`"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "8a5ab2de-6d75-4785-b838-ed6a825dfa6e",
"metadata": {},
"outputs": [],
"source": [
"query = \"What are the specific factors contributing to Airbnb's increased operational expenses in the last fiscal year?\"\n",
"docs = table.search(query).limit(5).to_pandas()[\"text\"].to_list()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "5423d333-0f6d-4951-ab3f-6941ad30ba8a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In addition, the number of listings on Airbnb may decline as a result of a number of other factors affecting Hosts, including: the COVID-19 pandemic; enforcement or threatened\n",
"enforcement of laws and regulations, including short-term occupancy and tax laws; private groups, such as homeowners, landlords, and condominium and neighborhood\n",
"associations, adopting and enforcing contracts that prohibit or restrict home sharing; leases, mortgages, and other agreements, or regulations that purport to ban or otherwise restrict\n",
"home sharing; Hosts opting for long-term rentals on other third-party platforms as an alternative to listing on our platform; economic, social, and political factors; perceptions of trust\n",
"and safety on and off our platform; negative experiences with guests, including guests who damage Host property, throw unauthorized parties, or engage in violent and unlawful\n",
"\n",
"\n",
"Made Possible by Hosts, Strangers, AirCover, Categories, and OMG marketing campaigns and launches, a $67.9 million increase in our search engine marketing and advertising\n",
"spend, a $25.1 million increase in payroll-related expenses due to growth in headcount and increase in compensation costs, a $22.0 million increase in third-party service provider\n",
"expenses, and a $11.1 million increase in coupon expense in line with increase in revenue and launch of AirCover for guests, partially offset by a decrease of $22.9 million related to\n",
"the changes in the fair value of contingent consideration related to a 2019 acquisition.\n",
"General and Administrative\n",
"2021 2022 % Change\n",
"(in millions, except percentages)\n",
"General and administrative $ 836 $ 950 14 %\n",
"Percentage of revenue 14 % 11 %\n",
"General and administrative expense increased $114.0 million, or 14%, in 2022 compared to 2021, primarily due to an increase in other business and operational taxes of $41.3\n",
"\n",
"\n",
"Our success depends significantly on existing guests continuing to book and attracting new guests to book on our platform. Our ability to attract and retain guests could be materially\n",
"adversely affected by a number of factors discussed elsewhere in these “Risk Factors,” including:\n",
"•events beyond our control such as the ongoing COVID-19 pandemic, other pandemics and health concerns, restrictions on travel, immigration, trade disputes, economic\n",
"downturns, and the impact of climate change on travel including the availability of preferred destinations and the increase in the frequency and severity of weather-related\n",
"events, including fires, floods, droughts, extreme temperatures and ambient temperature increases, severe weather and other natural disasters, and the impact of other\n",
"climate change on seasonal destinations;\n",
"•political, social, or economic instability;\n",
"•Hosts failing to meet guests expectations, including increased expectations for cleanliness in light of the COVID-19 pandemic;\n",
"\n",
"\n",
"Table of Contents\n",
"Airbnb, Inc.\n",
"Consolidated Statements of Operations\n",
"(in millions, except per share amounts)\n",
"Year Ended December 31,\n",
"2020 2021 2022\n",
"Revenue $ 3,378 $ 5,992 $ 8,399 \n",
"Costs and expenses:\n",
"Cost of revenue 876 1,156 1,499 \n",
"Operations and support 878 847 1,041 \n",
"Product development 2,753 1,425 1,502 \n",
"Sales and marketing 1,175 1,186 1,516 \n",
"General and administrative 1,135 836 950 \n",
"Restructuring charges 151 113 89 \n",
"Total costs and expenses 6,968 5,563 6,597 \n",
"Income (loss) from operations (3,590) 429 1,802 \n",
"Interest income 27 13 186 \n",
"Interest expense (172) (438) (24)\n",
"Other income (expense), net (947) (304) 25 \n",
"Income (loss) before income taxes (4,682) (300) 1,989 \n",
"Provision for (benefit from) income taxes (97) 52 96 \n",
"Net income (loss) $ (4,585)$ (352)$ 1,893 \n",
"Net income (loss) per share attributable to Class A and Class B common stockholders:\n",
"Basic $ (16.12)$ (0.57)$ 2.97 \n",
"Diluted $ (16.12)$ (0.57)$ 2.79\n",
"\n",
"\n",
"Our future revenue growth depends on the growth of supply and demand for listings on our platform, and our business is affected by general economic and business conditions\n",
"worldwide as well as trends in the global travel and hospitality industries and the short and long-term accommodation regulatory landscape. In addition, we believe that our revenue\n",
"growth depends upon a number of factors, including:\n",
"•global macroeconomic conditions, including inflation and rising interest rates and recessionary concerns;\n",
"•our ability to retain and grow the number of guests and Nights and Experiences Booked;\n",
"•our ability to retain and grow the number of Hosts and the number of available listings on our platform;\n",
"•events beyond our control such as pandemics and other health concerns, restrictions on travel and immigration, political, social or economic instability, including international\n",
"\n",
"\n"
]
}
],
"source": [
"pretty_print(docs)"
]
},
{
"cell_type": "markdown",
"id": "8b0150fe-00dc-4aa0-9c8f-33cbf2ed5ac6",
"metadata": {},
"source": [
"## Hybrid Search\n",
"LanceDB support hybrid search with custom Rerankers. Here's the summary of latency numbers of some of the Reranking methods available\n",
"![1_yWDh0Klw8Upsw1V54kkkdQ](https://github.com/AyushExel/assets/assets/15766192/a515fbf7-0553-437e-899e-67691eae3fef)\n",
"\n",
"Let us now perform hybrid search by combining vector and FTS search results. First, we'll cover the default Reranker.\n",
"\n",
"### Linear Combination Reranker\n",
"`LinearCombinationReranker(weight=0.7)` is used as the default reranker for reranking the hybrid search results if the reranker isn't specified explicitly.\n",
"The `weight` param controls the weightage provided to vector search score. The weight of `1-weight` is applied to FTS scores when reranking.\n",
"\n",
"Latency - `71 ms ± 25.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)`"
]
},
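{
"cell_type": "markdown",
"id": "linear-combination-sketch-md",
"metadata": {},
"source": [
"To make the weighting concrete, here is a minimal pure-Python sketch of a linear score combination. It is an illustration only, not LanceDB's implementation. The function name `linear_combination` and the toy scores are hypothetical. The sketch assumes that both score sets are already normalized to `[0, 1]`, that higher means more relevant, and that a document missing from one result set scores `0` there."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "linear-combination-sketch",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch of weighted score fusion (not LanceDB's internal code).\n",
"# Assumes scores are normalized to [0, 1], higher = more relevant.\n",
"def linear_combination(vector_scores, fts_scores, weight=0.7):\n",
"    doc_ids = set(vector_scores) | set(fts_scores)\n",
"    combined = {\n",
"        doc_id: weight * vector_scores.get(doc_id, 0.0)\n",
"        + (1 - weight) * fts_scores.get(doc_id, 0.0)\n",
"        for doc_id in doc_ids\n",
"    }\n",
"    # Highest combined score first\n",
"    return sorted(combined.items(), key=lambda item: item[1], reverse=True)\n",
"\n",
"\n",
"# Toy scores keyed by hypothetical document ids\n",
"vector_scores = {'a': 0.9, 'b': 0.4}\n",
"fts_scores = {'b': 1.0, 'c': 0.8}\n",
"for doc_id, score in linear_combination(vector_scores, fts_scores):\n",
"    print(doc_id, round(score, 2))  # a 0.63, then b 0.58, then c 0.24"
]
},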
{
"cell_type": "code",
"execution_count": 15,
"id": "d2aa5893-30c4-4beb-9dae-a55665bd82c7",
"metadata": {},
"outputs": [],
"source": [
"docs = table.search(query, query_type=\"hybrid\").limit(5).to_pandas()[\"text\"].to_list()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "8d6a99c3-92ef-4677-96bb-9b54a11a79fe",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In addition, the number of listings on Airbnb may decline as a result of a number of other factors affecting Hosts, including: the COVID-19 pandemic; enforcement or threatened\n",
"enforcement of laws and regulations, including short-term occupancy and tax laws; private groups, such as homeowners, landlords, and condominium and neighborhood\n",
"associations, adopting and enforcing contracts that prohibit or restrict home sharing; leases, mortgages, and other agreements, or regulations that purport to ban or otherwise restrict\n",
"home sharing; Hosts opting for long-term rentals on other third-party platforms as an alternative to listing on our platform; economic, social, and political factors; perceptions of trust\n",
"and safety on and off our platform; negative experiences with guests, including guests who damage Host property, throw unauthorized parties, or engage in violent and unlawful\n",
"\n",
"\n",
"(a) The Borrower may, at its election, deliver a Pricing Certificate to the Administrative Agent in respect of the most recently\n",
"ended fiscal year, commencing with the fiscal year ended December 31, 2022, on any date prior to the date that is 270 days following the last\n",
"day of such fiscal year (the\n",
"-50-\n",
"\n",
"\n",
"“Initial Delivery Date”); provided that the Pricing Certificate for any fiscal year may be delivered on any date following the Initial Delivery\n",
"Date that is prior to the date that is 365 days following the last day of the preceding fiscal year, so long as such Pricing Certificate includes a\n",
"certification that delivery of such Pricing Certificate on or before the Initial Delivery Date was not possible because (i) the information\n",
"required to calculate the KPI Metrics for such preceding fiscal year was not available at such time or (ii) the report of the KPI Metrics Auditor,\n",
"if relevant, was not available at such time (the date of the Administrative Agents receipt thereof, each a “Pricing Certificate Date”). Upon\n",
"delivery of a Pricing Certificate in respect of a fiscal year, (i) the Applicable Rate for the Loans incurred by the Borrower shall be increased or\n",
"decreased (or neither increased nor decreased), as applicable, pursuant to the Sustainability Margin Adjustment as set forth in the KPI Metrics\n",
"\n",
"\n",
"including such Sustainability Pricing Adjustment Date and ending on the date immediately preceding the next Sustainability Pricing\n",
"Adjustment Date.\n",
"(b) For the avoidance of doubt, only one Pricing Certificate may be delivered in respect of any fiscal year. It is further understood\n",
"and agreed that the Applicable Rate for Loans incurred by the Borrower will never be reduced or increased by more than 0.050% and that the\n",
"Applicable Rate for the Revolving Commitment Fee will never be reduced or increased by more than 0.010%, pursuant to the Sustainability\n",
"Margin Adjustment and the Sustainability Fee Adjustment, respectively, on any Sustainability Pricing Adjustment Date. For the avoidance of\n",
"doubt, any adjustment to the Applicable Rate for such Loans or such Revolving Commitment Fee by reason of meeting one or both KPI\n",
"Metrics in any fiscal year shall not be cumulative year-over-year. The adjustments pursuant to this Section made on any Sustainability Pricing\n",
"\n",
"\n",
"Adjustment Date shall only apply for the period until the date immediately preceding the next Sustainability Pricing Adjustment Date.\n",
"(c) If, for any fiscal year, either (i) no Pricing Certificate shall have been delivered for such fiscal year or (ii) the Pricing\n",
"Certificate delivered for such fiscal year shall fail to include the Diverse Supplier Spend Percentage or GHG Emissions Intensity for such\n",
"fiscal year, then the Sustainability Margin Adjustment will be positive 0.050% and/or the Sustainability Fee Adjustment will be positive\n",
"0.010%, as applicable, in each case commencing on the last day such Pricing Certificate could have been delivered in accordance with the\n",
"terms of clause (a) above (it being understood that, in the case of the foregoing clause (ii), the Sustainability Margin Adjustment or the\n",
"Sustainability Fee Adjustment will be determined in accordance with such Pricing Certificate to the extent the (A) Sustainability Margin\n",
"\n",
"\n"
]
}
],
"source": [
"pretty_print(docs)"
]
},
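{
"cell_type": "markdown",
"id": "explicit-linear-reranker-md",
"metadata": {},
"source": [
"You can also pass the default reranker explicitly, which lets you change its weight. This sketch assumes that `LinearCombinationReranker` can be imported from `lancedb.rerankers`, like the other rerankers in this notebook, and that it accepts the `weight` argument described above. The value `weight=0.3` is an arbitrary example that leans towards FTS scores. The results are stored in a separate variable, so `docs` is left unchanged."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "explicit-linear-reranker",
"metadata": {},
"outputs": [],
"source": [
"from lancedb.rerankers import LinearCombinationReranker\n",
"\n",
"# weight applies to vector search scores; FTS scores get 1 - weight = 0.7 here\n",
"fts_heavy_reranker = LinearCombinationReranker(weight=0.3)\n",
"fts_heavy_docs = (\n",
"    table.search(query, query_type='hybrid')\n",
"    .limit(5)\n",
"    .rerank(reranker=fts_heavy_reranker)\n",
"    .to_pandas()['text']\n",
"    .to_list()\n",
")\n",
"pretty_print(fts_heavy_docs)"
]
},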
{
"cell_type": "markdown",
"id": "c4d3e0f3-8d96-47f5-ad1d-514475f1ae55",
"metadata": {},
"source": [
"### Cohere Reranker\n",
"This uses Cohere's Reranking API to re-rank the results. It accepts the reranking model name as a parameter. By Default it uses the english-v3 model but you can easily switch to a multi-lingual model.\n",
"\n",
"latency - `605 ms ± 78.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)`"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "ce2c43c7-1a96-4856-ad9b-28385164f187",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"# Free API key\n",
"os.environ[\"COHERE_API_KEY\"] = getpass.getpass()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "4adbb3f1-4d21-427b-9bf0-3d7bebf68cf6",
"metadata": {},
"outputs": [],
"source": [
"from lancedb.rerankers import CohereReranker\n",
"\n",
"reranker = CohereReranker()\n",
"docs = table.search(query, query_type=\"hybrid\").limit(5).rerank(reranker=reranker).to_pandas()[\"text\"].to_list()"
]
},
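{
"cell_type": "markdown",
"id": "cohere-multilingual-md",
"metadata": {},
"source": [
"To switch models, pass a different model name. This sketch makes two assumptions: that the constructor argument is called `model_name`, and that `rerank-multilingual-v3.0` is a valid Cohere model name. Check the `CohereReranker` signature in your installed LanceDB version and Cohere's current model list before relying on either."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cohere-multilingual",
"metadata": {},
"outputs": [],
"source": [
"# Assumed parameter name and model id; verify against your LanceDB and Cohere versions\n",
"multilingual_reranker = CohereReranker(model_name='rerank-multilingual-v3.0')\n",
"multilingual_docs = (\n",
"    table.search(query, query_type='hybrid')\n",
"    .limit(5)\n",
"    .rerank(reranker=multilingual_reranker)\n",
"    .to_pandas()['text']\n",
"    .to_list()\n",
")"
]
},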
{
"cell_type": "code",
"execution_count": 24,
"id": "a071b3e7-3b8b-42e4-a089-4d6c4094873f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Increased operating expenses, decreased revenue, negative publicity, negative reaction from our Hosts and guests and other stakeholders, or other adverse impacts from any of the\n",
"above factors or other risks related to our international operations could materially adversely affect our brand, reputation, business, results of operations, and financial condition.\n",
"In addition, we will continue to incur significant expenses to operate our outbound business in China, and we may never achieve profitability in that market. These factors, combined\n",
"with sentiment of the workforce in China, and Chinas policy towards foreign direct investment may particularly impact our operations in China. In addition, we need to ensure that\n",
"our business practices in China are compliant with local laws and regulations, which may be interpreted and enforced in ways that are different from our interpretation, and/or create\n",
"\n",
"\n",
"Made Possible by Hosts, Strangers, AirCover, Categories, and OMG marketing campaigns and launches, a $67.9 million increase in our search engine marketing and advertising\n",
"spend, a $25.1 million increase in payroll-related expenses due to growth in headcount and increase in compensation costs, a $22.0 million increase in third-party service provider\n",
"expenses, and a $11.1 million increase in coupon expense in line with increase in revenue and launch of AirCover for guests, partially offset by a decrease of $22.9 million related to\n",
"the changes in the fair value of contingent consideration related to a 2019 acquisition.\n",
"General and Administrative\n",
"2021 2022 % Change\n",
"(in millions, except percentages)\n",
"General and administrative $ 836 $ 950 14 %\n",
"Percentage of revenue 14 % 11 %\n",
"General and administrative expense increased $114.0 million, or 14%, in 2022 compared to 2021, primarily due to an increase in other business and operational taxes of $41.3\n",
"\n",
"\n",
"Table of Contents\n",
"Airbnb, Inc.\n",
"Consolidated Statements of Operations\n",
"(in millions, except per share amounts)\n",
"Year Ended December 31,\n",
"2020 2021 2022\n",
"Revenue $ 3,378 $ 5,992 $ 8,399 \n",
"Costs and expenses:\n",
"Cost of revenue 876 1,156 1,499 \n",
"Operations and support 878 847 1,041 \n",
"Product development 2,753 1,425 1,502 \n",
"Sales and marketing 1,175 1,186 1,516 \n",
"General and administrative 1,135 836 950 \n",
"Restructuring charges 151 113 89 \n",
"Total costs and expenses 6,968 5,563 6,597 \n",
"Income (loss) from operations (3,590) 429 1,802 \n",
"Interest income 27 13 186 \n",
"Interest expense (172) (438) (24)\n",
"Other income (expense), net (947) (304) 25 \n",
"Income (loss) before income taxes (4,682) (300) 1,989 \n",
"Provision for (benefit from) income taxes (97) 52 96 \n",
"Net income (loss) $ (4,585)$ (352)$ 1,893 \n",
"Net income (loss) per share attributable to Class A and Class B common stockholders:\n",
"Basic $ (16.12)$ (0.57)$ 2.97 \n",
"Diluted $ (16.12)$ (0.57)$ 2.79\n",
"\n",
"\n",
"Our success depends significantly on existing guests continuing to book and attracting new guests to book on our platform. Our ability to attract and retain guests could be materially\n",
"adversely affected by a number of factors discussed elsewhere in these “Risk Factors,” including:\n",
"•events beyond our control such as the ongoing COVID-19 pandemic, other pandemics and health concerns, restrictions on travel, immigration, trade disputes, economic\n",
"downturns, and the impact of climate change on travel including the availability of preferred destinations and the increase in the frequency and severity of weather-related\n",
"events, including fires, floods, droughts, extreme temperatures and ambient temperature increases, severe weather and other natural disasters, and the impact of other\n",
"climate change on seasonal destinations;\n",
"•political, social, or economic instability;\n",
"•Hosts failing to meet guests expectations, including increased expectations for cleanliness in light of the COVID-19 pandemic;\n",
"\n",
"\n",
"In addition, the number of listings on Airbnb may decline as a result of a number of other factors affecting Hosts, including: the COVID-19 pandemic; enforcement or threatened\n",
"enforcement of laws and regulations, including short-term occupancy and tax laws; private groups, such as homeowners, landlords, and condominium and neighborhood\n",
"associations, adopting and enforcing contracts that prohibit or restrict home sharing; leases, mortgages, and other agreements, or regulations that purport to ban or otherwise restrict\n",
"home sharing; Hosts opting for long-term rentals on other third-party platforms as an alternative to listing on our platform; economic, social, and political factors; perceptions of trust\n",
"and safety on and off our platform; negative experiences with guests, including guests who damage Host property, throw unauthorized parties, or engage in violent and unlawful\n",
"\n",
"\n"
]
}
],
"source": [
"pretty_print(docs)"
]
},
{
"cell_type": "markdown",
"id": "6630f0c0-6070-4ea7-a191-99092e69ca05",
"metadata": {},
"source": [
"Relevance score is returned by Cohere API and is independent of individual FTS and vector search scores."
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "80dc61bb-929c-4fbb-b2cb-20c5d31bc65c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>text</th>\n",
" <th>vector</th>\n",
" <th>_relevance_score</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Increased operating expenses, decreased revenu...</td>\n",
" <td>[0.0034929817, -0.024774546, 0.012623285, -0.0...</td>\n",
" <td>0.985328</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Made Possible by Hosts, Strangers, AirCover, C...</td>\n",
" <td>[-0.0042489874, -0.005382498, 0.007190078, -0....</td>\n",
" <td>0.979036</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Table of Contents\\nAirbnb, Inc.\\nConsolidated ...</td>\n",
" <td>[-0.008569201, -0.019810658, 0.014144964, -0.0...</td>\n",
" <td>0.696578</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Our success depends significantly on existing ...</td>\n",
" <td>[0.0027109187, -0.028220002, 0.022864284, -0.0...</td>\n",
" <td>0.539923</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>In addition, the number of listings on Airbnb ...</td>\n",
" <td>[0.0068983347, -0.0147690065, 0.042441186, -0....</td>\n",
" <td>0.460713</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" text \\\n",
"0 Increased operating expenses, decreased revenu... \n",
"1 Made Possible by Hosts, Strangers, AirCover, C... \n",
"2 Table of Contents\\nAirbnb, Inc.\\nConsolidated ... \n",
"3 Our success depends significantly on existing ... \n",
"4 In addition, the number of listings on Airbnb ... \n",
"\n",
" vector _relevance_score \n",
"0 [0.0034929817, -0.024774546, 0.012623285, -0.0... 0.985328 \n",
"1 [-0.0042489874, -0.005382498, 0.007190078, -0.... 0.979036 \n",
"2 [-0.008569201, -0.019810658, 0.014144964, -0.0... 0.696578 \n",
"3 [0.0027109187, -0.028220002, 0.022864284, -0.0... 0.539923 \n",
"4 [0.0068983347, -0.0147690065, 0.042441186, -0.... 0.460713 "
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table.search(query, query_type=\"hybrid\").limit(5).rerank(reranker=reranker).to_pandas()"
]
},
{
"cell_type": "markdown",
"id": "41147a46-7ef8-4266-9cec-08a992697de2",
"metadata": {},
"source": [
"### ColBERT Reranker\n",
"Colber Reranker is powered by ColBERT model. It runs locally using the huggingface implementation.\n",
"\n",
"Latency - `950 ms ± 5.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)`\n",
"\n",
"Note: First query might be slow. It is recommended to reuse the `Reranker` objects as the models are cached. Subsequent runs will be faster on reusing the same reranker object"
]
},
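{
"cell_type": "markdown",
"id": "colbert-maxsim-md",
"metadata": {},
"source": [
"ColBERT scores a query against a document through late interaction. Each query token embedding is matched to its most similar document token embedding, and these maximum similarities are summed (the MaxSim operator). The NumPy sketch below illustrates the idea with random toy embeddings. It is not LanceDB's `ColbertReranker` code. The function name `maxsim_score`, the embedding size, and the token counts are all hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "colbert-maxsim",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def maxsim_score(query_emb, doc_emb):\n",
"    # Normalize token embeddings so dot products are cosine similarities\n",
"    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)\n",
"    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)\n",
"    sim = q @ d.T  # shape: (query_tokens, doc_tokens)\n",
"    # Best-matching document token for each query token, summed over the query\n",
"    return float(sim.max(axis=1).sum())\n",
"\n",
"\n",
"rng = np.random.default_rng(0)\n",
"query_emb = rng.normal(size=(4, 8))  # 4 query tokens, 8-dim toy embeddings\n",
"doc_a = np.vstack([query_emb, rng.normal(size=(6, 8))])  # contains every query token\n",
"doc_b = rng.normal(size=(10, 8))  # unrelated random tokens\n",
"# doc_a should score about 4.0 (the maximum for 4 query tokens); doc_b lower\n",
"print(maxsim_score(query_emb, doc_a), maxsim_score(query_emb, doc_b))"
]
},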
{
"cell_type": "code",
"execution_count": 29,
"id": "91b06b43-c971-4177-b62f-f941bbbc2ef4",
"metadata": {},
"outputs": [],
"source": [
"from lancedb.rerankers import ColbertReranker\n",
"\n",
"reranker = ColbertReranker()\n",
"docs = table.search(query, query_type=\"hybrid\").limit(5).rerank(reranker=reranker).to_pandas()[\"text\"].to_list()"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "e42c46bd-7cdd-4d31-9dbb-ddd1bdf979fa",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Made Possible by Hosts, Strangers, AirCover, Categories, and OMG marketing campaigns and launches, a $67.9 million increase in our search engine marketing and advertising\n",
"spend, a $25.1 million increase in payroll-related expenses due to growth in headcount and increase in compensation costs, a $22.0 million increase in third-party service provider\n",
"expenses, and a $11.1 million increase in coupon expense in line with increase in revenue and launch of AirCover for guests, partially offset by a decrease of $22.9 million related to\n",
"the changes in the fair value of contingent consideration related to a 2019 acquisition.\n",
"General and Administrative\n",
"2021 2022 % Change\n",
"(in millions, except percentages)\n",
"General and administrative $ 836 $ 950 14 %\n",
"Percentage of revenue 14 % 11 %\n",
"General and administrative expense increased $114.0 million, or 14%, in 2022 compared to 2021, primarily due to an increase in other business and operational taxes of $41.3\n",
"\n",
"\n",
"Our future revenue growth depends on the growth of supply and demand for listings on our platform, and our business is affected by general economic and business conditions\n",
"worldwide as well as trends in the global travel and hospitality industries and the short and long-term accommodation regulatory landscape. In addition, we believe that our revenue\n",
"growth depends upon a number of factors, including:\n",
"•global macroeconomic conditions, including inflation and rising interest rates and recessionary concerns;\n",
"•our ability to retain and grow the number of guests and Nights and Experiences Booked;\n",
"•our ability to retain and grow the number of Hosts and the number of available listings on our platform;\n",
"•events beyond our control such as pandemics and other health concerns, restrictions on travel and immigration, political, social or economic instability, including international\n",
"\n",
"\n",
"Our success depends significantly on existing guests continuing to book and attracting new guests to book on our platform. Our ability to attract and retain guests could be materially\n",
"adversely affected by a number of factors discussed elsewhere in these “Risk Factors,” including:\n",
"•events beyond our control such as the ongoing COVID-19 pandemic, other pandemics and health concerns, restrictions on travel, immigration, trade disputes, economic\n",
"downturns, and the impact of climate change on travel including the availability of preferred destinations and the increase in the frequency and severity of weather-related\n",
"events, including fires, floods, droughts, extreme temperatures and ambient temperature increases, severe weather and other natural disasters, and the impact of other\n",
"climate change on seasonal destinations;\n",
"•political, social, or economic instability;\n",
"•Hosts failing to meet guests expectations, including increased expectations for cleanliness in light of the COVID-19 pandemic;\n",
"\n",
"\n",
"In addition, the number of listings on Airbnb may decline as a result of a number of other factors affecting Hosts, including: the COVID-19 pandemic; enforcement or threatened\n",
"enforcement of laws and regulations, including short-term occupancy and tax laws; private groups, such as homeowners, landlords, and condominium and neighborhood\n",
"associations, adopting and enforcing contracts that prohibit or restrict home sharing; leases, mortgages, and other agreements, or regulations that purport to ban or otherwise restrict\n",
"home sharing; Hosts opting for long-term rentals on other third-party platforms as an alternative to listing on our platform; economic, social, and political factors; perceptions of trust\n",
"and safety on and off our platform; negative experiences with guests, including guests who damage Host property, throw unauthorized parties, or engage in violent and unlawful\n",
"\n",
"\n",
"Table of Contents\n",
"Airbnb, Inc.\n",
"Consolidated Statements of Operations\n",
"(in millions, except per share amounts)\n",
"Year Ended December 31,\n",
"2020 2021 2022\n",
"Revenue $ 3,378 $ 5,992 $ 8,399 \n",
"Costs and expenses:\n",
"Cost of revenue 876 1,156 1,499 \n",
"Operations and support 878 847 1,041 \n",
"Product development 2,753 1,425 1,502 \n",
"Sales and marketing 1,175 1,186 1,516 \n",
"General and administrative 1,135 836 950 \n",
"Restructuring charges 151 113 89 \n",
"Total costs and expenses 6,968 5,563 6,597 \n",
"Income (loss) from operations (3,590) 429 1,802 \n",
"Interest income 27 13 186 \n",
"Interest expense (172) (438) (24)\n",
"Other income (expense), net (947) (304) 25 \n",
"Income (loss) before income taxes (4,682) (300) 1,989 \n",
"Provision for (benefit from) income taxes (97) 52 96 \n",
"Net income (loss) $ (4,585)$ (352)$ 1,893 \n",
"Net income (loss) per share attributable to Class A and Class B common stockholders:\n",
"Basic $ (16.12)$ (0.57)$ 2.97 \n",
"Diluted $ (16.12)$ (0.57)$ 2.79\n",
"\n",
"\n"
]
}
],
"source": [
"pretty_print(docs)"
]
},
{
"cell_type": "markdown",
"id": "2ba9bc9a-29b0-4faa-b74d-a32af105ed45",
"metadata": {},
"source": [
"### Cross Encoder Reranker\n",
"Uses cross-encoder models as rerankers, running a Sentence Transformers implementation locally.\n",
"\n",
"Latency - `1.38 s ± 64.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)`"
]
},
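{
"cell_type": "markdown",
"id": "5d3e9b7a-1c42-4f8e-a6b0-2e7f9c1d4a58",
"metadata": {},
"source": [
"A cross-encoder reads the query and a candidate document together and outputs one relevance score for that pair. Reranking then sorts the candidates by that score. The minimal sketch below shows only this score-and-sort step. For illustration, the hypothetical `toy_pair_score` function (plain word overlap) stands in for the real cross-encoder model, so it will not reproduce the scores from `CrossEncoderReranker`.\n",
"\n",
"```python\n",
"from typing import Callable, List, Tuple\n",
"\n",
"def toy_pair_score(query: str, doc: str) -> float:\n",
"    # Hypothetical stand-in for a cross-encoder: fraction of query words found in the doc\n",
"    query_words = set(query.lower().split())\n",
"    doc_words = set(doc.lower().split())\n",
"    return len(query_words & doc_words) / max(len(query_words), 1)\n",
"\n",
"def rerank_by_pair_score(query: str, docs: List[str], score_fn: Callable[[str, str], float]) -> List[Tuple[str, float]]:\n",
"    # Score each (query, doc) pair jointly, then sort by descending score\n",
"    scored = [(doc, score_fn(query, doc)) for doc in docs]\n",
"    return sorted(scored, key=lambda pair: pair[1], reverse=True)\n",
"\n",
"candidates = ['revenue grew in 2022', 'higher operating costs were driven by marketing spend']\n",
"ranked = rerank_by_pair_score('reasons for higher operating costs', candidates, toy_pair_score)\n",
"print(ranked[0][0])  # prints: higher operating costs were driven by marketing spend\n",
"```"
]
},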
{
"cell_type": "code",
"execution_count": 33,
"id": "4b9ea674-c8c6-498a-a3cf-9b7fa9cb7334",
"metadata": {},
"outputs": [],
"source": [
"from lancedb.rerankers import CrossEncoderReranker\n",
"\n",
"reranker = CrossEncoderReranker()\n",
"docs = table.search(query, query_type=\"hybrid\").limit(5).rerank(reranker=reranker).to_pandas()[\"text\"].to_list()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "6fe32845-17f1-4977-9bd5-c18528b84656",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Table of Contents\n",
"Airbnb, Inc.\n",
"Consolidated Statements of Operations\n",
"(in millions, except per share amounts)\n",
"Year Ended December 31,\n",
"2020 2021 2022\n",
"Revenue $ 3,378 $ 5,992 $ 8,399 \n",
"Costs and expenses:\n",
"Cost of revenue 876 1,156 1,499 \n",
"Operations and support 878 847 1,041 \n",
"Product development 2,753 1,425 1,502 \n",
"Sales and marketing 1,175 1,186 1,516 \n",
"General and administrative 1,135 836 950 \n",
"Restructuring charges 151 113 89 \n",
"Total costs and expenses 6,968 5,563 6,597 \n",
"Income (loss) from operations (3,590) 429 1,802 \n",
"Interest income 27 13 186 \n",
"Interest expense (172) (438) (24)\n",
"Other income (expense), net (947) (304) 25 \n",
"Income (loss) before income taxes (4,682) (300) 1,989 \n",
"Provision for (benefit from) income taxes (97) 52 96 \n",
"Net income (loss) $ (4,585)$ (352)$ 1,893 \n",
"Net income (loss) per share attributable to Class A and Class B common stockholders:\n",
"Basic $ (16.12)$ (0.57)$ 2.97 \n",
"Diluted $ (16.12)$ (0.57)$ 2.79\n",
"\n",
"\n",
"Made Possible by Hosts, Strangers, AirCover, Categories, and OMG marketing campaigns and launches, a $67.9 million increase in our search engine marketing and advertising\n",
"spend, a $25.1 million increase in payroll-related expenses due to growth in headcount and increase in compensation costs, a $22.0 million increase in third-party service provider\n",
"expenses, and a $11.1 million increase in coupon expense in line with increase in revenue and launch of AirCover for guests, partially offset by a decrease of $22.9 million related to\n",
"the changes in the fair value of contingent consideration related to a 2019 acquisition.\n",
"General and Administrative\n",
"2021 2022 % Change\n",
"(in millions, except percentages)\n",
"General and administrative $ 836 $ 950 14 %\n",
"Percentage of revenue 14 % 11 %\n",
"General and administrative expense increased $114.0 million, or 14%, in 2022 compared to 2021, primarily due to an increase in other business and operational taxes of $41.3\n",
"\n",
"\n",
"Increased operating expenses, decreased revenue, negative publicity, negative reaction from our Hosts and guests and other stakeholders, or other adverse impacts from any of the\n",
"above factors or other risks related to our international operations could materially adversely affect our brand, reputation, business, results of operations, and financial condition.\n",
"In addition, we will continue to incur significant expenses to operate our outbound business in China, and we may never achieve profitability in that market. These factors, combined\n",
"with sentiment of the workforce in China, and Chinas policy towards foreign direct investment may particularly impact our operations in China. In addition, we need to ensure that\n",
"our business practices in China are compliant with local laws and regulations, which may be interpreted and enforced in ways that are different from our interpretation, and/or create\n",
"\n",
"\n",
"In addition, the number of listings on Airbnb may decline as a result of a number of other factors affecting Hosts, including: the COVID-19 pandemic; enforcement or threatened\n",
"enforcement of laws and regulations, including short-term occupancy and tax laws; private groups, such as homeowners, landlords, and condominium and neighborhood\n",
"associations, adopting and enforcing contracts that prohibit or restrict home sharing; leases, mortgages, and other agreements, or regulations that purport to ban or otherwise restrict\n",
"home sharing; Hosts opting for long-term rentals on other third-party platforms as an alternative to listing on our platform; economic, social, and political factors; perceptions of trust\n",
"and safety on and off our platform; negative experiences with guests, including guests who damage Host property, throw unauthorized parties, or engage in violent and unlawful\n",
"\n",
"\n",
"Our future revenue growth depends on the growth of supply and demand for listings on our platform, and our business is affected by general economic and business conditions\n",
"worldwide as well as trends in the global travel and hospitality industries and the short and long-term accommodation regulatory landscape. In addition, we believe that our revenue\n",
"growth depends upon a number of factors, including:\n",
"•global macroeconomic conditions, including inflation and rising interest rates and recessionary concerns;\n",
"•our ability to retain and grow the number of guests and Nights and Experiences Booked;\n",
"•our ability to retain and grow the number of Hosts and the number of available listings on our platform;\n",
"•events beyond our control such as pandemics and other health concerns, restrictions on travel and immigration, political, social or economic instability, including international\n",
"\n",
"\n"
]
}
],
"source": [
"pretty_print(docs)"
]
},
{
"cell_type": "markdown",
"id": "a32f41ea-e087-4e64-b9ec-f6224308fa6d",
"metadata": {},
"source": [
"### (Experimental) OpenAI Reranker\n",
"\n",
"This reranker asks an OpenAI chat model to reorder the results. A chat model is not a dedicated reranking model, so treat this reranker as experimental. The candidate documents are sent to the model in the prompt, which means you can exceed the model's token limit. Set the search `limit` to fit within that limit.\n",
"\n",
"NOTE: We recommend `gpt-4-turbo-preview`. Older models may behave poorly.\n",
"\n",
"Latency - can take tens of seconds when using a GPT-4 model"
]
},
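{
"cell_type": "markdown",
"id": "8a1f6c2d-7e93-4b05-9d4a-3c6e0b8f2a71",
"metadata": {},
"source": [
"To pick a search `limit` that stays within the model's token limit, you can roughly estimate how many tokens the candidate chunks use. The minimal sketch below assumes about 4 characters per token for English text. This is a common rule of thumb, not an exact count. It also assumes a hypothetical fixed `prompt_overhead` for the instructions and the reply. The helper names are illustrative and are not part of LanceDB. For exact counts, a tokenizer such as `tiktoken` is more reliable.\n",
"\n",
"```python\n",
"from typing import List\n",
"\n",
"CHARS_PER_TOKEN = 4  # assumption: rough average for English text\n",
"\n",
"def estimate_tokens(text: str) -> int:\n",
"    # Ceiling division so a partial group of characters counts as a whole token\n",
"    return -(-len(text) // CHARS_PER_TOKEN)\n",
"\n",
"def max_docs_within_budget(docs: List[str], token_budget: int, prompt_overhead: int = 500) -> int:\n",
"    # Count how many leading docs fit after reserving tokens for instructions and the reply\n",
"    used = prompt_overhead\n",
"    for count, doc in enumerate(docs):\n",
"        used += estimate_tokens(doc)\n",
"        if used > token_budget:\n",
"            return count\n",
"    return len(docs)\n",
"\n",
"chunks = ['x' * 4000] * 10  # ten chunks of roughly 1,000 tokens each\n",
"print(max_docs_within_budget(chunks, token_budget=8000))  # prints 7\n",
"```"
]
},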
{
"cell_type": "code",
"execution_count": 39,
"id": "da78b250-9938-4e81-825f-c17b7a57e541",
"metadata": {},
"outputs": [],
"source": [
"from lancedb.rerankers import OpenaiReranker\n",
"\n",
"reranker = OpenaiReranker(model_name=\"gpt-4-turbo-preview\")\n",
"docs = table.search(query, query_type=\"hybrid\").limit(5).rerank(reranker=reranker).to_pandas()[\"text\"].to_list()"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "98e83f73-1ef3-485f-9871-9bd32937863f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Made Possible by Hosts, Strangers, AirCover, Categories, and OMG marketing campaigns and launches, a $67.9 million increase in our search engine marketing and advertising\n",
"spend, a $25.1 million increase in payroll-related expenses due to growth in headcount and increase in compensation costs, a $22.0 million increase in third-party service provider\n",
"expenses, and a $11.1 million increase in coupon expense in line with increase in revenue and launch of AirCover for guests, partially offset by a decrease of $22.9 million related to\n",
"the changes in the fair value of contingent consideration related to a 2019 acquisition.\n",
"General and Administrative\n",
"2021 2022 % Change\n",
"(in millions, except percentages)\n",
"General and administrative $ 836 $ 950 14 %\n",
"Percentage of revenue 14 % 11 %\n",
"General and administrative expense increased $114.0 million, or 14%, in 2022 compared to 2021, primarily due to an increase in other business and operational taxes of $41.3\n",
"\n",
"\n",
"Table of Contents\n",
"Airbnb, Inc.\n",
"Consolidated Statements of Operations\n",
"(in millions, except per share amounts)\n",
"Year Ended December 31,\n",
"2020 2021 2022\n",
"Revenue $ 3,378 $ 5,992 $ 8,399 \n",
"Costs and expenses:\n",
"Cost of revenue 876 1,156 1,499 \n",
"Operations and support 878 847 1,041 \n",
"Product development 2,753 1,425 1,502 \n",
"Sales and marketing 1,175 1,186 1,516 \n",
"General and administrative 1,135 836 950 \n",
"Restructuring charges 151 113 89 \n",
"Total costs and expenses 6,968 5,563 6,597 \n",
"Income (loss) from operations (-3,590) 429 1,802 \n",
"Interest income 27 13 186 \n",
"Interest expense (-172) (-438) (-24)\n",
"Other income (expense), net (-947) (-304) 25 \n",
"Income (loss) before income taxes (-4,682) (-300) 1,989 \n",
"Provision for (benefit from) income taxes (-97) 52 96 \n",
"Net income (loss) $ (-4,585)$ (-352)$ 1,893 \n",
"Net income (loss) per share attributable to Class A and Class B common stockholders:\n",
"Basic $ (-16.12)$ (-0.57)$ 2.97 \n",
"Diluted $ (-16.12)$ (-0.57)$ 2.79\n",
"\n",
"\n",
"In addition, the number of listings on Airbnb may decline as a result of a number of other factors affecting Hosts, including: the COVID-19 pandemic; enforcement or threatened\n",
"enforcement of laws and regulations, including short-term occupancy and tax laws; private groups, such as homeowners, landlords, and condominium and neighborhood\n",
"associations, adopting and enforcing contracts that prohibit or restrict home sharing; leases, mortgages, and other agreements, or regulations that purport to ban or otherwise restrict\n",
"home sharing; Hosts opting for long-term rentals on other third-party platforms as an alternative to listing on our platform; economic, social, and political factors; perceptions of trust\n",
"and safety on and off our platform; negative experiences with guests, including guests who damage Host property, throw unauthorized parties, or engage in violent and unlawful\n",
"\n",
"\n",
"Our success depends significantly on existing guests continuing to book and attracting new guests to book on our platform. Our ability to attract and retain guests could be materially\n",
"adversely affected by a number of factors discussed elsewhere in these “Risk Factors,” including:\n",
"•events beyond our control such as the ongoing COVID-19 pandemic, other pandemics and health concerns, restrictions on travel, immigration, trade disputes, economic\n",
"downturns, and the impact of climate change on travel including the availability of preferred destinations and the increase in the frequency and severity of weather-related\n",
"events, including fires, floods, droughts, extreme temperatures and ambient temperature increases, severe weather and other natural disasters, and the impact of other\n",
"climate change on seasonal destinations;\n",
"•political, social, or economic instability;\n",
"•Hosts failing to meet guests expectations, including increased expectations for cleanliness in light of the COVID-19 pandemic;\n",
"\n",
"\n",
"Our future revenue growth depends on the growth of supply and demand for listings on our platform, and our business is affected by general economic and business conditions\n",
"worldwide as well as trends in the global travel and hospitality industries and the short and long-term accommodation regulatory landscape. In addition, we believe that our revenue\n",
"growth depends upon a number of factors, including:\n",
"•global macroeconomic conditions, including inflation and rising interest rates and recessionary concerns;\n",
"•our ability to retain and grow the number of guests and Nights and Experiences Booked;\n",
"•our ability to retain and grow the number of Hosts and the number of available listings on our platform;\n",
"•events beyond our control such as pandemics and other health concerns, restrictions on travel and immigration, political, social or economic instability, including international\n",
"\n",
"\n"
]
}
],
"source": [
"pretty_print(docs)"
]
},
{
"cell_type": "markdown",
"id": "42dfdbc5-9006-4398-8465-03828ad48e49",
"metadata": {},
"source": [
"## Use your custom Reranker\n",
"Hybrid search in LanceDB is designed to be flexible, so you can easily plug in your own reranking logic. To do so, subclass the base `Reranker` class and implement its `rerank_hybrid` method."
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "e14503fe-5e9f-4d61-a96b-a5e95d501f61",
"metadata": {},
"outputs": [],
"source": [
"from lancedb.rerankers import Reranker\n",
"import pyarrow as pa\n",
"\n",
"class MyCustomReranker(Reranker):\n",
"    def rerank_hybrid(self, query: str, vector_results: pa.Table, fts_results: pa.Table) -> pa.Table:\n",
"        # Combine both result sets with the base-class helper (or your own merge algorithm)\n",
"        combined_results = self.merge_results(vector_results, fts_results)\n",
"        # Custom reranking logic goes here\n",
"\n",
"        return combined_results"
]
},
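{
"cell_type": "markdown",
"id": "b4c7e2f9-0a58-4d13-8e6b-9f2a1d7c5e36",
"metadata": {},
"source": [
"The custom reranking logic in such a class can implement any fusion strategy. As one illustration, the sketch below implements reciprocal rank fusion (RRF). This technique was chosen for the example and is not used elsewhere in this notebook. In RRF, each document's score is the sum of 1 / (k + rank) over every result list it appears in, with k = 60 as a common default. For clarity, the sketch works on plain lists of hypothetical row ids. Inside `rerank_hybrid`, you would first extract such ids from the `pa.Table` results.\n",
"\n",
"```python\n",
"from collections import defaultdict\n",
"from typing import Dict, Hashable, List\n",
"\n",
"def reciprocal_rank_fusion(result_lists: List[List[Hashable]], k: int = 60) -> List[Hashable]:\n",
"    # Each list is ordered best-first; a doc earns 1 / (k + rank) from every list it appears in\n",
"    scores: Dict[Hashable, float] = defaultdict(float)\n",
"    for results in result_lists:\n",
"        for rank, doc_id in enumerate(results, start=1):\n",
"            scores[doc_id] += 1.0 / (k + rank)\n",
"    # Highest fused score first; ties keep first-seen order because sorted() is stable\n",
"    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)\n",
"\n",
"vector_ids = [3, 1, 7]  # hypothetical row ids from vector search, best first\n",
"fts_ids = [1, 9, 3]     # hypothetical row ids from full-text search, best first\n",
"print(reciprocal_rank_fusion([vector_ids, fts_ids]))  # prints [1, 3, 9, 7]\n",
"```\n",
"\n",
"Documents found by both searches (here ids 1 and 3) rise to the top, even if neither search ranked them first on its own."
]
},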
{
"cell_type": "markdown",
"id": "0606d4fb-96ef-4440-9363-f5461284d00c",
"metadata": {},
"source": [
"### Custom Reranker based on CohereReranker\n",
"\n",
"For the sake of simplicity, let's build a custom reranker that extends the Cohere reranker. It accepts one or more filter strings and passes any other `CohereReranker` parameters through as kwargs.\n",
"\n",
"For this toy example, let's say we want to drop documents that represent a table of contents, an appendix, etc. These are semantically close to the query because they list costs. However, they don't explain the specific reasons why operating costs were higher; they only report the costs themselves."
]
},
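{
"cell_type": "markdown",
"id": "e9f0a3b6-4d27-4c81-b5e2-6a9d8c3f1b04",
"metadata": {},
"source": [
"On its own, the filtering step is simple: keep only the documents that contain none of the filter strings, in their reranked order. The minimal pure-Python sketch below applies that logic to a list of strings. The function name `drop_matching` is hypothetical. Matching is a case-sensitive plain substring check with no regular expressions.\n",
"\n",
"```python\n",
"from typing import List, Union\n",
"\n",
"def drop_matching(docs: List[str], filters: Union[str, List[str]]) -> List[str]:\n",
"    # Normalize a single filter string to a list\n",
"    filters = filters if isinstance(filters, list) else [filters]\n",
"    # Keep a doc only if none of the filter strings occur in it (case-sensitive)\n",
"    return [doc for doc in docs if not any(f in doc for f in filters)]\n",
"\n",
"ranked_docs = [\n",
"    'Table of Contents Airbnb, Inc. Consolidated Statements of Operations',\n",
"    'General and administrative expense increased $114.0 million',\n",
"]\n",
"print(drop_matching(ranked_docs, 'Table of Contents'))\n",
"```"
]
},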
{
"cell_type": "code",
"execution_count": 56,
"id": "dd1e8110-72c4-423c-90de-ce2b386742c1",
"metadata": {},
"outputs": [],
"source": [
"from typing import List, Union\n",
"import pandas as pd\n",
"import pyarrow as pa\n",
"from lancedb.rerankers import CohereReranker\n",
"\n",
"class ModifiedCohereReranker(CohereReranker):\n",
"    def __init__(self, filters: Union[str, List[str]], **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        # Accept either a single filter string or a list of them\n",
"        self.filters = filters if isinstance(filters, list) else [filters]\n",
"\n",
"    def rerank_hybrid(self, query: str, vector_results: pa.Table, fts_results: pa.Table) -> pa.Table:\n",
"        # Let CohereReranker merge and rerank the vector and FTS results first\n",
"        combined_result = super().rerank_hybrid(query, vector_results, fts_results)\n",
"        df = combined_result.to_pandas()\n",
"        # Drop every row whose text contains any of the filter strings\n",
"        for filter_text in self.filters:\n",
"            df = df[~df[\"text\"].str.contains(filter_text, regex=False)]\n",
"\n",
"        # preserve_index=False avoids adding a stray __index_level_0__ column\n",
"        return pa.Table.from_pandas(df, preserve_index=False)\n",
"\n",
"reranker = ModifiedCohereReranker(filters=\"Table of Contents\")"
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "f4e6b496-e0c1-4944-8a6d-127f566812d3",
"metadata": {},
"outputs": [],
"source": [
"docs = table.search(query, query_type=\"hybrid\").limit(5).rerank(reranker=reranker).to_pandas()[\"text\"].to_list()"
]
},
{
"cell_type": "code",
"execution_count": 58,
"id": "5a29d0a2-793a-40a2-ac2d-2edda1102d6e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Increased operating expenses, decreased revenue, negative publicity, negative reaction from our Hosts and guests and other stakeholders, or other adverse impacts from any of the\n",
"above factors or other risks related to our international operations could materially adversely affect our brand, reputation, business, results of operations, and financial condition.\n",
"In addition, we will continue to incur significant expenses to operate our outbound business in China, and we may never achieve profitability in that market. These factors, combined\n",
"with sentiment of the workforce in China, and Chinas policy towards foreign direct investment may particularly impact our operations in China. In addition, we need to ensure that\n",
"our business practices in China are compliant with local laws and regulations, which may be interpreted and enforced in ways that are different from our interpretation, and/or create\n",
"\n",
"\n",
"Made Possible by Hosts, Strangers, AirCover, Categories, and OMG marketing campaigns and launches, a $67.9 million increase in our search engine marketing and advertising\n",
"spend, a $25.1 million increase in payroll-related expenses due to growth in headcount and increase in compensation costs, a $22.0 million increase in third-party service provider\n",
"expenses, and a $11.1 million increase in coupon expense in line with increase in revenue and launch of AirCover for guests, partially offset by a decrease of $22.9 million related to\n",
"the changes in the fair value of contingent consideration related to a 2019 acquisition.\n",
"General and Administrative\n",
"2021 2022 % Change\n",
"(in millions, except percentages)\n",
"General and administrative $ 836 $ 950 14 %\n",
"Percentage of revenue 14 % 11 %\n",
"General and administrative expense increased $114.0 million, or 14%, in 2022 compared to 2021, primarily due to an increase in other business and operational taxes of $41.3\n",
"\n",
"\n",
"Our success depends significantly on existing guests continuing to book and attracting new guests to book on our platform. Our ability to attract and retain guests could be materially\n",
"adversely affected by a number of factors discussed elsewhere in these “Risk Factors,” including:\n",
"•events beyond our control such as the ongoing COVID-19 pandemic, other pandemics and health concerns, restrictions on travel, immigration, trade disputes, economic\n",
"downturns, and the impact of climate change on travel including the availability of preferred destinations and the increase in the frequency and severity of weather-related\n",
"events, including fires, floods, droughts, extreme temperatures and ambient temperature increases, severe weather and other natural disasters, and the impact of other\n",
"climate change on seasonal destinations;\n",
"•political, social, or economic instability;\n",
"•Hosts failing to meet guests expectations, including increased expectations for cleanliness in light of the COVID-19 pandemic;\n",
"\n",
"\n",
"In addition, the number of listings on Airbnb may decline as a result of a number of other factors affecting Hosts, including: the COVID-19 pandemic; enforcement or threatened\n",
"enforcement of laws and regulations, including short-term occupancy and tax laws; private groups, such as homeowners, landlords, and condominium and neighborhood\n",
"associations, adopting and enforcing contracts that prohibit or restrict home sharing; leases, mortgages, and other agreements, or regulations that purport to ban or otherwise restrict\n",
"home sharing; Hosts opting for long-term rentals on other third-party platforms as an alternative to listing on our platform; economic, social, and political factors; perceptions of trust\n",
"and safety on and off our platform; negative experiences with guests, including guests who damage Host property, throw unauthorized parties, or engage in violent and unlawful\n",
"\n",
"\n",
"Our future revenue growth depends on the growth of supply and demand for listings on our platform, and our business is affected by general economic and business conditions\n",
"worldwide as well as trends in the global travel and hospitality industries and the short and long-term accommodation regulatory landscape. In addition, we believe that our revenue\n",
"growth depends upon a number of factors, including:\n",
"•global macroeconomic conditions, including inflation and rising interest rates and recessionary concerns;\n",
"•our ability to retain and grow the number of guests and Nights and Experiences Booked;\n",
"•our ability to retain and grow the number of Hosts and the number of available listings on our platform;\n",
"•events beyond our control such as pandemics and other health concerns, restrictions on travel and immigration, political, social or economic instability, including international\n",
"\n",
"\n"
]
}
],
"source": [
"pretty_print(docs)"
]
},
{
"cell_type": "markdown",
"id": "b3b5464a-7252-4eab-aaac-9b0eae37496f",
"metadata": {},
"source": [
"As you can see, the chunk beginning with \"Table of Contents\" (the consolidated statements of operations table) no longer shows up in the results."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}