mirror of
https://github.com/lancedb/lancedb.git
synced 2026-05-19 04:50:40 +00:00
[Docs]: Add badges, CTA and updates examples (#358)
<img width="1054" alt="Screenshot 2023-07-24 at 6 13 00 PM" src="https://github.com/lancedb/lancedb/assets/15766192/a263a17e-66d0-4591-adc7-b520aa5b23f6"> Is this a problem? Are we using metadata to track usage or something?
@@ -57,12 +57,14 @@ nav:
- Basics: basic.md
- Embeddings: embedding.md
- Python full-text search: fts.md
- Python integrations:
- Integrations:
  - Pandas and PyArrow: python/arrow.md
  - DuckDB: python/duckdb.md
  - LangChain 🦜️🔗: https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/lancedb.html
  - LangChain JS/TS 🦜️🔗: https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/lancedb
  - LlamaIndex 🦙: https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/LanceDBIndexDemo.html
  - Pydantic: python/pydantic.md
  - Voxel51: integrations/voxel51.md
- Python examples:
  - YouTube Transcript Search: notebooks/youtube_transcript_search.ipynb
  - Documentation QA Bot using LangChain: notebooks/code_qa_bot.ipynb
@@ -72,6 +74,7 @@ nav:
- Javascript examples:
  - YouTube Transcript Search: examples/youtube_transcript_bot_with_nodejs.md
  - TransformersJS Embedding Search: examples/transformerjs_embedding_search_nodejs.md

- References:
  - Vector Search: search.md
  - SQL filters: sql.md
BIN
docs/src/assets/voxel.gif
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 953 KiB |
@@ -4,4 +4,10 @@
<img id="splash" width="400" alt="youtube transcript search" src="https://user-images.githubusercontent.com/917119/236965568-def7394d-171c-45f2-939d-8edfeaadd88c.png">

<a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/youtube_bot/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>

Scripts - [](./examples/youtube_bot/main.py) [](./examples/youtube_bot/index.js)

This example is in a [notebook](https://github.com/lancedb/lancedb/blob/main/docs/src/notebooks/youtube_transcript_search.ipynb)
71
docs/src/integrations/voxel51.md
Normal file
@@ -0,0 +1,71 @@
Basic recipe
____________

The basic workflow for using LanceDB to create a similarity index on your FiftyOne
datasets and query your data with it is as follows:

1) Load a dataset into FiftyOne

2) Compute embedding vectors for samples or patches in your dataset, or select
a model to use to generate embeddings

3) Use the `compute_similarity()`
method to generate a LanceDB table for the sample or object
patch embeddings in a dataset by setting the parameter `backend="lancedb"` and
specifying a `brain_key` of your choice

4) Use this LanceDB table to query your data with
`sort_by_similarity()`

5) If desired, delete the table

The example below demonstrates this workflow.

!!! Note

    You must install the LanceDB Python client to run this example:
    ```
    pip install lancedb
    ```

```python
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

# Step 1: Load your data into FiftyOne
dataset = foz.load_zoo_dataset("quickstart")

# Steps 2 and 3: Compute embeddings and create a similarity index
lancedb_index = fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="lancedb_index",
    backend="lancedb",
)
```
Once the similarity index has been generated, we can query our data in FiftyOne
by specifying the `brain_key`:

```python
# Step 4: Query your data
query = dataset.first().id  # query by sample ID
view = dataset.sort_by_similarity(
    query,
    brain_key="lancedb_index",
    k=10,  # limit to 10 most similar samples
)

# Step 5 (optional): Cleanup

# Delete the LanceDB table
lancedb_index.cleanup()

# Delete run record from FiftyOne
dataset.delete_brain_run("lancedb_index")
```

For a more in-depth walkthrough of the integration, see the LanceDB guide in the Voxel51 docs: [LanceDB x Voxel51](https://docs.voxel51.com/integrations/lancedb.html)
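
To make the query in step 4 more concrete, here is a rough, library-free sketch of what a top-`k` similarity lookup computes: rank every stored embedding by its cosine similarity to the query sample's embedding and keep the first `k`. This is only an illustration under assumptions, not how FiftyOne or LanceDB implement it; the helper names `cosine_similarity` and `top_k_similar` and the toy two-dimensional embeddings are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query_id, embeddings, k=10):
    """Return the IDs of the k embeddings most similar to the query's embedding.

    `embeddings` maps a sample ID to its vector (hypothetical layout, chosen
    for illustration only). The query sample is ranked along with the rest.
    """
    query_vec = embeddings[query_id]
    scored = [
        (sample_id, cosine_similarity(query_vec, vec))
        for sample_id, vec in embeddings.items()
    ]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [sample_id for sample_id, _ in scored[:k]]


# Toy data standing in for real model embeddings
embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}

ranked = top_k_similar("a", embeddings, k=3)
print(ranked)
```

A vector database avoids this exhaustive scan on large datasets by building an index over the vectors, which is the role the LanceDB table plays in the recipe above.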
@@ -10,7 +10,11 @@
"\n",
"This Q&A bot will allow you to query your own documentation easily using questions. We'll also demonstrate the use of LangChain and LanceDB using the OpenAI API. \n",
"\n",
"In this example we'll use Pandas 2.0 documentation, but, this could be replaced for your own docs as well"
"In this example we'll use Pandas 2.0 documentation, but, this could be replaced for your own docs as well\n",
"\n",
"<a href=\"https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/Code-Documentation-QA-Bot/main.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
"\n",
"Scripts - [](./examples/Code-Documentation-QA-Bot/main.py) [](./examples/Code-Documentation-QA-Bot/index.js)"
]
},
{
@@ -1,5 +1,14 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
" <a href=\"https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/multimodal_clip/main.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>| [](./examples/multimodal_clip/main.py) |"
]
},
{
"cell_type": "code",
"execution_count": 2,
@@ -42,6 +51,19 @@
"## First run setup: Download data and pre-process"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"### Get dataset\n",
"\n",
"!wget https://eto-public.s3.us-west-2.amazonaws.com/datasets/diffusiondb_lance.tar.gz\n",
"!tar -xvf diffusiondb_lance.tar.gz\n",
"!mv diffusiondb_test rawdata.lance\n"
]
},
{
"cell_type": "code",
"execution_count": 30,
@@ -247,7 +269,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3.11.4 64-bit",
"language": "python",
"name": "python3"
},
@@ -261,7 +283,12 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
"version": "3.11.4"
},
"vscode": {
"interpreter": {
"hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
}
}
},
"nbformat": 4,
@@ -8,7 +8,12 @@
"source": [
"# Youtube Transcript Search QA Bot\n",
"\n",
"This Q&A bot will allow you to search through youtube transcripts using natural language! By going through this notebook, we'll introduce how you can use LanceDB to store and manage your data easily."
"This Q&A bot will allow you to search through youtube transcripts using natural language! By going through this notebook, we'll introduce how you can use LanceDB to store and manage your data easily.\n",
"\n",
"\n",
"<a href=\"https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/youtube_bot/main.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
"\n",
"Scripts - [](./examples/youtube_bot/main.py) [](./examples/youtube_bot/index.js)\n"
]
},
{
@@ -7,7 +7,8 @@ excluded_files = [
    "../src/embedding.md",
    "../src/examples/serverless_lancedb_with_s3_and_lambda.md",
    "../src/examples/serverless_qa_bot_with_modal_and_langchain.md",
    "../src/examples/youtube_transcript_bot_with_nodejs.md"
    "../src/examples/youtube_transcript_bot_with_nodejs.md",
    "../src/integrations/voxel51.md",
]

python_prefix = "py"