<img width="1054" alt="Screenshot 2023-07-24 at 6 13 00 PM" src="https://github.com/lancedb/lancedb/assets/15766192/a263a17e-66d0-4591-adc7-b520aa5b23f6"> Is this a problem? Are we using metadata to track usage or something?
1.8 KiB
Basic recipe
The basic workflow to use LanceDB to create a similarity index on your FiftyOne datasets and use this to query your data is as follows:
-
Load a dataset into FiftyOne
-
Compute embedding vectors for samples or patches in your dataset, or select a model to use to generate embeddings
-
Use the
compute_similarity()method to generate a LanceDB table for the samples or object patches embeddings in a dataset by setting the parameterbackend="lancedb"and specifying abrain_keyof your choice -
Use this LanceDB table to query your data with
sort_by_similarity() -
If desired, delete the table
The example below demonstrates this workflow.
!!! Note
You must install the LanceDB Python client to run this
```
pip install lancedb
```
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz
# Step 1: Load your data into FiftyOne
dataset = foz.load_zoo_dataset("quickstart")
# Steps 2 and 3: Compute embeddings and create a similarity index
lancedb_index = fob.compute_similarity(
dataset,
model="clip-vit-base32-torch",
brain_key="lancedb_index",
backend="lancedb",
)
Once the similarity index has been generated, we can query our data in FiftyOne
by specifying the brain_key:
# Step 4: Query your data
query = dataset.first().id # query by sample ID
view = dataset.sort_by_similarity(
query,
brain_key="lancedb_index",
k=10, # limit to 10 most similar samples
)
# Step 5 (optional): Cleanup
# Delete the LanceDB table
lancedb_index.cleanup()
# Delete run record from FiftyOne
dataset.delete_brain_run("lancedb_index")
More in depth walkthrough of the integration, visit the LanceDB guide on Voxel51 - LaceDB x Voxel51
