mirror of https://github.com/lancedb/lancedb.git synced 2026-01-04 10:52:56 +00:00

Prashanth Rao 119b928a52 docs: Updates and refactor (#683)

This PR makes incremental changes to the documentation.

* Closes #697 
* Closes #698

## Chores
- [x] Add dark mode
- [x] Fix headers in navbar
- [x] Add `extra.css` to customize navbar styles
- [x] Customize fonts for prose/code blocks, navbar and admonitions
- [x] Inspect all admonition boxes (remove redundant dropdowns) and
improve clarity and readability
- [x] Ensure that all images in the docs have white background (not
transparent) to be viewable in dark mode
- [x] Improve code formatting in code blocks to make them consistent
with autoformatters (eslint/ruff)
- [x] Add bolder weight to h1 headers
- [x] Add diagram showing the difference between embedded (OSS) and
serverless (Cloud)
- [x] Fix [Creating an empty
table](https://lancedb.github.io/lancedb/guides/tables/#creating-empty-table)
section: right now, the subheaders are not clickable.
- [x] In critical data ingestion methods like `table.add` (among
others), the type signature often does not match the actual code
- [x] Proof-read each documentation section and rewrite as necessary to
provide more context, use cases, and explanations so it reads less like
reference documentation. This is especially important for CRUD and
search sections since those are so central to the user experience.

## Restructure/new content 
- [x] The section for [Adding
data](https://lancedb.github.io/lancedb/guides/tables/#adding-to-a-table)
only shows examples for pandas and iterables. We should include pydantic
models, arrow tables, etc.
- [x] Add conceptual tutorial for IVF-PQ index
- [x] Clearly separate vector search, FTS and filtering sections so that
these are easier to find
- [x] Add docs on refine factor to explain its importance for recall.
Closes #716
- [x] Add an FAQ page showing answers to commonly asked questions about
LanceDB. Closes #746
- [x] Add simple polars example to the integrations section. Closes #756
and closes #153
- [ ] Add basic docs for the Rust API (more detailed API docs can come
later). Closes #781
- [x] Add a section on the various storage options on local vs. cloud
(S3, EBS, EFS, local disk, etc.) and the tradeoffs involved. Closes #782
- [x] Revamp filtering docs: add pre-filtering examples and redo headers
and update content for SQL filters. Closes #783 and closes #784.
- [x] Add docs for data management: compaction, cleaning up old versions
and incremental indexing. Closes #785
- [ ] Add a benchmark section that also discusses some best practices.
Closes #787

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>

2024-01-19 00:18:37 +05:30


# FiftyOne

FiftyOne is an open-source toolkit for building high-quality datasets and computer vision models. It provides an API to create LanceDB tables and run similarity queries, both programmatically in Python and via point-and-click in the FiftyOne App.

## Basic recipe

The basic workflow shown below uses LanceDB to create a similarity index on your FiftyOne datasets:

1. Load a dataset into FiftyOne.
2. Compute embedding vectors for samples or patches in your dataset, or select a model to use to generate embeddings.
3. Use the `compute_similarity()` method to generate a LanceDB table for the sample or object-patch embeddings in a dataset by setting the parameter `backend="lancedb"` and specifying a `brain_key` of your choice.
4. Use this LanceDB table to query your data with `sort_by_similarity()`.
5. If desired, delete the table.

The example below demonstrates this workflow.

!!! Note

    Install the LanceDB Python client to run the code shown below.
    ```
    pip install lancedb
    ```


```python
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

# Step 1: Load your data into FiftyOne
dataset = foz.load_zoo_dataset("quickstart")

# Steps 2 and 3: Compute embeddings and create a similarity index
lancedb_index = fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="lancedb_index",
    backend="lancedb",
)
```

Once the similarity index has been generated, we can query our data in FiftyOne by specifying the `brain_key`:

```python
# Step 4: Query your data
query = dataset.first().id  # query by sample ID
view = dataset.sort_by_similarity(
    query,
    brain_key="lancedb_index",
    k=10,  # limit to 10 most similar samples
)

# Step 5 (optional): Cleanup

# Delete the LanceDB table
lancedb_index.cleanup()

# Delete run record from FiftyOne
dataset.delete_brain_run("lancedb_index")
```
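
Conceptually, a similarity query ranks samples by how close their embedding vectors are to the embedding of the query sample. The pure-Python sketch below illustrates that idea with brute-force cosine similarity over toy vectors. It is only an illustration and not how FiftyOne or LanceDB implement the search; the function names, sample IDs, and vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sort_ids_by_similarity(embeddings, query_id, k=None):
    """Return sample IDs ordered from most to least similar to the query.

    embeddings: dict mapping a (hypothetical) sample ID to its embedding.
    k: optional cap on the number of IDs returned (None returns all).
    """
    query_vec = embeddings[query_id]
    ranked = sorted(
        embeddings,
        key=lambda sample_id: cosine_similarity(embeddings[sample_id], query_vec),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-D embeddings; the IDs and values are made up for illustration
toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}

top = sort_ids_by_similarity(toy_embeddings, "a", k=3)
print(top)  # ['a', 'b', 'c']
```

In this sketch the query sample ranks first because it is identical to itself. A brute-force scan like this compares the query against every vector, which is the kind of work a dedicated vector store is meant to take over on larger datasets.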

For a more in-depth walkthrough of the integration, visit the LanceDB x Voxel51 docs page.
