mirror of https://github.com/lancedb/lancedb.git synced 2026-01-03 18:32:55 +00:00

Prashanth Rao 4d5d748acd docs: Updates and refactor (#683)

This PR makes incremental changes to the documentation.

* Closes #697
* Closes #698

- [x] Add dark mode
- [x] Fix headers in navbar
- [x] Add `extra.css` to customize navbar styles
- [x] Customize fonts for prose/code blocks, navbar and admonitions
- [x] Inspect all admonition boxes (remove redundant dropdowns) and
improve clarity and readability
- [x] Ensure that all images in the docs have white background (not
transparent) to be viewable in dark mode
- [x] Improve code formatting in code blocks to make them consistent
with autoformatters (eslint/ruff)
- [x] Add bolder weight to h1 headers
- [x] Add diagram showing the difference between embedded (OSS) and
serverless (Cloud)
- [x] Fix [Creating an empty
table](https://lancedb.github.io/lancedb/guides/tables/#creating-empty-table)
section: right now, the subheaders are not clickable.
- [x] In critical data ingestion methods like `table.add` (among
others), the type signature often does not match the actual code
- [x] Proof-read each documentation section and rewrite as necessary to
provide more context, use cases, and explanations so it reads less like
reference documentation. This is especially important for CRUD and
search sections since those are so central to the user experience.

- [x] The section for [Adding
data](https://lancedb.github.io/lancedb/guides/tables/#adding-to-a-table)
only shows examples for pandas and iterables. We should include pydantic
models, arrow tables, etc.
- [x] Add conceptual tutorial for IVF-PQ index
- [x] Clearly separate vector search, FTS and filtering sections so that
these are easier to find
- [x] Add docs on refine factor to explain its importance for recall.
Closes #716
- [x] Add an FAQ page showing answers to commonly asked questions about
LanceDB. Closes #746
- [x] Add simple polars example to the integrations section. Closes #756
and closes #153
- [ ] Add basic docs for the Rust API (more detailed API docs can come
later). Closes #781
- [x] Add a section on the various storage options on local vs. cloud
(S3, EBS, EFS, local disk, etc.) and the tradeoffs involved. Closes #782
- [x] Revamp filtering docs: add pre-filtering examples and redo headers
and update content for SQL filters. Closes #783 and closes #784.
- [x] Add docs for data management: compaction, cleaning up old versions
and incremental indexing. Closes #785
- [ ] Add a benchmark section that also discusses some best practices.
Closes #787

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>

2024-04-05 16:27:12 -07:00

LanceDB

LanceDB is an open-source vector database for AI that's designed to store, manage, query and retrieve embeddings on large-scale multi-modal data. The core of LanceDB is written in Rust 🦀 and is built on top of Lance, an open-source columnar data format designed for performant ML workloads and fast random access.

Both the database and the underlying data format are designed from the ground up to be easy-to-use, scalable and cost-effective.

Truly multi-modal

Most existing vector databases store and query just the embeddings and their metadata. The actual data is stored elsewhere, requiring you to manage its storage and versioning separately.

LanceDB supports storage of the actual data itself, alongside the embeddings and metadata. You can persist your images, videos, text documents, audio files and more in the Lance format, which provides automatic data versioning and blazing fast retrievals and filtering via LanceDB.

Open-source and cloud solutions

LanceDB is available in two flavors: OSS and Cloud.

LanceDB OSS is an open-source, batteries-included embedded vector database that you can run on your own infrastructure. "Embedded" means that it runs in-process, making it incredibly simple to self-host your own AI retrieval workflows for RAG and more. No servers, no hassle.

LanceDB Cloud is a SaaS (software-as-a-service) solution that runs serverless in the cloud, with storage clearly separated from compute. It's designed to be highly scalable and cost-effective. LanceDB Cloud is currently in private beta, with general availability coming soon; you can apply for early access by signing up below.

Try out LanceDB Cloud

Why use LanceDB?

Embedded (OSS) and serverless (Cloud) - no need to manage servers
Fast production-scale vector similarity, full-text & hybrid search and a SQL query interface (via DataFusion)
Native Python and JavaScript/TypeScript support
Store, query & manage multi-modal data (text, images, videos, point clouds, etc.), not just the embeddings and metadata
Tight integration with the Arrow ecosystem, allowing true zero-copy access in shared memory with SIMD and GPU acceleration
Automatic data versioning to manage versions of your data without needing extra infrastructure
Disk-based index & storage, allowing for massive scalability without breaking the bank
Ingest your favorite data formats directly, like pandas DataFrames, Pydantic objects, Polars (coming soon), and more
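
To make the "vector similarity search" item above concrete, here is a minimal, library-free sketch of what such a query computes: rank stored rows by the distance between their embedding and a query vector, optionally filtering on metadata first. This is an exhaustive scan written for illustration only; it is not LanceDB's API or implementation (LanceDB uses disk-based indexes, as noted above). The names `l2_distance` and `nearest`, the row layout, and the sample data are all hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(rows, query, k=2, where=None):
    """Return the k rows whose "vector" field is closest to `query`.

    `where` is an optional predicate on a row. It is applied before
    ranking, so rows that fail it never compete for the top-k slots.
    """
    candidates = [r for r in rows if where is None or where(r)]
    return sorted(candidates, key=lambda r: l2_distance(r["vector"], query))[:k]


# Hypothetical rows: metadata stored alongside each embedding.
rows = [
    {"id": 1, "kind": "image", "vector": [0.0, 1.0]},
    {"id": 2, "kind": "text", "vector": [1.0, 0.0]},
    {"id": 3, "kind": "text", "vector": [0.9, 0.2]},
    {"id": 4, "kind": "image", "vector": [0.5, 0.5]},
]

# Plain nearest-neighbor query.
top = nearest(rows, [1.0, 0.0], k=2)
print([r["id"] for r in top])

# Same kind of query, restricted to rows whose metadata says "text".
text_only = nearest(rows, [0.0, 1.0], k=1, where=lambda r: r["kind"] == "text")
print(text_only[0]["id"])
```

A scan like this costs time proportional to the number of rows for every query, which is why production systems rely on an index instead.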

Documentation guide

The following pages go deeper into the internals of LanceDB and how to use it.

Quick start: Get started with LanceDB and vector DB concepts
Vector search concepts: Understand the basics of vector search
Working with tables: Learn how to work with tables and their associated functions
Indexing: Understand how to create indexes
Vector search: Learn how to perform vector similarity search
Full-text search: Learn how to perform full-text search
Managing embeddings: Learn how to manage embeddings and use the embedding functions API in LanceDB
Ecosystem Integrations: Integrate LanceDB with other tools in the data ecosystem
Python API Reference: Python OSS and Cloud API references
JavaScript API Reference: JavaScript OSS and Cloud API references
