mirror of
https://github.com/lancedb/lancedb.git
synced 2026-01-07 04:12:59 +00:00
This PR makes incremental changes to the documentation.

* Closes #697
* Closes #698

## Chores

- [x] Add dark mode
- [x] Fix headers in navbar
- [x] Add `extra.css` to customize navbar styles
- [x] Customize fonts for prose/code blocks, navbar and admonitions
- [x] Inspect all admonition boxes (remove redundant dropdowns) and improve clarity and readability
- [x] Ensure that all images in the docs have white background (not transparent) to be viewable in dark mode
- [x] Improve code formatting in code blocks to make them consistent with autoformatters (eslint/ruff)
- [x] Add bolder weight to h1 headers
- [x] Add diagram showing the difference between embedded (OSS) and serverless (Cloud)
- [x] Fix [Creating an empty table](https://lancedb.github.io/lancedb/guides/tables/#creating-empty-table) section: right now, the subheaders are not clickable.
- [x] In critical data ingestion methods like `table.add` (among others), the type signature often does not match the actual code
- [x] Proof-read each documentation section and rewrite as necessary to provide more context, use cases, and explanations so it reads less like reference documentation. This is especially important for CRUD and search sections since those are so central to the user experience.

## Restructure/new content

- [x] The section for [Adding data](https://lancedb.github.io/lancedb/guides/tables/#adding-to-a-table) only shows examples for pandas and iterables. We should include pydantic models, arrow tables, etc.
- [x] Add conceptual tutorial for IVF-PQ index
- [x] Clearly separate vector search, FTS and filtering sections so that these are easier to find
- [x] Add docs on refine factor to explain its importance for recall. Closes #716
- [x] Add an FAQ page showing answers to commonly asked questions about LanceDB. Closes #746
- [x] Add simple polars example to the integrations section. Closes #756 and closes #153
- [ ] Add basic docs for the Rust API (more detailed API docs can come later). Closes #781
- [x] Add a section on the various storage options on local vs. cloud (S3, EBS, EFS, local disk, etc.) and the tradeoffs involved. Closes #782
- [x] Revamp filtering docs: add pre-filtering examples and redo headers and update content for SQL filters. Closes #783 and closes #784.
- [x] Add docs for data management: compaction, cleaning up old versions and incremental indexing. Closes #785
- [ ] Add a benchmark section that also discusses some best practices. Closes #787

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
231 lines
6.7 KiB
Markdown
# Quick start

!!! info "LanceDB can be run in a number of ways:"

    * Embedded within an existing backend (like your Django, Flask, Node.js or FastAPI application)
    * Connected to directly from a client application like a Jupyter notebook for analytical workloads
    * Deployed as a remote serverless database

## Installation

=== "Python"

    ```shell
    pip install lancedb
    ```

=== "Javascript"

    ```shell
    npm install vectordb
    ```

## How to connect to a database

=== "Python"

    ```python
    import lancedb

    uri = "data/sample-lancedb"
    db = lancedb.connect(uri)
    ```

    LanceDB will create the directory if it doesn't exist (including parent directories).

    If you need a reminder of the URI, use the `db.uri` property.

=== "Javascript"

    ```javascript
    const lancedb = require("vectordb");

    const uri = "data/sample-lancedb";
    const db = await lancedb.connect(uri);
    ```

    LanceDB will create the directory if it doesn't exist (including parent directories).

    If you need a reminder of the URI, you can call `db.uri()`.

## How to create a table

=== "Python"

    ```python
    tbl = db.create_table("my_table",
                          data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
                                {"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
    ```

    If the table already exists, LanceDB raises an error by default.
    To overwrite the existing table instead, pass `mode="overwrite"`
    to the `create_table` method.

    You can also pass in a pandas DataFrame directly:

    ```python
    import pandas as pd

    df = pd.DataFrame([{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
                       {"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
    tbl = db.create_table("table_from_df", data=df)
    ```

=== "Javascript"

    ```javascript
    const tb = await db.createTable(
      "myTable",
      [{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
       {"vector": [5.9, 26.5], "item": "bar", "price": 20.0}]
    )
    ```

    If the table already exists, LanceDB raises an error by default.
    To overwrite the existing table instead, pass `mode="overwrite"`
    to the `createTable` function.

!!! info "Under the hood, LanceDB converts the input data into an Apache Arrow table and persists it to disk in [Lance format](https://www.github.com/lancedb/lance)."

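
To picture what that conversion involves: Arrow tables are columnar, so row-shaped input such as a list of dicts ends up as one array per column. The following is a minimal pure-Python sketch of that pivot, assuming every row has the same keys. The helper name `rows_to_columns` is hypothetical, and this is not the code LanceDB or Arrow actually runs.

```python
def rows_to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists (hypothetical helper)."""
    if not rows:
        return {}
    columns = {name: [] for name in rows[0]}
    for row in rows:
        # Every row must carry exactly the same column names as the first one.
        if row.keys() != columns.keys():
            raise ValueError(f"row has mismatched columns: {sorted(row)}")
        for name, value in row.items():
            columns[name].append(value)
    return columns


rows = [
    {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
    {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
]
columns = rows_to_columns(rows)
print(columns["item"])  # ['foo', 'bar']
```

Each key of `columns` corresponds to one column of the table, which is the shape a columnar format stores on disk.
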
### Creating an empty table

Sometimes you may not have the data to insert into the table at creation time.
In this case, you can create an empty table and specify the schema.

=== "Python"

    ```python
    import pyarrow as pa

    # A single fixed-size vector column: each row holds two float32 values
    schema = pa.schema([pa.field("vector", pa.list_(pa.float32(), list_size=2))])
    tbl = db.create_table("empty_table", schema=schema)
    ```

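
The schema above declares `vector` as a fixed-size list of two `float32` values. As a rough illustration of what that implies for incoming rows, the sketch below uses Python's standard `array` module to store a vector as 32-bit floats and to reject vectors of the wrong length. The helper `to_fixed_float32` is hypothetical, and this is not LanceDB's own validation logic.

```python
from array import array

VECTOR_SIZE = 2  # mirrors list_size=2 in the schema


def to_fixed_float32(vector, size=VECTOR_SIZE):
    """Store `vector` as 32-bit floats, enforcing a fixed length (hypothetical helper)."""
    if len(vector) != size:
        raise ValueError(f"expected {size} values, got {len(vector)}")
    # Typecode "f" is a C float, which is 4 bytes on common platforms.
    return array("f", vector)


stored = to_fixed_float32([3.1, 4.1])
print(stored.itemsize)          # bytes used per element
print(abs(stored[0] - 3.1))     # small rounding error introduced by float32
```

The rounding error is worth keeping in mind: values read back from a `float32` column will generally not compare equal to the Python floats that were written.
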
## How to open an existing table

Once created, you can open a table using the following code:

=== "Python"

    ```python
    tbl = db.open_table("my_table")
    ```

    If you forget the name of your table, you can always get a listing of all table names:

    ```python
    print(db.table_names())
    ```

=== "Javascript"

    ```javascript
    const tbl = await db.openTable("myTable");
    ```

    If you forget the name of your table, you can always get a listing of all table names:

    ```javascript
    console.log(await db.tableNames());
    ```

## How to add data to a table

After a table has been created, you can always add more data to it using the `add` method:

=== "Python"

    ```python
    import pandas as pd

    # Option 1: Add a list of dicts to a table
    data = [{"vector": [1.3, 1.4], "item": "fizz", "price": 100.0},
            {"vector": [9.5, 56.2], "item": "buzz", "price": 200.0}]
    tbl.add(data)

    # Option 2: Add a pandas DataFrame to a table
    # (use one option or the other; running both adds the rows twice)
    df = pd.DataFrame(data)
    tbl.add(df)
    ```

=== "Javascript"

    ```javascript
    await tbl.add([{vector: [1.3, 1.4], item: "fizz", price: 100.0},
                   {vector: [9.5, 56.2], item: "buzz", price: 200.0}])
    ```

## How to search for (approximate) nearest neighbors

Once you've embedded the query, you can find its nearest neighbors using the following code:

=== "Python"

    ```python
    tbl.search([100, 100]).limit(2).to_pandas()
    ```

    This returns a pandas DataFrame with the results.

=== "Javascript"

    ```javascript
    const results = await tbl.search([100, 100]).limit(2).execute();
    ```

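
Conceptually, a nearest-neighbor search ranks rows by their distance to the query vector and keeps the closest ones. The sketch below does this exhaustively in plain Python over the four example rows used on this page, assuming Euclidean (L2) distance. It only illustrates the idea and is not how LanceDB executes a search; the helper name `nearest` is hypothetical.

```python
import math

rows = [
    {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
    {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
    {"vector": [1.3, 1.4], "item": "fizz", "price": 100.0},
    {"vector": [9.5, 56.2], "item": "buzz", "price": 200.0},
]


def nearest(rows, query, limit):
    """Return the `limit` rows closest to `query` by Euclidean distance."""
    return sorted(rows, key=lambda row: math.dist(row["vector"], query))[:limit]


top = nearest(rows, [100, 100], limit=2)
print([row["item"] for row in top])  # ['buzz', 'bar']
```

An exhaustive scan like this compares the query against every row, so its cost grows with table size. Approximate search typically relies on an index to avoid that full scan, at the price of occasionally missing a true neighbor.
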
## How to delete rows from a table

Use the `delete()` method on a table to delete rows. To choose
which rows to delete, provide a filter that matches on the metadata columns.
Every row that matches the filter is deleted, however many there are.

=== "Python"

    ```python
    tbl.delete('item = "fizz"')
    ```

=== "Javascript"

    ```javascript
    await tbl.delete('item = "fizz"')
    ```

The deletion predicate is a SQL expression that supports the same expressions
as the `where()` clause on a search. Predicates can be as simple or as complex as needed.
To see what expressions are supported, see the [SQL filters](sql.md) section.

=== "Python"

    Read more: [lancedb.table.Table.delete][]

=== "Javascript"

    Read more: [vectordb.Table.delete](javascript/interfaces/Table.md#delete)

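
To illustrate how a single SQL predicate can remove several rows at once, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in. It is not LanceDB, and SQLite's SQL dialect is an assumption that may differ from the expressions LanceDB supports. The table simply mirrors the `item` and `price` columns used in the examples above.

```python
import sqlite3

# In-memory SQLite table standing in for a table with metadata columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (item TEXT, price REAL)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("foo", 10.0), ("bar", 20.0), ("fizz", 100.0), ("buzz", 200.0)],
)

# One predicate can match several rows; all of them are deleted.
deleted = conn.execute("DELETE FROM my_table WHERE price > 50").rowcount
remaining = [item for (item,) in conn.execute("SELECT item FROM my_table ORDER BY price")]
print(deleted, remaining)  # 2 ['foo', 'bar']
```

Because a broad predicate removes everything it matches, it can help to run the same filter in a search or query first and check which rows come back before deleting.
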
## How to remove a table

Use the `drop_table()` method (`dropTable()` in JavaScript) on the database to remove a table.

=== "Python"

    ```python
    db.drop_table("my_table")
    ```

    This permanently removes the table and is not recoverable, unlike deleting rows.
    By default, an exception is raised if the table does not exist. To suppress this,
    pass in `ignore_missing=True`.

=== "JavaScript"

    ```javascript
    await db.dropTable('myTable')
    ```

    This permanently removes the table and is not recoverable, unlike deleting rows.
    If the table does not exist, an exception is raised.

!!! note "Bundling `vectordb` apps with Webpack"

    The `vectordb` module for JavaScript contains a prebuilt Node binary, so you must configure `next.config.js` to exclude it from webpack. This is required both when using Next.js and when deploying a LanceDB app on Vercel.

    ```javascript
    /** @type {import('next').NextConfig} */
    module.exports = ({
      webpack(config) {
        config.externals.push({ vectordb: 'vectordb' })
        return config;
      }
    })
    ```

## What's next

This section covered the very basics of using LanceDB. If you're learning about vector databases for the first time, you may want to read the page on [indexing](concepts/index_ivfpq.md) to get familiar with the concepts.

If you've already worked with other vector databases, you may want to read the [guides](guides/tables.md) to learn how to work with LanceDB in more detail.