Files
lancedb/docs/src/basic.md
Prashanth Rao 119b928a52 docs: Updates and refactor (#683)
This PR makes incremental changes to the documentation.

* Closes #697 
* Closes #698

## Chores
- [x] Add dark mode
- [x] Fix headers in navbar
- [x] Add `extra.css` to customize navbar styles
- [x] Customize fonts for prose/code blocks, navbar and admonitions
- [x] Inspect all admonition boxes (remove redundant dropdowns) and
improve clarity and readability
- [x] Ensure that all images in the docs have white background (not
transparent) to be viewable in dark mode
- [x] Improve code formatting in code blocks to make them consistent
with autoformatters (eslint/ruff)
- [x] Add bolder weight to h1 headers
- [x] Add diagram showing the difference between embedded (OSS) and
serverless (Cloud)
- [x] Fix [Creating an empty
table](https://lancedb.github.io/lancedb/guides/tables/#creating-empty-table)
section: right now, the subheaders are not clickable.
- [x] In critical data ingestion methods like `table.add` (among
others), the type signature often does not match the actual code
- [x] Proof-read each documentation section and rewrite as necessary to
provide more context, use cases, and explanations so it reads less like
reference documentation. This is especially important for CRUD and
search sections since those are so central to the user experience.

## Restructure/new content 
- [x] The section for [Adding
data](https://lancedb.github.io/lancedb/guides/tables/#adding-to-a-table)
only shows examples for pandas and iterables. We should include pydantic
models, arrow tables, etc.
- [x] Add conceptual tutorial for IVF-PQ index
- [x] Clearly separate vector search, FTS and filtering sections so that
these are easier to find
- [x] Add docs on refine factor to explain its importance for recall.
Closes #716
- [x] Add an FAQ page showing answers to commonly asked questions about
LanceDB. Closes #746
- [x] Add simple polars example to the integrations section. Closes #756
and closes #153
- [ ] Add basic docs for the Rust API (more detailed API docs can come
later). Closes #781
- [x] Add a section on the various storage options on local vs. cloud
(S3, EBS, EFS, local disk, etc.) and the tradeoffs involved. Closes #782
- [x] Revamp filtering docs: add pre-filtering examples and redo headers
and update content for SQL filters. Closes #783 and closes #784.
- [x] Add docs for data management: compaction, cleaning up old versions
and incremental indexing. Closes #785
- [ ] Add a benchmark section that also discusses some best practices.
Closes #787

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
2024-01-19 00:18:37 +05:30

231 lines
6.7 KiB
Markdown

# Quick start
!!! info "LanceDB can be run in a number of ways:"
* Embedded within an existing backend (like your Django, Flask, Node.js or FastAPI application)
* Connected to directly from a client application like a Jupyter notebook for analytical workloads
* Deployed as a remote serverless database
![](assets/lancedb_embedded_explanation.png)
## Installation
=== "Python"
```shell
pip install lancedb
```
=== "Javascript"
```shell
npm install vectordb
```
## How to connect to a database
=== "Python"
```python
import lancedb
uri = "data/sample-lancedb"
db = lancedb.connect(uri)
```
LanceDB will create the directory if it doesn't exist (including parent directories).
If you need a reminder of the uri, use the `db.uri` property.
=== "Javascript"
```javascript
const lancedb = require("vectordb");
const uri = "data/sample-lancedb";
const db = await lancedb.connect(uri);
```
LanceDB will create the directory if it doesn't exist (including parent directories).
If you need a reminder of the uri, you can call `db.uri()`.
## How to create a table
=== "Python"
```python
tbl = db.create_table("my_table",
data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
```
If the table already exists, LanceDB will raise an error by default.
If you want to overwrite the table, you can pass in `mode="overwrite"`
to the `create_table` method.
You can also pass in a pandas DataFrame directly:
```python
import pandas as pd
df = pd.DataFrame([{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
tbl = db.create_table("table_from_df", data=df)
```
=== "Javascript"
```javascript
const tb = await db.createTable(
"myTable",
[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [5.9, 26.5], "item": "bar", "price": 20.0}]
)
```
If the table already exists, LanceDB will raise an error by default.
If you want to overwrite the table, you can pass in `mode="overwrite"`
to the `createTable` function.
!!! info "Under the hood, LanceDB is converting the input data into an Apache Arrow table and persisting it to disk in [Lance format](https://www.github.com/lancedb/lance)."
### Creating an empty table
Sometimes you may not have the data to insert into the table at creation time.
In this case, you can create an empty table and specify the schema.
=== "Python"
```python
import pyarrow as pa
schema = pa.schema([pa.field("vector", pa.list_(pa.float32(), list_size=2))])
tbl = db.create_table("empty_table", schema=schema)
```
## How to open an existing table
Once created, you can open a table using the following code:
=== "Python"
```python
tbl = db.open_table("my_table")
```
If you forget the name of your table, you can always get a listing of all table names:
```python
print(db.table_names())
```
=== "Javascript"
```javascript
const tbl = await db.openTable("myTable");
```
If you forget the name of your table, you can always get a listing of all table names:
```javascript
console.log(await db.tableNames());
```
## How to add data to a table
After a table has been created, you can always add more data to it using
=== "Python"
```python
# Option 1: Add a list of dicts to a table
data = [{"vector": [1.3, 1.4], "item": "fizz", "price": 100.0},
{"vector": [9.5, 56.2], "item": "buzz", "price": 200.0}]
tbl.add(data)
# Option 2: Add a pandas DataFrame to a table
df = pd.DataFrame(data)
tbl.add(data)
```
=== "Javascript"
```javascript
await tbl.add([{vector: [1.3, 1.4], item: "fizz", price: 100.0},
{vector: [9.5, 56.2], item: "buzz", price: 200.0}])
```
## How to search for (approximate) nearest neighbors
Once you've embedded the query, you can find its nearest neighbors using the following code:
=== "Python"
```python
tbl.search([100, 100]).limit(2).to_pandas()
```
This returns a pandas DataFrame with the results.
=== "Javascript"
```javascript
const query = await tbl.search([100, 100]).limit(2).execute();
```
## How to delete rows from a table
Use the `delete()` method on tables to delete rows from a table. To choose
which rows to delete, provide a filter that matches on the metadata columns.
This can delete any number of rows that match the filter.
=== "Python"
```python
tbl.delete('item = "fizz"')
```
=== "Javascript"
```javascript
await tbl.delete('item = "fizz"')
```
The deletion predicate is a SQL expression that supports the same expressions
as the `where()` clause on a search. They can be as simple or complex as needed.
To see what expressions are supported, see the [SQL filters](sql.md) section.
=== "Python"
Read more: [lancedb.table.Table.delete][]
=== "Javascript"
Read more: [vectordb.Table.delete](javascript/interfaces/Table.md#delete)
## How to remove a table
Use the `drop_table()` method on the database to remove a table.
=== "Python"
```python
db.drop_table("my_table")
```
This permanently removes the table and is not recoverable, unlike deleting rows.
By default, if the table does not exist an exception is raised. To suppress this,
you can pass in `ignore_missing=True`.
=== "JavaScript"
```javascript
await db.dropTable('myTable')
```
This permanently removes the table and is not recoverable, unlike deleting rows.
If the table does not exist an exception is raised.
!!! note "Bundling `vectordb` apps with Webpack"
If you're using the `vectordb` module in JavaScript, since LanceDB contains a prebuilt Node binary, you must configure `next.config.js` to exclude it from webpack. This is required for both using Next.js and deploying a LanceDB app on Vercel.
```javascript
/** @type {import('next').NextConfig} */
module.exports = ({
webpack(config) {
config.externals.push({ vectordb: 'vectordb' })
return config;
}
})
```
## What's next
This section covered the very basics of using LanceDB. If you're learning about vector databases for the first time, you may want to read the page on [indexing](concepts/index_ivfpq.md) to get familiar with the concepts.
If you've already worked with other vector databases, you may want to read the [guides](guides/tables.md) to learn how to work with LanceDB in more detail.