lancedb/docs/src/basic.md
Prashanth Rao 119b928a52 docs: Updates and refactor (#683)
This PR makes incremental changes to the documentation.

* Closes #697 
* Closes #698

## Chores
- [x] Add dark mode
- [x] Fix headers in navbar
- [x] Add `extra.css` to customize navbar styles
- [x] Customize fonts for prose/code blocks, navbar and admonitions
- [x] Inspect all admonition boxes (remove redundant dropdowns) and
improve clarity and readability
- [x] Ensure that all images in the docs have white background (not
transparent) to be viewable in dark mode
- [x] Improve code formatting in code blocks to make them consistent
with autoformatters (eslint/ruff)
- [x] Add bolder weight to h1 headers
- [x] Add diagram showing the difference between embedded (OSS) and
serverless (Cloud)
- [x] Fix [Creating an empty
table](https://lancedb.github.io/lancedb/guides/tables/#creating-empty-table)
section: right now, the subheaders are not clickable.
- [x] In critical data ingestion methods like `table.add` (among
others), the type signature often does not match the actual code
- [x] Proof-read each documentation section and rewrite as necessary to
provide more context, use cases, and explanations so it reads less like
reference documentation. This is especially important for CRUD and
search sections since those are so central to the user experience.

## Restructure/new content 
- [x] The section for [Adding
data](https://lancedb.github.io/lancedb/guides/tables/#adding-to-a-table)
only shows examples for pandas and iterables. We should include pydantic
models, arrow tables, etc.
- [x] Add conceptual tutorial for IVF-PQ index
- [x] Clearly separate vector search, FTS and filtering sections so that
these are easier to find
- [x] Add docs on refine factor to explain its importance for recall.
Closes #716
- [x] Add an FAQ page showing answers to commonly asked questions about
LanceDB. Closes #746
- [x] Add simple polars example to the integrations section. Closes #756
and closes #153
- [ ] Add basic docs for the Rust API (more detailed API docs can come
later). Closes #781
- [x] Add a section on the various storage options on local vs. cloud
(S3, EBS, EFS, local disk, etc.) and the tradeoffs involved. Closes #782
- [x] Revamp filtering docs: add pre-filtering examples and redo headers
and update content for SQL filters. Closes #783 and closes #784.
- [x] Add docs for data management: compaction, cleaning up old versions
and incremental indexing. Closes #785
- [ ] Add a benchmark section that also discusses some best practices.
Closes #787

---------

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
2024-01-19 00:18:37 +05:30


# Quick start

!!! info "LanceDB can be run in a number of ways:"

    * Embedded within an existing backend (like your Django, Flask, Node.js or FastAPI application)
    * Connected to directly from a client application like a Jupyter notebook for analytical workloads
    * Deployed as a remote serverless database

## Installation

=== "Python"

    ```shell
    pip install lancedb
    ```

=== "Javascript"

    ```shell
    npm install vectordb
    ```

## How to connect to a database

=== "Python"

    ```python
    import lancedb

    uri = "data/sample-lancedb"
    db = lancedb.connect(uri)
    ```

    LanceDB will create the directory if it doesn't exist (including parent directories).

    If you need a reminder of the URI, use the `db.uri` property.

=== "Javascript"

    ```javascript
    const lancedb = require("vectordb");

    const uri = "data/sample-lancedb";
    const db = await lancedb.connect(uri);
    ```

    LanceDB will create the directory if it doesn't exist (including parent directories).

    If you need a reminder of the URI, you can call `db.uri()`.

## How to create a table

=== "Python"

    ```python
    tbl = db.create_table("my_table",
                          data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
                                {"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
    ```

    If the table already exists, LanceDB will raise an error by default.
    If you want to overwrite the table, you can pass in `mode="overwrite"`
    to the `create_table` method.

    You can also pass in a pandas DataFrame directly:

    ```python
    import pandas as pd

    df = pd.DataFrame([{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
                       {"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
    tbl = db.create_table("table_from_df", data=df)
    ```

=== "Javascript"

    ```javascript
    const tbl = await db.createTable("myTable", [
      { vector: [3.1, 4.1], item: "foo", price: 10.0 },
      { vector: [5.9, 26.5], item: "bar", price: 20.0 },
    ]);
    ```

    If the table already exists, LanceDB will raise an error by default.
    If you want to overwrite the table, you can pass in `mode="overwrite"`
    to the `createTable` function.

!!! info "Under the hood, LanceDB is converting the input data into an Apache Arrow table and persisting it to disk in Lance format."
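
To make the row-to-column idea concrete, here is a minimal pure-Python sketch of what converting a list of dicts into a table means: the rows are pivoted into one list per column, which is the columnar layout that both Arrow and Lance use. The helper `rows_to_columns` is hypothetical and is not part of LanceDB; the real conversion (type inference, fixed-size vector columns, writing to disk) is handled by LanceDB and Arrow, not by code like this.

```python
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot a list of row dicts into a dict of column lists (hypothetical helper)."""
    if not rows:
        return {}
    names = list(rows[0].keys())
    columns: dict[str, list[Any]] = {name: [] for name in names}
    for i, row in enumerate(rows):
        # Simplification for this sketch: every row must have exactly the same fields.
        if set(row) != set(names):
            raise ValueError(f"row {i} has fields {sorted(row)}, expected {sorted(names)}")
        for name in names:
            columns[name].append(row[name])
    return columns


data = [
    {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
    {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
]
columns = rows_to_columns(data)
print(columns["item"])   # ['foo', 'bar']
print(columns["price"])  # [10.0, 20.0]
```

Storing each column contiguously is what lets a columnar format read only the columns a query needs, for example just `vector` during a search.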

### Creating an empty table

Sometimes you may not have the data to insert into the table at creation time. In this case, you can create an empty table and specify the schema.

=== "Python"

    ```python
    import pyarrow as pa

    schema = pa.schema([pa.field("vector", pa.list_(pa.float32(), list_size=2))])
    tbl = db.create_table("empty_table", schema=schema)
    ```
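
The `list_size=2` argument makes `vector` a fixed-size list column, so every vector stored in this table has exactly two elements. The pure-Python check below is a hypothetical illustration of that constraint (the helper name `check_vector_dims` is made up for this example); it does not reproduce the error that LanceDB or Arrow would actually raise for a mismatched vector.

```python
def check_vector_dims(rows, column="vector", dim=2):
    """Return the indexes of rows whose vector length differs from `dim` (hypothetical helper)."""
    bad = []
    for i, row in enumerate(rows):
        if len(row[column]) != dim:
            bad.append(i)
    return bad


rows = [
    {"vector": [1.0, 2.0]},       # two elements: fits list_size=2
    {"vector": [1.0, 2.0, 3.0]},  # three elements: does not fit list_size=2
    {"vector": [0.5, 0.5]},       # two elements: fits
]
print(check_vector_dims(rows))  # [1]
```

In practice, the dimension in the schema should match the output size of whatever embedding model produces your vectors.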

## How to open an existing table

Once created, you can open a table using the following code:

=== "Python"

    ```python
    tbl = db.open_table("my_table")
    ```

    If you forget the name of your table, you can always get a listing of all table names:

    ```python
    print(db.table_names())
    ```

=== "Javascript"

    ```javascript
    const tbl = await db.openTable("myTable");
    ```

    If you forget the name of your table, you can always get a listing of all table names:

    ```javascript
    console.log(await db.tableNames());
    ```

## How to add data to a table

After a table has been created, you can always add more data to it:

=== "Python"

    ```python
    import pandas as pd

    # Option 1: Add a list of dicts to a table
    data = [{"vector": [1.3, 1.4], "item": "fizz", "price": 100.0},
            {"vector": [9.5, 56.2], "item": "buzz", "price": 200.0}]
    tbl.add(data)

    # Option 2: Add a pandas DataFrame to a table
    df = pd.DataFrame(data)
    tbl.add(df)
    ```

=== "Javascript"

    ```javascript
    await tbl.add([
      { vector: [1.3, 1.4], item: "fizz", price: 100.0 },
      { vector: [9.5, 56.2], item: "buzz", price: 200.0 },
    ]);
    ```

## How to search for (approximate) nearest neighbors

Once you've embedded the query, you can find its nearest neighbors using the following code:

=== "Python"

    ```python
    tbl.search([100, 100]).limit(2).to_pandas()
    ```

    This returns a pandas DataFrame with the results.

=== "Javascript"

    ```javascript
    const results = await tbl.search([100, 100]).limit(2).execute();
    ```
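
Conceptually, a nearest-neighbor query ranks rows by their distance to the query vector and keeps the closest `limit` rows. The sketch below does this exhaustively in pure Python over the four example rows from this page, assuming Euclidean (L2) distance, which is LanceDB's default metric. It is an illustration only, not LanceDB's implementation, which can use an index to approximate this result much faster on large tables; the function name `brute_force_knn` is hypothetical.

```python
import math


def brute_force_knn(rows, query, k):
    """Return (distance, item) pairs for the k rows closest to `query` by L2 distance."""
    scored = []
    for row in rows:
        dist = math.dist(row["vector"], query)  # Euclidean distance (Python 3.8+)
        scored.append((dist, row["item"]))
    scored.sort(key=lambda pair: pair[0])  # closest rows first
    return scored[:k]


rows = [
    {"vector": [3.1, 4.1], "item": "foo"},
    {"vector": [5.9, 26.5], "item": "bar"},
    {"vector": [1.3, 1.4], "item": "fizz"},
    {"vector": [9.5, 56.2], "item": "buzz"},
]
for dist, item in brute_force_knn(rows, [100, 100], k=2):
    print(item, round(dist, 2))  # "buzz" is printed first, then "bar"
```

Exhaustive search like this is exact but its cost grows linearly with the number of rows, which is why approximate indexes matter for large tables.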

## How to delete rows from a table

Use the `delete()` method on a table to delete rows from it. To choose which rows to delete, provide a filter that matches on the metadata columns. A single call deletes every row that matches the filter, however many there are.

=== "Python"

    ```python
    tbl.delete('item = "fizz"')
    ```

=== "Javascript"

    ```javascript
    await tbl.delete('item = "fizz"');
    ```

The deletion predicate is a SQL expression that supports the same expressions as the `where()` clause on a search. Predicates can be as simple or as complex as needed. To see which expressions are supported, see the SQL filters section.

=== "Python"

    Read more: [lancedb.table.Table.delete][]

=== "Javascript"

    Read more: [vectordb.Table.delete](javascript/interfaces/Table.md#delete)
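
To illustrate how a predicate like `item = "fizz"` can remove several rows at once, here is a pure-Python sketch of the semantics: every matching row is removed and every non-matching row is kept. The helper `delete_where` is hypothetical and only handles a single equality comparison; LanceDB's real predicates are full SQL expressions evaluated by the database.

```python
def delete_where(rows, column, value):
    """Remove every row where row[column] == value; return (kept_rows, n_deleted)."""
    kept = [row for row in rows if row[column] != value]
    return kept, len(rows) - len(kept)


rows = [
    {"item": "foo", "price": 10.0},
    {"item": "fizz", "price": 100.0},
    {"item": "bar", "price": 20.0},
    {"item": "fizz", "price": 100.0},  # a second matching row is deleted too
]
kept, n_deleted = delete_where(rows, "item", "fizz")
print(n_deleted)                      # 2
print([row["item"] for row in kept])  # ['foo', 'bar']
```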

## How to remove a table

Use the `drop_table()` method on the database to remove a table.

=== "Python"

    ```python
    db.drop_table("my_table")
    ```

    This permanently removes the table and is not recoverable, unlike deleting rows.
    By default, if the table does not exist, an exception is raised. To suppress this,
    you can pass in `ignore_missing=True`.

=== "Javascript"

    ```javascript
    await db.dropTable("myTable");
    ```

    This permanently removes the table and is not recoverable, unlike deleting rows.
    If the table does not exist, an exception is raised.

!!! note "Bundling vectordb apps with Webpack"

    Because the `vectordb` JavaScript module contains a prebuilt Node binary, you must configure `next.config.js` to exclude it from webpack. This is required both when using Next.js and when deploying a LanceDB app on Vercel.

    ```javascript
    /** @type {import('next').NextConfig} */
    module.exports = {
      webpack(config) {
        // Mark vectordb as an external so webpack does not bundle its native binary
        config.externals.push({ vectordb: "vectordb" });
        return config;
      },
    };
    ```

## What's next

This section covered the very basics of using LanceDB. If you're learning about vector databases for the first time, you may want to read the page on indexing to get familiar with the concepts.

If you've already worked with other vector databases, you may want to read the guides to learn how to work with LanceDB in more detail.