The LanceDB embeddings registry lets users annotate the pydantic
model used as the table schema with the desired embedding function, e.g.:
```python
class Schema(LanceModel):
    id: str
    vector: Vector(openai.ndims()) = openai.VectorField()
    text: str = openai.SourceField()
```
Tables created like this do not require the user to calculate embeddings
explicitly, e.g. this works:
```python
table.add([{"id": "foo", "text": "rust all the things"}])
```
However, constructing pydantic model instances without `vector` fails
because it is a required field.
Instead, you need to add a default value:
```python
class Schema(LanceModel):
    id: str
    vector: Vector(openai.ndims()) = openai.VectorField(default=None)
    text: str = openai.SourceField()
```
Then this completes without errors:
```python
table.add([Schema(id="foo", text="rust all the things")])
```
However, all of the stored vectors are then filled with zeros. To fix this,
`add_vector_col` needs an additional check so that embedding generation is
actually called.
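The kind of check described above could look roughly like the following sketch. It is a plain-Python illustration, not LanceDB's actual `add_vector_col` code: `fill_missing_vectors`, `toy_embed`, and the row-dict layout are hypothetical names assumed here. The idea is to treat a `None` vector (the pydantic default) as "not yet computed" and run the embedding function on the source text instead of storing a placeholder.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def fill_missing_vectors(
    rows: List[Row],
    embed: Callable[[Sequence[str]], List[List[float]]],
    source_col: str = "text",
    vector_col: str = "vector",
) -> List[Row]:
    """Return copies of rows, embedding the source text where the vector is absent or None."""
    # Indices of rows whose vector is missing or explicitly None.
    missing = [i for i, row in enumerate(rows) if row.get(vector_col) is None]
    out = [dict(row) for row in rows]
    if missing:
        # Embed all rows that need it in a single batch call.
        vectors = embed([str(rows[i][source_col]) for i in missing])
        for i, vec in zip(missing, vectors):
            out[i][vector_col] = vec
    return out


def toy_embed(texts: Sequence[str]) -> List[List[float]]:
    # Hypothetical stand-in for a real embedding model: [character count, word count].
    return [[float(len(t)), float(len(t.split()))] for t in texts]


rows = [
    {"id": "foo", "text": "rust all the things", "vector": None},
    {"id": "bar", "text": "hello", "vector": [1.0, 2.0]},
]
filled = fill_missing_vectors(rows, toy_embed)
```

Rows that already carry a vector are left untouched, so explicitly supplied embeddings are never overwritten.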
LanceDB is an open-source database for vector search built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.
The key features of LanceDB include:
- Production-scale vector search with no servers to manage.
- Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
- Support for vector similarity search, full-text search and SQL.
- Native Python and JavaScript/TypeScript support.
- Zero-copy, automatic versioning, manage versions of your data without needing extra infrastructure.
- GPU support in building vector index(*).
- Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache-Arrow, Pandas, Polars, DuckDB and more on the way.
LanceDB's core is written in Rust 🦀 and is built using Lance, an open-source columnar format designed for performant ML workloads.
Quick Start
Javascript
```shell
npm install vectordb
```
```javascript
const lancedb = require('vectordb');
const db = await lancedb.connect('data/sample-lancedb');
const table = await db.createTable({
  name: 'vectors',
  data: [
    { id: 1, vector: [0.1, 0.2], item: "foo", price: 10 },
    { id: 2, vector: [1.1, 1.2], item: "bar", price: 50 }
  ]
})
const query = table.search([0.1, 0.3]).limit(2);
const results = await query.execute();
// You can also search for rows by specific criteria without involving a vector search.
const rowsByCriteria = await table.search(undefined).where("price >= 10").execute();
```
Python
```shell
pip install lancedb
```
```python
import lancedb

uri = "data/sample-lancedb"
db = lancedb.connect(uri)
table = db.create_table(
    "my_table",
    data=[
        {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
        {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
    ],
)
result = table.search([100, 100]).limit(2).to_pandas()
```
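To make the search call above more concrete, here is a minimal brute-force sketch of what a nearest-neighbour query computes conceptually. It assumes Euclidean (L2) distance as the metric and uses a hypothetical helper, `l2_search`, over plain dicts. It is not how LanceDB executes queries internally, only an illustration of the ranking on the same sample rows.

```python
import math
from typing import Dict, List, Sequence


def l2_search(rows: List[Dict], query: Sequence[float], limit: int) -> List[Dict]:
    """Return the `limit` rows whose "vector" is closest to `query` by L2 distance."""
    # Hypothetical helper: scans every row, so it only suits tiny datasets.
    return sorted(rows, key=lambda row: math.dist(row["vector"], query))[:limit]


data = [
    {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
    {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
]
nearest = l2_search(data, [100, 100], limit=2)
items = [row["item"] for row in nearest]
```

Under this assumed metric, "bar" ranks ahead of "foo" because `[5.9, 26.5]` lies closer to `[100, 100]` than `[3.1, 4.1]` does.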
