lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-01-08 04:42:57 +00:00

Cory Grinstead 9d2fb7d602 feat: rust embedding registry (#1259)

Todo:

- [x] add proper documentation
- [x] add unit tests
- [x] better handling of the registry**1
- [x] allow user defined registry**2

**1 The Python implementation just uses a global registry, which makes
things a bit easier. I attached it to the db/connection to prevent
future conflicts when running multiple connections/databases. I mostly
modeled the registry & pattern on DataFusion's
[FunctionRegistry](https://docs.rs/datafusion/latest/datafusion/execution/trait.FunctionRegistry.html).

**2 Ideally, the user should be able to provide their own registry
entirely, but currently it just uses an in-memory registry by default
(_which isn't configurable_).

`rust/lancedb/examples/embedding_registry.rs` provides a thorough
example of expected usage.
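
A minimal sketch of the connection-scoped registry idea from note **1, written in Python to match the README quick start rather than the Rust crate. `MemoryRegistry`, `Connection`, and `char_count_embedding` are hypothetical names for illustration only and are not LanceDB's API; the example file mentioned above remains the reference for actual usage.

```python
from typing import Callable, Dict, List, Optional

# An embedding function maps a batch of texts to one vector per text.
EmbeddingFn = Callable[[List[str]], List[List[float]]]


class MemoryRegistry:
    """Hypothetical in-memory registry mapping names to embedding functions."""

    def __init__(self) -> None:
        self._functions: Dict[str, EmbeddingFn] = {}

    def register(self, name: str, fn: EmbeddingFn) -> None:
        # Refuse silent overwrites so name clashes surface early.
        if name in self._functions:
            raise ValueError(f"embedding function {name!r} is already registered")
        self._functions[name] = fn

    def get(self, name: str) -> Optional[EmbeddingFn]:
        return self._functions.get(name)


class Connection:
    """Hypothetical connection that owns its registry instead of using a global."""

    def __init__(self, uri: str, registry: Optional[MemoryRegistry] = None) -> None:
        self.uri = uri
        # Default to a fresh in-memory registry per connection.
        self.registry = registry if registry is not None else MemoryRegistry()


def char_count_embedding(texts: List[str]) -> List[List[float]]:
    # Toy embedding: a one-dimensional vector holding the text length.
    return [[float(len(text))] for text in texts]


db_a = Connection("data/a")
db_b = Connection("data/b")
db_a.registry.register("char_count", char_count_embedding)
```

Because each connection holds its own registry, registering `char_count` on `db_a` leaves `db_b` untouched, which is the conflict a single global registry cannot avoid.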

---

Some additional notes:

This does not provide any of the out-of-the-box functionality that the
Python registry does.

_i.e., there are no built-in embedding functions._

You can think of this as the groundwork for adding those built-in
functions. So while this is part of
https://github.com/lancedb/lancedb/issues/994, it does not yet offer
feature parity.

2024-05-06 18:39:07 -05:00

.cargo

fix: use static C runtime on Windows (#979)

2024-04-05 16:30:40 -07:00

.github

ci: fix failures in release scripts (#1215)

2024-04-10 13:09:39 -07:00

feat: add publish step for nodejs (#1155)

2024-04-05 16:33:37 -07:00

dockerfiles

A simple base usage that install the dependencies necessary to use FT… (#1036)

2024-04-05 16:31:36 -07:00

docs

fix: Docs for embed_func fixed in youtube transcript search notebook (#1269)

2024-05-06 11:48:25 +05:30

node

Updating package-lock.json

2024-04-30 20:57:12 +00:00

nodejs

chore(nodejs): update docs on "table.ts" (#1255)

2024-05-01 23:00:22 -05:00

python

Chore (python): Better retry loop logging when embedding api fails (#1267)

2024-05-06 11:49:11 +05:30

rust

feat: rust embedding registry (#1259)

2024-05-06 18:39:07 -05:00

.bumpversion.cfg

Bump version: 0.4.17 → 0.4.18

2024-04-30 19:21:37 +00:00

.gitignore

feat: rust embedding registry (#1259)

2024-05-06 18:39:07 -05:00

.pre-commit-config.yaml

feat: page_token / limit to native table_names function. Use async table_names function from sync table_names function (#1059)

2024-04-05 16:31:45 -07:00

Cargo.toml

chore: update to Lance version 0.10.16 and Arrow version 51 (#1247)

2024-04-26 16:26:57 -07:00

docker-compose.yml

feat: expose storage options in LanceDB (#1204)

2024-04-10 10:12:04 -07:00

LICENSE

initial commit

2023-03-17 18:15:19 -07:00

README.md

Update README.md to correct LangChain URL (#1262)

2024-05-06 11:50:34 +05:30

README.md

Developer-friendly database for multimodal AI

LanceDB is an open-source database for vector search built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.

The key features of LanceDB include:

- Production-scale vector search with no servers to manage.
- Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
- Support for vector similarity search, full-text search and SQL.
- Native Python and JavaScript/TypeScript support.
- Zero-copy, automatic versioning: manage versions of your data without needing extra infrastructure.
- GPU support in building vector index(*).
- Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache Arrow, Pandas, Polars, DuckDB and more on the way.

LanceDB's core is written in Rust 🦀 and is built using Lance, an open-source columnar format designed for performant ML workloads.

Quick Start

JavaScript

npm install vectordb

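const lancedb = require('vectordb');

// CommonJS modules do not allow top-level await, so run the example inside an async function.
async function main() {
  // Connect to (or create) a local database directory.
  const db = await lancedb.connect('data/sample-lancedb');

  // Create a table from an array of records; `vector` holds the embedding.
  const table = await db.createTable({
    name: 'vectors',
    data: [
      { id: 1, vector: [0.1, 0.2], item: "foo", price: 10 },
      { id: 2, vector: [1.1, 1.2], item: "bar", price: 50 }
    ]
  });

  // Vector search: return the 2 rows nearest to the query vector.
  const query = table.search([0.1, 0.3]).limit(2);
  const results = await query.execute();

  // You can also search for rows by specific criteria without involving a vector search.
  const rowsByCriteria = await table.search(undefined).where("price >= 10").execute();
}

main();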

Python

pip install lancedb

import lancedb

uri = "data/sample-lancedb"
db = lancedb.connect(uri)
table = db.create_table("my_table",
                         data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
                               {"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
result = table.search([100, 100]).limit(2).to_pandas()
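
To make the ranking in the Python example concrete, here is a minimal standard-library sketch of a brute-force nearest-neighbor search over the same two rows. It assumes squared Euclidean (L2) distance as the ranking metric; `brute_force_search` is a hypothetical helper for illustration, not a LanceDB function, and it says nothing about how LanceDB executes the query internally.

```python
import heapq


def brute_force_search(rows, query, limit):
    """Return the `limit` rows closest to `query` by squared Euclidean distance."""

    def squared_l2(vector):
        # Sum of squared per-dimension differences; the ordering matches plain L2.
        return sum((a - b) ** 2 for a, b in zip(vector, query))

    return heapq.nsmallest(limit, rows, key=lambda row: squared_l2(row["vector"]))


# Same rows as the quick start table above.
rows = [
    {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
    {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
]

nearest = brute_force_search(rows, [100, 100], limit=2)
print([row["item"] for row in nearest])
```

Under this metric "bar" ranks ahead of "foo", because `[5.9, 26.5]` lies closer to `[100, 100]` than `[3.1, 4.1]` does.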

Blogs, Tutorials & Videos

Languages

Rust 42.8%

Python 41.9%

TypeScript 14.2%

Shell 0.6%

Java 0.3%
