lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-07-04 03:20:40 +00:00


Will Jones a975cc0a94 fix: prevent duplicate data in FTS index (#728)

This forces the user to replace the whole FTS directory when re-creating
the index, which prevents duplicate data from being created. Previously,
the whole dataset was re-added to the existing index, duplicating
existing rows in the index.
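The behaviour described above can be illustrated with a minimal standard-library sketch. It is not LanceDB's implementation: the `docs.txt` file, the `fts_index` directory name, and the helpers `append_rows`, `create_index`, and `indexed_rows` are all hypothetical stand-ins for a real full-text index. It only shows why appending to an existing index duplicates rows, and how requiring a full replacement avoids that.

```python
import shutil
import tempfile
from pathlib import Path


def append_rows(index_dir: Path, rows: list[str]) -> None:
    """Hypothetical stand-in for adding documents to an existing index."""
    index_dir.mkdir(parents=True, exist_ok=True)
    with open(index_dir / "docs.txt", "a", encoding="utf-8") as f:
        for row in rows:
            f.write(row + "\n")


def create_index(index_dir: Path, rows: list[str], replace: bool = False) -> None:
    """Build the index from scratch; refuse to touch an existing one unless asked."""
    if index_dir.exists():
        if not replace:
            raise ValueError("index already exists; pass replace=True to rebuild it")
        # Drop the whole directory so no previously indexed rows survive.
        shutil.rmtree(index_dir)
    append_rows(index_dir, rows)


def indexed_rows(index_dir: Path) -> list[str]:
    """Return every row currently stored in the toy index."""
    return (index_dir / "docs.txt").read_text(encoding="utf-8").splitlines()


rows = ["hello world", "goodbye world"]
with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp) / "fts_index"
    create_index(index_dir, rows)

    # Old behaviour: re-adding the dataset to the existing index duplicates it.
    append_rows(index_dir, rows)
    duplicated = indexed_rows(index_dir)

    # New behaviour: the directory is replaced, so each row appears once.
    create_index(index_dir, rows, replace=True)
    rebuilt = indexed_rows(index_dir)

print(len(duplicated), len(rebuilt))  # 4 2
```

Between `shutil.rmtree` and the end of `append_rows` the toy index does not exist, which mirrors the temporary unavailability noted later in this message.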

This (in combination with lancedb/lance#1707) caused #726, since the
duplicate data emitted duplicate indices for `take()` and an upstream
issue caused those queries to fail.

This solution isn't ideal, since it makes the FTS index temporarily
unavailable while the index is built. In the future, we should have
multiple FTS index directories, which would allow atomic commits of new
indexes (as well as multiple indexes for different columns).

Fixes #498.
Fixes #726.

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>

2024-04-05 16:24:30 -07:00

.github/workflows

feat: support nested pydantic schema (#707)

2024-04-05 16:24:30 -07:00

chore: set error handling to immediate (#686)

2024-04-05 16:23:49 -07:00

docs

feat(python): add option to flatten output in to_pandas (#722)

2024-04-05 16:24:30 -07:00

node

feat: Node create index API (#720)

2024-04-05 16:24:30 -07:00

python

fix: prevent duplicate data in FTS index (#728)

2024-04-05 16:24:30 -07:00

rust

upgrade lance to v0.9.1 (#727)

2024-04-05 16:24:30 -07:00

.bumpversion.cfg

Bump version: 0.3.11 → 0.4.0

2024-04-05 16:24:30 -07:00

.gitignore

feat(node): pull node binaries into separate packages (3) (#285)

2023-07-12 16:52:04 -07:00

.pre-commit-config.yaml

Handle NaN input data (#241)

2023-07-04 20:00:46 -07:00

Cargo.toml

upgrade lance to v0.9.1 (#727)

2024-04-05 16:24:30 -07:00

docker-compose.yml

add health check to wait for all service ready before next step (#501)

2023-09-18 15:17:45 -04:00

LICENSE

initial commit

2023-03-17 18:15:19 -07:00

README.md

docs: Add badges (#694)

2024-04-05 16:23:49 -07:00

README.md

Developer-friendly, serverless vector database for AI applications

LanceDB is an open-source database for vector search built with persistent storage, which greatly simplifies retrieval, filtering, and management of embeddings.

The key features of LanceDB include:

Production-scale vector search with no servers to manage.
Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
Support for vector similarity search, full-text search and SQL.
Native Python and JavaScript/TypeScript support.
Zero-copy, automatic versioning: manage versions of your data without extra infrastructure.
GPU support for building vector indexes (*).
Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache Arrow, Pandas, Polars, DuckDB, and more on the way.

LanceDB's core is written in Rust 🦀 and is built using Lance, an open-source columnar format designed for performant ML workloads.

Quick Start

JavaScript

npm install vectordb

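const lancedb = require('vectordb');

async function main() {
  // Open (or create) a local database directory.
  const db = await lancedb.connect('data/sample-lancedb');

  // Create a table from in-memory records.
  const table = await db.createTable('vectors', [
    { id: 1, vector: [0.1, 0.2], item: 'foo', price: 10 },
    { id: 2, vector: [1.1, 1.2], item: 'bar', price: 50 },
  ]);

  // Fetch the two rows nearest to the query vector.
  const results = await table.search([0.1, 0.3]).limit(2).execute();
  console.log(results);
}

main().catch(console.error);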

Python

pip install lancedb

import lancedb

uri = "data/sample-lancedb"
db = lancedb.connect(uri)
table = db.create_table("my_table",
                         data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
                               {"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
result = table.search([100, 100]).limit(2).to_pandas()
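To make the search call in the Python quick start more concrete, here is a minimal brute-force sketch of nearest-neighbour ranking over the same two rows, using only the standard library. It assumes a Euclidean (L2) metric; `l2_distance` and `brute_force_search` are hypothetical helper names for illustration, not part of the LanceDB API, and a real vector database may use a different metric or an approximate index instead of scanning every row.

```python
import math

# Same rows as the Python quick start.
rows = [
    {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
    {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
]


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(rows, query, limit):
    """Score every row against the query and return the closest `limit` rows."""
    scored = [{**row, "distance": l2_distance(row["vector"], query)} for row in rows]
    return sorted(scored, key=lambda r: r["distance"])[:limit]


nearest = brute_force_search(rows, [100, 100], limit=2)
print([r["item"] for r in nearest])  # ['bar', 'foo']
```

Under this metric "bar" ranks first because its second component (26.5) is much closer to 100 than that of "foo" (4.1).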

Blogs, Tutorials & Videos

Languages

HTML 35.2%

Rust 32.7%

Python 23.8%

TypeScript 7.8%

Shell 0.3%

Other 0.1%
