lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-26 22:59:57 +00:00

Chang She bc83bc9838 feat(python): add post filtering for full text search (#739)

Closes #721 

fts returns results as a pyarrow table. Pyarrow tables have a `filter`
method, but it does not take SQL filter strings (only pyarrow compute
expressions). Instead, we do one of two things to support
`tbl.search("keywords").where("foo=5").limit(10).to_arrow()`:

Default path: If duckdb is available then use duckdb to execute the sql
filter string on the pyarrow table.
Backup path: Otherwise, write the pyarrow table to a lance dataset and
then do `to_table(filter=<filter>)`

Neither is ideal. 
Default path has two issues:
1. requires installing an extra library (duckdb)
2. duckdb mangles some fields (like fixed size list => list)

Backup path incurs a latency penalty (~20ms on ssd) to write the
resultset to disk.

In the short term, once #676 is addressed, we can write the dataset to
"memory://" instead of disk, which should make the post filter evaluate
much quicker (ETA next week).

In the longer term, we'd like to be able to evaluate the filter string
on the pyarrow Table directly. One possibility is to use Substrait to
generate pyarrow compute expressions from the SQL string. Or, if there's
enough progress on pyarrow, it could support Substrait expressions
directly (no ETA).

---------

Co-authored-by: Will Jones <willjones127@gmail.com>

2024-04-05 16:25:02 -07:00

.github

bug(python): fix path handling in windows (#724)

2024-04-05 16:24:45 -07:00

chore: set error handling to immediate (#686)

2024-04-05 16:23:49 -07:00

docs

feat(python): add post filtering for full text search (#739)

2024-04-05 16:25:02 -07:00

node

fix: createIndex index cache size (#741)

2024-04-05 16:25:02 -07:00

python

feat(python): add post filtering for full text search (#739)

2024-04-05 16:25:02 -07:00

rust

Bump version: 0.4.0 → 0.4.1

2024-04-05 16:25:02 -07:00

.bumpversion.cfg

Bump version: 0.4.0 → 0.4.1

2024-04-05 16:25:02 -07:00

.gitignore

feat(node): pull node binaries into separate packages (3) (#285)

2023-07-12 16:52:04 -07:00

.pre-commit-config.yaml

Handle NaN input data (#241)

2023-07-04 20:00:46 -07:00

Cargo.toml

upgrade lance to v0.9.1 (#727)

2024-04-05 16:24:30 -07:00

docker-compose.yml

add health check to wait for all service ready before next step (#501)

2023-09-18 15:17:45 -04:00

LICENSE

initial commit

2023-03-17 18:15:19 -07:00

README.md

docs: Add badges (#694)

2024-04-05 16:23:49 -07:00

README.md

Developer-friendly, serverless vector database for AI applications

LanceDB is an open-source database for vector search built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.

The key features of LanceDB include:

Production-scale vector search with no servers to manage.
Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
Support for vector similarity search, full-text search and SQL.
Native Python and JavaScript/TypeScript support.
Zero-copy, automatic versioning: manage versions of your data without needing extra infrastructure.
GPU support in building vector index(*).
Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache Arrow, Pandas, Polars, DuckDB and more on the way.

LanceDB's core is written in Rust 🦀 and is built using Lance, an open-source columnar format designed for performant ML workloads.

Quick Start

JavaScript

npm install vectordb

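const lancedb = require('vectordb');
async function main() {
  // Connect to a local database directory.
  const db = await lancedb.connect('data/sample-lancedb');

  // Create a table from an array of records; each record carries its vector.
  const table = await db.createTable('vectors', [
    { id: 1, vector: [0.1, 0.2], item: 'foo', price: 10 },
    { id: 2, vector: [1.1, 1.2], item: 'bar', price: 50 },
  ]);
  // Fetch the 2 rows nearest to the query vector.
  const query = table.search([0.1, 0.3]).limit(2);
  const results = await query.execute();
  console.log(results);
}

// `await` is not allowed at the top level of a CommonJS script, hence main().
main().catch(console.error);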

Python

pip install lancedb

import lancedb

uri = "data/sample-lancedb"
db = lancedb.connect(uri)
table = db.create_table("my_table",
                         data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
                               {"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
result = table.search([100, 100]).limit(2).to_pandas()
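To make the query above concrete, here is a brute-force sketch of what a nearest-neighbour search computes, using only the standard library. It is not how LanceDB is implemented: the helper `brute_force_search` is hypothetical, and it assumes Euclidean (L2) distance as the metric.

```python
import math

# Same rows as in the quick start above.
rows = [
    {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
    {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
]


def brute_force_search(rows, query, limit):
    """Rank rows by L2 distance to `query` and keep the closest `limit`."""
    scored = [(math.dist(row["vector"], query), row) for row in rows]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return [row for _, row in scored[:limit]]


nearest = brute_force_search(rows, [100, 100], limit=2)
print([row["item"] for row in nearest])  # prints ['bar', 'foo']
```

With the query `[100, 100]`, the "bar" row is ranked first because its vector lies closer to the query than the "foo" row's does.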

Blogs, Tutorials & Videos

Languages

Rust 42.8%

Python 41.8%

TypeScript 14.3%

Shell 0.6%

Java 0.3%
