lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-05-14 02:20:40 +00:00


Will Jones 39cc2fd62b feat(python): add read_consistency_interval argument (#828 )

This PR refactors how we handle read consistency: whether the `LanceTable`
class picks up modifications to the table made by other instances
or processes. Users have three options they can set at the connection
level:

1. (Default) `read_consistency_interval=None` means it will not check at
all. Users can call `table.checkout_latest()` to manually check for
updates.
2. `read_consistency_interval=timedelta(0)` means **always** check for
updates, giving strong read consistency.
3. `read_consistency_interval=timedelta(seconds=20)` means check for
updates every 20 seconds. This is eventual consistency, a compromise
between the two options above.
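
The three settings can be read as one refresh policy. The sketch below is a toy model of that policy using only the standard library; it is not LanceDB's implementation, and the helper name `should_check_for_updates` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_check_for_updates(
    interval: Optional[timedelta],
    last_checked: datetime,
    now: datetime,
) -> bool:
    """Toy model of the read-consistency policy (hypothetical helper)."""
    if interval is None:
        # Option 1: never check automatically; the caller refreshes manually.
        return False
    # Option 2 (timedelta(0)) always satisfies this comparison; option 3
    # satisfies it only once `interval` has elapsed since the last check.
    return now - last_checked >= interval


start = datetime(2024, 1, 1, 12, 0, 0)
later = start + timedelta(seconds=5)

print(should_check_for_updates(None, start, later))                   # no automatic check
print(should_check_for_updates(timedelta(0), start, later))           # check on every read
print(should_check_for_updates(timedelta(seconds=20), start, later))  # interval not yet elapsed
```

In this model, a longer interval trades freshness for fewer checks, which matches the "compromise" described in option 3.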

There is now an explicit difference between a `LanceTable` that tracks
the current version and one that is fixed at a historical version. We
now enforce that users cannot write if they have checked out an old
version. They are instructed to call `checkout_latest()` before calling
the write methods.

Since `conn.open_table()` doesn't have a parameter for version, users
will only get fixed references if they call `table.checkout()`.
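
The distinction between a tracking reference and a fixed one can be sketched with a small stand-in class. `ToyTable` is hypothetical and only models the rule described above; the exception type and message are assumptions, not what `LanceTable` actually raises.

```python
from typing import List, Optional


class ToyTable:
    """Hypothetical stand-in for a versioned table; not the LanceTable class."""

    def __init__(self) -> None:
        self._versions: List[List[dict]] = [[]]  # version 1 is an empty table
        self._pinned: Optional[int] = None       # None means "track the latest version"

    def checkout(self, version: int) -> None:
        # Pin this reference to a historical version.
        if not 1 <= version <= len(self._versions):
            raise ValueError(f"unknown version {version}")
        self._pinned = version

    def checkout_latest(self) -> None:
        # Go back to tracking the latest version, which re-enables writes.
        self._pinned = None

    def rows(self) -> List[dict]:
        version = self._pinned or len(self._versions)
        return self._versions[version - 1]

    def add(self, row: dict) -> None:
        # Writes are refused while the reference is pinned to an old version.
        if self._pinned is not None:
            raise ValueError("fixed at an old version; call checkout_latest() first")
        self._versions.append(self._versions[-1] + [row])


table = ToyTable()
table.add({"item": "foo"})      # tracking reference: write succeeds
table.checkout(1)               # now fixed at version 1
try:
    table.add({"item": "bar"})  # rejected while pinned
except ValueError as err:
    print(err)
table.checkout_latest()
table.add({"item": "bar"})      # allowed again
```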

The difference between these two can be seen in the repr: tables that are
fixed at a particular version will have a `version` displayed in the
repr. Otherwise, the version will not be shown.

```python
>>> table
LanceTable(connection=..., name="my_table")
>>> table.checkout(1)
>>> table
LanceTable(connection=..., name="my_table", version=1)
```

I decided to not create different classes for these states, because I
think we already have enough complexity with the Cloud vs OSS table
references.

Based on #812

2024-04-05 16:29:57 -07:00

.cargo

chore: add global cargo config to enable minimal cpu target (#925 )

2024-04-05 16:29:13 -07:00

.github

ci: bump to new version of python action to use node 20 GitHub action runtime (#909 )

2024-04-05 16:29:05 -07:00

chore: set error handling to immediate (#686 )

2024-04-05 16:23:49 -07:00

docs

fix hybrid search example (#922 )

2024-04-05 16:29:13 -07:00

node

Bump version: 0.4.7 → 0.4.8

2024-04-05 16:29:05 -07:00

nodejs

chore: add global cargo config to enable minimal cpu target (#925 )

2024-04-05 16:29:13 -07:00

python

feat(python): add read_consistency_interval argument (#828 )

2024-04-05 16:29:57 -07:00

rust

chore: add global cargo config to enable minimal cpu target (#925 )

2024-04-05 16:29:13 -07:00

.bumpversion.cfg

Bump version: 0.4.7 → 0.4.8

2024-04-05 16:29:05 -07:00

.gitignore

feat: rework NodeJS SDK using napi (#847 )

2024-04-05 16:27:51 -07:00

.pre-commit-config.yaml

Handle NaN input data (#241 )

2023-07-04 20:00:46 -07:00

Cargo.toml

chore: add global cargo config to enable minimal cpu target (#925 )

2024-04-05 16:29:13 -07:00

docker-compose.yml

add health check to wait for all service ready before next step (#501 )

2023-09-18 15:17:45 -04:00

LICENSE

initial commit

2023-03-17 18:15:19 -07:00

README.md

chore: update JS/TS example in README (#898 )

2024-04-05 16:28:56 -07:00

README.md

Developer-friendly, serverless vector database for AI applications

LanceDB is an open-source database for vector search built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.

The key features of LanceDB include:

* Production-scale vector search with no servers to manage.
* Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
* Support for vector similarity search, full-text search and SQL.
* Native Python and JavaScript/TypeScript support.
* Zero-copy, automatic versioning, manage versions of your data without needing extra infrastructure.
* GPU support in building vector index(*).
* Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache-Arrow, Pandas, Polars, DuckDB and more on the way.

LanceDB's core is written in Rust 🦀 and is built using Lance, an open-source columnar format designed for performant ML workloads.

Quick Start

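JavaScript

```shell
npm install vectordb
```

```javascript
const lancedb = require('vectordb');

// `await` is not available at the top level of a CommonJS script,
// so the example runs inside an async function.
async function main() {
  const db = await lancedb.connect('data/sample-lancedb');

  // Create a table from two rows, each carrying a 2-dimensional vector.
  const table = await db.createTable({
    name: 'vectors',
    data: [
      { id: 1, vector: [0.1, 0.2], item: "foo", price: 10 },
      { id: 2, vector: [1.1, 1.2], item: "bar", price: 50 }
    ]
  });

  // Vector search: return up to two rows nearest to the query vector.
  const query = table.search([0.1, 0.3]).limit(2);
  const results = await query.execute();

  // You can also search for rows by specific criteria without involving a vector search.
  const rowsByCriteria = await table.search(undefined).where("price >= 10").execute();
}

main();
```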

Python

```shell
pip install lancedb
```

```python
import lancedb

uri = "data/sample-lancedb"
db = lancedb.connect(uri)
table = db.create_table(
    "my_table",
    data=[
        {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
        {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
    ],
)
result = table.search([100, 100]).limit(2).to_pandas()
```
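
Conceptually, the search above ranks rows by how close their vectors are to the query. The following is a minimal brute-force sketch of that idea using only the standard library, assuming Euclidean (L2) distance; `brute_force_search` is a hypothetical helper for illustration and says nothing about how LanceDB executes or indexes a query.

```python
import math
from typing import Dict, List, Sequence

# Same two rows as in the quick start above.
rows = [
    {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
    {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
]


def brute_force_search(query: Sequence[float], data: List[Dict], limit: int) -> List[Dict]:
    """Return the `limit` rows whose vectors are closest to `query` (hypothetical helper)."""
    # Sort every row by Euclidean distance to the query, nearest first.
    ranked = sorted(data, key=lambda row: math.dist(query, row["vector"]))
    return ranked[:limit]


nearest = brute_force_search([100, 100], rows, limit=2)
print([row["item"] for row in nearest])
```

A scan like this touches every row, so it only suits tiny datasets; it is shown here purely to make the ranking step concrete.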

Blogs, Tutorials & Videos

Languages

HTML 39.8%

Rust 28.9%

Python 23.1%

TypeScript 7.7%

Shell 0.3%

Other 0.1%
