lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-22 21:09:58 +00:00

msu-reevo cc81f3e1a5 fix(python): typing (#2167 )

@wjones127 is there a standard way you guys set up your virtualenv? I can
either relist all the dependencies in the pyright precommit section, or
specify a venv, or the user has to be in the virtual environment when
they run git commit. If the venv location was standardized or a python
manager like `uv` was used it would be easier to avoid duplicating the
pyright dependency list.

Per your suggestion, in `pyproject.toml` I added in all the passing
files to the `includes` section.

For ruff I upgraded the version and removed "TCH" which doesn't exist as
an option.

I added a `pyright_report.csv` which contains a list of all files sorted
by pyright errors ascending as a todo list to work on.

I fixed about 30 issues in `table.py` that stemmed from plain `str` values
being passed into methods requiring one of a fixed set of string `Literal`s,
by extracting those `Literal` types into `types.py`.
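
For illustration, the pattern looks roughly like the sketch below. The alias name `DistanceType`, its members, and the helper `normalize_metric` are hypothetical and chosen only for this example; the actual contents of `types.py` may differ.

```python
from typing import Literal, cast, get_args

# Hypothetical alias for illustration; the real names in `types.py` may differ.
DistanceType = Literal["l2", "cosine", "dot"]


def normalize_metric(metric: str) -> DistanceType:
    """Validate a plain str at runtime and return it narrowed to the Literal type."""
    allowed = get_args(DistanceType)  # ("l2", "cosine", "dot")
    lowered = metric.lower()
    if lowered not in allowed:
        raise ValueError(f"metric must be one of {allowed}, got {metric!r}")
    # cast() only informs the type checker; the membership check above does the real work.
    return cast(DistanceType, lowered)
```

A method annotated with `DistanceType` then accepts `normalize_metric(user_input)` without a pyright error, whereas passing the raw `str` would be flagged.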

Can you verify in the rust bridge that the schema should be a property
and not a method here? If it's a method, then there's another place in
the code where `inner.schema` should be `inner.schema()`
``` python
class RecordBatchStream:
    @property
    def schema(self) -> pa.Schema: ...
```

Also, unless the `_lancedb.pyi` file is wrong, there is no `__anext__` on
`_inner` when it is not an `AsyncGenerator`; only `next` is defined:
``` python
    async def __anext__(self) -> pa.RecordBatch:
        # Async generators expose __anext__; the Rust-backed stream only has next().
        if isinstance(self._inner, AsyncGenerator):
            batch = await self._inner.__anext__()
        else:
            batch = await self._inner.next()
        # next() returns None once the stream is exhausted.
        if batch is None:
            raise StopAsyncIteration
        return batch
```
In the else branch, `_inner` is a `RecordBatchStream`:
```python
class RecordBatchStream:
    @property
    def schema(self) -> pa.Schema: ...
    async def next(self) -> Optional[pa.RecordBatch]: ...
```

---------

Co-authored-by: Will Jones <willjones127@gmail.com>

2025-03-10 09:01:23 -07:00

.cargo

ci: musl x64,arm64 (#1853 )

2024-11-20 10:53:19 -08:00

.github

fix(python): typing (#2167 )

2025-03-10 09:01:23 -07:00

dockerfiles

A simple base usage that install the dependencies necessary to use FT… (#1036 )

2024-04-05 16:31:36 -07:00

docs

chore: add reo integration (#2149 )

2025-02-28 07:51:34 -08:00

java

Bump version: 0.17.0 → 0.18.0-beta.0

2025-02-26 20:11:07 +00:00

node

Updating package-lock.json

2025-02-26 21:23:39 +00:00

nodejs

Updating package-lock.json

2025-02-26 20:11:37 +00:00

python

fix(python): typing (#2167 )

2025-03-10 09:01:23 -07:00

rust

feat: respect datafusion's batch size when running as a table provider (#2187 )

2025-03-07 05:53:36 -08:00

.bumpversion.toml

Bump version: 0.17.0 → 0.18.0-beta.0

2025-02-26 20:11:07 +00:00

.gitignore

ci(rust): caching improvements (up to 2.8x faster builds) (#2075 )

2025-01-29 08:26:45 -08:00

.pre-commit-config.yaml

fix(python): typing (#2167 )

2025-03-10 09:01:23 -07:00

Cargo.lock

feat: respect datafusion's batch size when running as a table provider (#2187 )

2025-03-07 05:53:36 -08:00

Cargo.toml

chore: upgrade lance to 0.24.0-beta.1 (#2171 )

2025-03-03 12:32:12 +08:00

CONTRIBUTING.md

docs: contributing guide (#1970 )

2025-01-07 15:11:16 -08:00

docker-compose.yml

feat: expose storage options in LanceDB (#1204 )

2024-04-10 10:12:04 -07:00

LICENSE

initial commit

2023-03-17 18:15:19 -07:00

pyright_report.csv

fix(python): typing (#2167 )

2025-03-10 09:01:23 -07:00

README.md

docs: introducing LanceDB Guru on Gurubase.io (#1797 )

2024-11-08 10:55:22 -08:00

release_process.md

ci: enable java auto release (#1602 )

2024-09-19 10:51:03 -07:00

rust-toolchain.toml

ci(rust): check MSRV and upgrade toolchain (#1960 )

2024-12-19 08:43:25 -08:00

README.md

Developer-friendly database for multimodal AI

LanceDB is an open-source database for vector-search built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.

The key features of LanceDB include:

- Production-scale vector search with no servers to manage.
- Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
- Support for vector similarity search, full-text search and SQL.
- Native Python and JavaScript/TypeScript support.
- Zero-copy, automatic versioning: manage versions of your data without needing extra infrastructure.
- GPU support for building vector indexes(*).
- Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache Arrow, Pandas, Polars, DuckDB and more on the way.

LanceDB's core is written in Rust 🦀 and is built using Lance, an open-source columnar format designed for performant ML workloads.

Quick Start

Javascript

```shell
npm install @lancedb/lancedb
```

```javascript
import * as lancedb from "@lancedb/lancedb";

const db = await lancedb.connect("data/sample-lancedb");
const table = await db.createTable("vectors", [
	{ id: 1, vector: [0.1, 0.2], item: "foo", price: 10 },
	{ id: 2, vector: [1.1, 1.2], item: "bar", price: 50 },
], {mode: 'overwrite'});

const query = table.vectorSearch([0.1, 0.3]).limit(2);
const results = await query.toArray();

// You can also search for rows by specific criteria without involving a vector search.
const rowsByCriteria = await table.query().where("price >= 10").toArray();
```

Python

```shell
pip install lancedb
```

```python
import lancedb

uri = "data/sample-lancedb"
db = lancedb.connect(uri)
table = db.create_table("my_table",
                         data=[{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
                               {"vector": [5.9, 26.5], "item": "bar", "price": 20.0}])
result = table.search([100, 100]).limit(2).to_pandas()
```
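
To build intuition for what the search call above returns, the following sketch ranks the same two rows by brute force in plain Python. It assumes Euclidean (L2) distance and does not use LanceDB at all; the helper `brute_force_search` and the `distance` key are hypothetical, and LanceDB's own metric, indexing and result columns may differ.

```python
import math

# Same sample rows as in the quick start above.
rows = [
    {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
    {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
]


def brute_force_search(rows, query, limit):
    """Hypothetical helper: score every row by L2 distance and keep the closest."""
    scored = [
        {**row, "distance": math.dist(row["vector"], query)}
        for row in rows
    ]
    return sorted(scored, key=lambda r: r["distance"])[:limit]


nearest = brute_force_search(rows, [100, 100], limit=2)
print([r["item"] for r in nearest])
```

Under this assumption "bar" ranks ahead of "foo", because `[5.9, 26.5]` lies closer to `[100, 100]` than `[3.1, 4.1]` does.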

Blogs, Tutorials & Videos

Languages

Rust 42.8%

Python 41.8%

TypeScript 14.3%

Shell 0.6%

Java 0.3%
