mirror of https://github.com/lancedb/lancedb.git synced 2026-07-06 20:40:41 +00:00

Will Jones c0230f91d2 feat(rust)!: accept RecordBatch, Vec<RecordBatch> in create_table() and Table.add() (#2948 )

BREAKING CHANGE: Arbitrary `impl RecordBatchReader` is no longer
accepted; it must be converted into `Box<dyn RecordBatchReader>`.

This PR replaces `IntoArrow` with a new trait `Scannable` to define
input row data. This provides the following advantages:

1. **We can implement `Scannable` for more types than `IntoArrow`, such
as `RecordBatch` and `Vec<RecordBatch>`.** The `IntoArrow` trait was
implemented for arbitrary `T: RecordBatchReader`, and the Rust compiler
would prevent us from implementing it for foreign types like
`RecordBatch` because (theoretically) those types might implement
`RecordBatchReader` in the future. That's why we implement `Scannable`
for `Box<dyn RecordBatchReader>` instead; since it's a concrete type it
doesn't block implementing for other foreign types.
2. **We can potentially replay `Scannable` values**. Previously, we had
to choose between buffering all data in memory and supporting retries of
writes. But because `Scannable` things can optionally support
re-scanning, we now have a way of supporting retries while also
streaming.
3. **`Scannable` can provide hints like `num_rows`, which can be used to
schedule parallel writers.** Without knowing the total number of rows,
it's difficult to know whether it's worth writing multiple files in
parallel.
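
To make point (1) concrete, here is a minimal sketch using only the standard library. It is not the lancedb API: `Vec<i64>` stands in for `RecordBatch`, a boxed iterator stands in for `Box<dyn RecordBatchReader>`, and the trait shape and the method names `scan_batches`, `num_rows`, and `rescannable` are assumptions made for illustration.

```rust
/// Toy "batch". `Vec<i64>` is a foreign (std) type, playing the role of `RecordBatch`.
type Batch = Vec<i64>;
/// Toy "reader", boxed into a single concrete type.
type BoxedReader = Box<dyn Iterator<Item = Batch>>;

/// Hypothetical, simplified analogue of `Scannable` (not the real trait).
trait Scannable {
    /// Optional size hint that a writer could use for scheduling.
    fn num_rows(&self) -> Option<usize> {
        None
    }
    /// Whether `scan_batches` replays the same data on every call.
    fn rescannable(&self) -> bool {
        false
    }
    fn scan_batches(&mut self) -> Box<dyn Iterator<Item = Batch> + '_>;
}

// A blanket impl such as
//     impl<T: Iterator<Item = Batch>> Scannable for T { ... }
// would make the `Vec` impls below fail with E0119 (conflicting
// implementations), because std could implement `Iterator` for `Vec`
// in a future version.

impl Scannable for Batch {
    fn num_rows(&self) -> Option<usize> {
        Some(self.len())
    }
    fn rescannable(&self) -> bool {
        true
    }
    fn scan_batches(&mut self) -> Box<dyn Iterator<Item = Batch> + '_> {
        Box::new(std::iter::once(self.clone()))
    }
}

impl Scannable for Vec<Batch> {
    fn num_rows(&self) -> Option<usize> {
        Some(self.iter().map(Vec::len).sum::<usize>())
    }
    fn rescannable(&self) -> bool {
        true
    }
    fn scan_batches(&mut self) -> Box<dyn Iterator<Item = Batch> + '_> {
        Box::new(self.iter().cloned())
    }
}

// The boxed reader is a concrete type, so this impl cannot overlap with
// the ones above. It is a one-shot stream: no row count, no replay.
impl Scannable for BoxedReader {
    fn scan_batches(&mut self) -> Box<dyn Iterator<Item = Batch> + '_> {
        Box::new(&mut **self)
    }
}

/// Stand-in for a write: drains the input and returns the rows seen.
fn count_rows(data: &mut dyn Scannable) -> usize {
    data.scan_batches().map(|b| b.len()).sum()
}
```

In this sketch the in-memory inputs report a row count and can be scanned repeatedly, while the boxed stream is drained by its first scan.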

We don't yet fully take advantage of (2) and (3), but will in future
PRs. For (2), in order to be ready to leverage this, we need to hook the
`Scannable` implementation up to Python and NodeJS bindings. Right now
they always pass down a stream, but we want to make sure they support
retries when possible. And for (3), this will need to be hooked up to
#2939 and to a pipeline for running pre-processing steps (like embedding
generation).
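
As a rough illustration of why replayable input matters for retries, the sketch below retries a failed write only when the source can be scanned again. It uses only the standard library; the `Source` trait, the `InMemory` and `OneShot` types, and `write_with_retry` are hypothetical names invented for this example and do not reflect the real implementation.

```rust
type Batch = Vec<i64>;

/// Hypothetical row source. `rescan` returns `None` when the data can no
/// longer be replayed.
trait Source {
    fn rescan(&mut self) -> Option<Box<dyn Iterator<Item = Batch> + '_>>;
}

/// Buffered data: can be replayed any number of times.
struct InMemory(Vec<Batch>);

impl Source for InMemory {
    fn rescan(&mut self) -> Option<Box<dyn Iterator<Item = Batch> + '_>> {
        Some(Box::new(self.0.iter().cloned()))
    }
}

/// A stream that is handed out exactly once.
struct OneShot(Option<Box<dyn Iterator<Item = Batch>>>);

impl Source for OneShot {
    fn rescan(&mut self) -> Option<Box<dyn Iterator<Item = Batch> + '_>> {
        let stream = self.0.take()?;
        Some(stream)
    }
}

/// Sends every batch to `sink`. After a failure, the whole write is
/// retried from the start, but only if the source can be scanned again.
fn write_with_retry(
    src: &mut dyn Source,
    max_attempts: usize,
    mut sink: impl FnMut(Batch) -> Result<(), String>,
) -> Result<usize, String> {
    let mut last_err = String::from("no attempts made");
    for _attempt in 0..max_attempts {
        let Some(batches) = src.rescan() else {
            return Err(format!("source cannot be replayed: {last_err}"));
        };
        let mut rows = 0;
        let mut outcome = Ok(());
        for batch in batches {
            rows += batch.len();
            if let Err(e) = sink(batch) {
                outcome = Err(e);
                break;
            }
        }
        match outcome {
            Ok(()) => return Ok(rows),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

With a sink that fails once, the `InMemory` source succeeds on the second attempt, while the `OneShot` source reports an error because its stream was already consumed.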

## Other changes

* Moved `create_table` and `add_data` into their own modules. I've
created a follow up issue to split up `table.rs` further, as it's by far
the largest file: https://github.com/lancedb/lancedb/issues/2949
* Eliminated the `HAS_DATA` generic for `CreateTableBuilder`. I didn't
see any public-facing places where we differentiated methods, which is
why I felt this simplification was okay.
* Added an `Error::External` variant and integrated some conversions to
allow certain errors to pass through transparently. This will fully work
once we upgrade Lance and get to take advantage of changes in
https://github.com/lance-format/lance/pull/5606
* Added LZ4 compression support for write requests to remote endpoints.
I checked and this has been supported on the server for > 1 year.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-13 14:18:36 -08:00

.cargo

chore: clippy::string_to_string has been replaced by implicit_clone (#2817 )

2025-11-26 16:30:35 +08:00

.github

ci: upgrade node version for publishing (#2993 )

2026-02-06 16:30:46 -08:00

test: convert test_table_names to test both remote and local (#2888 )

2026-01-02 15:08:44 -08:00

dockerfiles

A simple base usage that install the dependencies necessary to use FT… (#1036 )

2024-04-05 16:31:36 -07:00

docs

Bump version: 0.26.2-beta.0 → 0.26.2

2026-02-09 06:06:22 +00:00

java

feat: update lance dependency to v2.0.1 (#3027 )

2026-02-13 13:53:02 -08:00

nodejs

feat(rust)!: accept RecordBatch, Vec<RecordBatch> in create_table() and Table.add() (#2948 )

2026-02-13 14:18:36 -08:00

python

feat(rust)!: accept RecordBatch, Vec<RecordBatch> in create_table() and Table.add() (#2948 )

2026-02-13 14:18:36 -08:00

rust

feat(rust)!: accept RecordBatch, Vec<RecordBatch> in create_table() and Table.add() (#2948 )

2026-02-13 14:18:36 -08:00

.bumpversion.toml

Bump version: 0.26.2-beta.0 → 0.26.2

2026-02-09 06:06:22 +00:00

.gitignore

feat: bump lance version to 0.40-0-beta.2 (#2772 )

2025-11-10 14:36:37 -08:00

.pre-commit-config.yaml

fix(python): typing (#2167 )

2025-03-10 09:01:23 -07:00

about.hbs

feat: add third party licenses lists (#3010 )

2026-02-09 16:16:46 -08:00

about.toml

feat: add third party licenses lists (#3010 )

2026-02-09 16:16:46 -08:00

AGENTS.md

ci: add agents and add reviewing instructions (#2754 )

2025-10-29 17:28:26 -07:00

Cargo.lock

feat: update lance dependency to v2.0.1 (#3027 )

2026-02-13 13:53:02 -08:00

Cargo.toml

feat: update lance dependency to v2.0.1 (#3027 )

2026-02-13 13:53:02 -08:00

CLAUDE.md

ci: add agents and add reviewing instructions (#2754 )

2025-10-29 17:28:26 -07:00

CONTRIBUTING.md

docs: contributing guide (#1970 )

2025-01-07 15:11:16 -08:00

docker-compose.yml

feat: expose storage options in LanceDB (#1204 )

2024-04-10 10:12:04 -07:00

LICENSE

initial commit

2023-03-17 18:15:19 -07:00

Makefile

feat: add third party licenses lists (#3010 )

2026-02-09 16:16:46 -08:00

pyright_report.csv

fix(python): typing (#2167 )

2025-03-10 09:01:23 -07:00

README.md

docs: update REST API link in README.md (#2906 )

2026-01-30 15:49:41 -08:00

release_process.md

ci: enable java auto release (#1602 )

2024-09-19 10:51:03 -07:00

RUST_THIRD_PARTY_LICENSES.html

feat: add third party licenses lists (#3010 )

2026-02-09 16:16:46 -08:00

rust-toolchain.toml

feat: bump lance to 0.38.3-beta.2 and rust to 1.90.0 (#2714 )

2025-10-10 14:02:41 -07:00

README.md

The Multimodal AI Lakehouse

How to Install ✦ Detailed Documentation ✦ Tutorials and Recipes ✦ Contributors

The ultimate multimodal data platform for AI/ML applications.

LanceDB is designed for fast, scalable, and production-ready vector search. It is built on top of the Lance columnar format. You can store, index, and search over petabytes of multimodal data and vectors with ease. LanceDB is a central location where developers can build, train and analyze their AI workloads.

Demo: Multimodal Search by Keyword, Vector or with SQL

Star LanceDB to get updates!

⭐ Click here ⭐ to see how fast we're growing!

Key Features:

Fast Vector Search: Search billions of vectors in milliseconds with state-of-the-art indexing.
Comprehensive Search: Support for vector similarity search, full-text search and SQL.
Multimodal Support: Store, query and filter vectors, metadata and multimodal data (text, images, videos, point clouds, and more).
Advanced Features: Zero-copy access and automatic versioning, so you can manage versions of your data without extra infrastructure. GPU support for building vector indexes.

Products:

Open Source & Local: 100% open source, runs locally or in your cloud. No vendor lock-in.
Cloud and Enterprise: Production-scale vector search with no servers to manage. Complete data sovereignty and security.

Ecosystem:

Columnar Storage: Built on the Lance columnar format for efficient storage and analytics.
Seamless Integration: Python, Node.js, Rust, and REST APIs for easy integration. Native Python and Javascript/Typescript support.
Rich Ecosystem: Integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache-Arrow, Pandas, Polars, DuckDB and more on the way.

How to Install:

Follow the Quickstart doc to set up LanceDB locally.

API & SDK: We also support Python, Typescript and Rust SDKs

Interface	Documentation
Python SDK	https://lancedb.github.io/lancedb/python/python/
Typescript SDK	https://lancedb.github.io/lancedb/js/globals/
Rust SDK	https://docs.rs/lancedb/latest/lancedb/index.html
REST API	https://docs.lancedb.com/api-reference/rest

Join Us and Contribute

We welcome contributions from everyone! Whether you're a developer, researcher, or just someone who wants to help out.

If you have any suggestions or feature requests, please feel free to open an issue on GitHub or discuss it on our Discord server.

Check out the GitHub Issues if you would like to work on the features that are planned for the future.

Contributors

Stay in Touch With Us

Languages

HTML 34.6%

Rust 32.4%

Python 24.8%

TypeScript 7.7%

Shell 0.3%

Other 0.1%
