mirror of https://github.com/lancedb/lancedb.git synced 2026-05-15 02:50:44 +00:00


Will Jones fbf4a53475 feat(rust): implement TableProvider::insert_into() for LanceDB tables (#2939)

Implements `InsertExec` and `RemoteInsertExec` to support running
inserts in DataFusion.

## Context

In https://github.com/lancedb/lancedb/pull/2929, I've prototyped moving
the insert pipeline into DataFusion. This will enable parallelism at two
levels:

1. Running preprocessing, such as casting the input schema or computing
embeddings
2. Writing out files
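As a rough illustration of those two levels, here is a std-only Rust sketch. It does not use DataFusion: `Batch`, `preprocess`, `round_robin`, and `run_pipeline` are hypothetical names, scoped threads stand in for DataFusion partitions, the `v * 0.5` "embedding" is a placeholder for a real model call, and each returned `Vec` stands in for one written file. It assumes `num_cpus` and `write_parallelism` are both at least 1.

```rust
use std::thread;

/// Hypothetical stand-in for an input record batch: one integer column.
#[derive(Clone, Debug)]
struct Batch {
    values: Vec<i32>,
}

/// One output row after preprocessing: (value cast to f32, toy "embedding").
type Row = (f32, f32);

/// Level 1 work: cast the column and compute a placeholder embedding.
fn preprocess(batch: &Batch) -> Vec<Row> {
    batch
        .values
        .iter()
        .map(|&v| {
            let x = v as f32; // "cast the input schema" from i32 to f32
            (x, x * 0.5) // placeholder for compute_embedding()
        })
        .collect()
}

/// Deals items round-robin into `n` partitions (n must be >= 1).
fn round_robin<T>(items: Vec<T>, n: usize) -> Vec<Vec<T>> {
    let mut parts: Vec<Vec<T>> = (0..n).map(|_| Vec::new()).collect();
    for (i, item) in items.into_iter().enumerate() {
        parts[i % n].push(item);
    }
    parts
}

/// Preprocesses on `num_cpus` threads, then "writes" on `write_parallelism`
/// threads. Each returned Vec stands in for one output file.
fn run_pipeline(input: Vec<Batch>, num_cpus: usize, write_parallelism: usize) -> Vec<Vec<Row>> {
    // Level 1: parallel preprocessing, one thread per CPU partition.
    let prepared: Vec<Vec<Row>> = thread::scope(|s| {
        let handles: Vec<_> = round_robin(input, num_cpus)
            .into_iter()
            .map(|part| s.spawn(move || part.iter().map(preprocess).collect::<Vec<Vec<Row>>>()))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect::<Vec<Vec<Row>>>()
    });

    // Level 2: parallel writes, one thread (and one "file") per write partition.
    thread::scope(|s| {
        let handles: Vec<_> = round_robin(prepared, write_parallelism)
            .into_iter()
            .map(|part| s.spawn(move || part.into_iter().flatten().collect::<Vec<Row>>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect::<Vec<Vec<Row>>>()
    })
}

fn main() {
    let input: Vec<Batch> = (0..8).map(|i| Batch { values: vec![i, i + 100] }).collect();
    let files = run_pipeline(input, 4, 2);
    for (i, f) in files.iter().enumerate() {
        println!("file {i}: {} rows", f.len());
    }
}
```

The point of the sketch is that the two stages are sized independently: preprocessing fans out to every core, while the number of output files is chosen separately.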

This PR covers only the first part: running the actual writes. In the end,
the plans might look like:

```
InsertExec
  RepartitionExec num_partitions=<write_parallelism>
    ProjectionExec vector=compute_embedding()
      RepartitionExec num_partitions=<num_cpus>
        DataSourceExec
```

where `num_cpus` is used to take advantage of all cores, while
`write_parallelism` might be less than `num_cpus` if there are too few
rows to justify splitting the writes across `num_cpus` files.
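One possible way to pick `write_parallelism` is sketched below, assuming a hypothetical `min_rows_per_file` threshold. The PR does not specify the actual rule, so the function and its parameter names are illustrative only: it asks for one writer per `min_rows_per_file` rows, rounded up, and clamps the result to the range 1 to `num_cpus`.

```rust
use std::thread;

/// Hypothetical heuristic: one writer per `min_rows_per_file` rows (rounded up),
/// never fewer than 1 and never more than `num_cpus`.
fn write_parallelism(num_rows: usize, num_cpus: usize, min_rows_per_file: usize) -> usize {
    // Guard against a zero threshold so the division is always defined.
    let wanted = num_rows.div_ceil(min_rows_per_file.max(1));
    wanted.clamp(1, num_cpus.max(1))
}

fn main() {
    // Number of cores the OS reports; fall back to 1 if it cannot be determined.
    let num_cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    for rows in [0usize, 2_500, 1_000_000] {
        println!("{rows} rows -> {} writers", write_parallelism(rows, num_cpus, 1_000));
    }
}
```

With `min_rows_per_file = 1_000`, 2,500 rows on an 8-core machine would get 3 writers, while 1,000,000 rows would be capped at 8.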

Later PRs will move the preprocessing steps into DataFusion, and then
hook this up to the `Table::add()` implementations.

## Relation to future SQL work

We eventually plan on having the Remote SDK go through a FlightSQL
endpoint. Then for most queries we will send just the SQL string to the
server, and not run any sort of DataFusion plan on the client.

However, I think writes will be a little special, especially bulk writes,
where we need to upload large streams of data and likely want
parallelism. So we'll have separate code paths for writes, and I think
using DataFusion makes sense there, at least as long as we still do the
preprocessing on the client side.
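A minimal sketch of that split, assuming hypothetical `Request` and `ClientAction` types that are not part of the LanceDB SDK: queries forward only their SQL text, while bulk writes keep a client-side plan whose output is divided into at most `write_parallelism` upload streams.

```rust
/// Hypothetical request types; not part of the LanceDB SDK.
enum Request {
    /// A read query: only the SQL text needs to reach the server.
    Query { sql: String },
    /// A bulk write: rows the client must preprocess and upload itself.
    BulkWrite { rows: Vec<String> },
}

/// What the client does with a request (illustrative only).
#[derive(Debug, PartialEq)]
enum ClientAction {
    /// Forward the SQL string; no DataFusion plan runs on the client.
    SendSql(String),
    /// Run a local plan and upload its output over `streams` parallel streams.
    RunLocalPlanAndUpload { streams: usize },
}

fn route(req: Request, write_parallelism: usize) -> ClientAction {
    match req {
        Request::Query { sql } => ClientAction::SendSql(sql),
        Request::BulkWrite { rows } => {
            // Split rows into at most `write_parallelism` roughly equal chunks,
            // one upload stream per chunk.
            let chunk_size = rows.len().div_ceil(write_parallelism.max(1)).max(1);
            ClientAction::RunLocalPlanAndUpload { streams: rows.chunks(chunk_size).count() }
        }
    }
}

fn main() {
    let query = route(Request::Query { sql: "SELECT * FROM t".to_string() }, 4);
    let rows: Vec<String> = (0..10).map(|i| format!("row {i}")).collect();
    let write = route(Request::BulkWrite { rows }, 4);
    println!("{query:?}");
    println!("{write:?}");
}
```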

2026-02-03 10:38:02 -08:00

.cargo

chore: clippy::string_to_string has been replaced by implicit_clone (#2817)

2025-11-26 16:30:35 +08:00

.github

chore!: change support python version from 3.10 to 3.13 (#2955)

2026-01-30 01:47:50 +08:00

test: convert test_table_names to test both remote and local (#2888)

2026-01-02 15:08:44 -08:00

dockerfiles

A simple base usage that install the dependencies necessary to use FT… (#1036)

2024-04-05 16:31:36 -07:00

docs

fix(node): allow bigint[] for takeRowIds (#2916)

2026-02-03 10:09:51 -08:00

java

Bump version: 0.24.1 → 0.25.0-beta.0

2026-02-03 04:48:34 +00:00

nodejs

fix(node): allow bigint[] for takeRowIds (#2916)

2026-02-03 10:09:51 -08:00

python

feat(python): expose fast_search in synchronous API (Fixes #2612) (#2962)

2026-02-03 09:17:27 -08:00

rust

feat(rust): implement TableProvider::insert_into() for LanceDB tables (#2939)

2026-02-03 10:38:02 -08:00

.bumpversion.toml

Bump version: 0.24.1 → 0.25.0-beta.0

2026-02-03 04:48:34 +00:00

.gitignore

feat: bump lance version to 0.40-0-beta.2 (#2772)

2025-11-10 14:36:37 -08:00

.pre-commit-config.yaml

fix(python): typing (#2167)

2025-03-10 09:01:23 -07:00

AGENTS.md

ci: add agents and add reviewing instructions (#2754)

2025-10-29 17:28:26 -07:00

Cargo.lock

feat(rust): implement TableProvider::insert_into() for LanceDB tables (#2939)

2026-02-03 10:38:02 -08:00

Cargo.toml

feat(rust): implement TableProvider::insert_into() for LanceDB tables (#2939)

2026-02-03 10:38:02 -08:00

CLAUDE.md

ci: add agents and add reviewing instructions (#2754)

2025-10-29 17:28:26 -07:00

CONTRIBUTING.md

docs: contributing guide (#1970)

2025-01-07 15:11:16 -08:00

docker-compose.yml

feat: expose storage options in LanceDB (#1204)

2024-04-10 10:12:04 -07:00

LICENSE

initial commit

2023-03-17 18:15:19 -07:00

pyright_report.csv

fix(python): typing (#2167)

2025-03-10 09:01:23 -07:00

README.md

docs: update REST API link in README.md (#2906)

2026-01-30 15:49:41 -08:00

release_process.md

ci: enable java auto release (#1602)

2024-09-19 10:51:03 -07:00

rust-toolchain.toml

feat: bump lance to 0.38.3-beta.2 and rust to 1.90.0 (#2714)

2025-10-10 14:02:41 -07:00

README.md

The Multimodal AI Lakehouse

How to Install ✦ Detailed Documentation ✦ Tutorials and Recipes ✦ Contributors

The ultimate multimodal data platform for AI/ML applications.

LanceDB is designed for fast, scalable, and production-ready vector search. It is built on top of the Lance columnar format. You can store, index, and search over petabytes of multimodal data and vectors with ease. LanceDB is a central location where developers can build, train and analyze their AI workloads.

Demo: Multimodal Search by Keyword, Vector or with SQL

Star LanceDB to get updates!

⭐ Click here ⭐ to see how fast we're growing!

Key Features:

Fast Vector Search: Search billions of vectors in milliseconds with state-of-the-art indexing.
Comprehensive Search: Support for vector similarity search, full-text search and SQL.
Multimodal Support: Store, query and filter vectors, metadata and multimodal data (text, images, videos, point clouds, and more).
Advanced Features: Zero-copy, automatic versioning that lets you manage versions of your data without extra infrastructure. GPU support for building vector indexes.

Products:

Open Source & Local: 100% open source, runs locally or in your cloud. No vendor lock-in.
Cloud and Enterprise: Production-scale vector search with no servers to manage. Complete data sovereignty and security.

Ecosystem:

Columnar Storage: Built on the Lance columnar format for efficient storage and analytics.
Seamless Integration: Python, Node.js, Rust, and REST APIs for easy integration. Native Python and JavaScript/TypeScript support.
Rich Ecosystem: Integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache Arrow, Pandas, Polars, DuckDB and more on the way.

How to Install:

Follow the Quickstart doc to set up LanceDB locally.

API & SDK: We also support Python, TypeScript and Rust SDKs.

Interface	Documentation
Python SDK	https://lancedb.github.io/lancedb/python/python/
TypeScript SDK	https://lancedb.github.io/lancedb/js/globals/
Rust SDK	https://docs.rs/lancedb/latest/lancedb/index.html
REST API	https://docs.lancedb.com/api-reference/rest

Join Us and Contribute

We welcome contributions from everyone, whether you're a developer, a researcher, or just someone who wants to help out.

If you have any suggestions or feature requests, please feel free to open an issue on GitHub or discuss it on our Discord server.

Check out the GitHub Issues if you would like to work on features that are planned for the future.

Contributors

Stay in Touch With Us

Languages: HTML 39.5%, Rust 29%, Python 23%, TypeScript 8%, Shell 0.3%, Other 0.1%
