add() (#3067)
## Summary
Adds progress reporting for `table.add()` so users can track large write
operations. The progress callback is available in Rust, Python (sync and
async), and through the PyO3 bindings.
### Usage
Pass `progress=True` to get an automatic tqdm bar:
```python
table.add(data, progress=True)
# 100%|██████████| 1000000/1000000 [00:12<00:00, 82345 rows/s, 45.2 MB/s | 4/4 workers]
```
Or pass a tqdm bar for more control:
```python
from tqdm import tqdm
with tqdm(unit=" rows") as pbar:
    table.add(data, progress=pbar)
```
Or use a callback for custom progress handling:
```python
def on_progress(p):
    print(f"{p['output_rows']}/{p['total_rows']} rows, "
          f"{p['active_tasks']}/{p['total_tasks']} workers, "
          f"done={p['done']}")

table.add(data, progress=on_progress)
```
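A callback can also build its own display from the progress dict. The sketch below is only an illustration: it assumes the dict carries the keys used in the callback above (`output_rows`, `total_rows`, `active_tasks`, `total_tasks`, `done`) and that `total_rows` may be `None` when the input size is not known up front (the Rust example prints it with `{:?}`, which suggests an optional value). `format_progress` and `make_callback` are hypothetical helper names, not part of LanceDB.

```python
def format_progress(p):
    """Render one progress update as a single status line.

    Assumes `p` is a dict with the keys used in the callback example;
    `total_rows` may be None when the total is unknown (assumption).
    """
    rows = p["output_rows"]
    total = p.get("total_rows")
    if total:
        # Only show a percentage when a non-zero total is available.
        rows_part = f"{rows}/{total} rows ({100.0 * rows / total:.1f}%)"
    else:
        rows_part = f"{rows} rows"
    workers = f"{p['active_tasks']}/{p['total_tasks']} workers"
    suffix = " [done]" if p["done"] else ""
    return f"{rows_part}, {workers}{suffix}"


def make_callback(sink):
    """Build a callback that appends formatted lines to `sink` (a list)."""
    def on_progress(p):
        sink.append(format_progress(p))
    return on_progress
```

With these helpers, `table.add(data, progress=make_callback(lines))` would collect one formatted line per update in `lines`, which may be handy for logging instead of printing.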
In Rust:
```rust
table.add(data)
    .progress(|p| println!("{}/{:?} rows", p.output_rows(), p.total_rows()))
    .execute()
    .await?;
```
### Details
- `WriteProgress` struct in Rust with getters for `elapsed`,
`output_rows`, `output_bytes`, `total_rows`, `active_tasks`,
`total_tasks`, and `done`. Fields are private behind getters so new
fields can be added without breaking changes.
- `WriteProgressTracker` tracks progress across parallel write tasks
using a mutex for row/byte counts and atomics for active task counts.
- Active task tracking uses an RAII guard pattern (`ActiveTaskGuard`)
that increments on creation and decrements on drop.
- For remote writes, `output_bytes` reflects IPC wire bytes rather than
in-memory Arrow size. For local writes it uses in-memory Arrow size as a
proxy (see TODO below).
- tqdm postfix displays throughput (MB/s) and worker utilization
(active/total).
- The `done` callback always fires, even on error (via `FinishOnDrop`),
so progress bars are always finalized.
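
The guard and finish-on-drop ideas above can be sketched in Python with context managers. This is an analogue for illustration only, not the Rust implementation: `ProgressTracker`, `active_task`, `record`, and `finish_on_exit` are hypothetical names, and a single lock stands in for the mutex-plus-atomics split described above.

```python
import threading
from contextlib import contextmanager


class ProgressTracker:
    """Hypothetical Python analogue of the tracker described above."""

    def __init__(self, callback, total_tasks):
        self._lock = threading.Lock()
        self._callback = callback
        self._rows = 0
        self._active = 0
        self._total_tasks = total_tasks

    def _snapshot(self, done=False):
        # Copy the counters under the lock; the callback runs outside it.
        with self._lock:
            return {
                "output_rows": self._rows,
                "active_tasks": self._active,
                "total_tasks": self._total_tasks,
                "done": done,
            }

    @contextmanager
    def active_task(self):
        # Guard analogue: increment on entry, decrement on exit,
        # including when the body raises.
        with self._lock:
            self._active += 1
        try:
            yield
        finally:
            with self._lock:
                self._active -= 1

    def record(self, rows):
        # Add written rows, then report a fresh snapshot.
        with self._lock:
            self._rows += rows
        self._callback(self._snapshot())

    @contextmanager
    def finish_on_exit(self):
        # Finish-on-drop analogue: the final `done` update fires
        # whether the write succeeds or fails.
        try:
            yield self
        finally:
            self._callback(self._snapshot(done=True))
```

Wrapping a write in `with tracker.finish_on_exit():` and each worker body in `with tracker.active_task():` means an exception inside a worker still lowers the active count and still delivers a last update with `done` set.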
### TODO
- Track actual bytes written to disk for local tables. This requires
Lance to expose a progress callback from its write path. See
lance-format/lance#6247.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The Multimodal AI Lakehouse
The ultimate multimodal data platform for AI/ML applications.
LanceDB is designed for fast, scalable, and production-ready vector search. It is built on top of the Lance columnar format. You can store, index, and search over petabytes of multimodal data and vectors with ease. LanceDB is a central location where developers can build, train and analyze their AI workloads.
Demo: Multimodal Search by Keyword, Vector or with SQL
Key Features:
- Fast Vector Search: Search billions of vectors in milliseconds with state-of-the-art indexing.
- Comprehensive Search: Support for vector similarity search, full-text search and SQL.
- Multimodal Support: Store, query and filter vectors, metadata and multimodal data (text, images, videos, point clouds, and more).
- Advanced Features: Zero-copy access, automatic versioning that manages versions of your data without extra infrastructure, and GPU support for building vector indexes.
Products:
- Open Source & Local: 100% open source, runs locally or in your cloud. No vendor lock-in.
- Cloud and Enterprise: Production-scale vector search with no servers to manage. Complete data sovereignty and security.
Ecosystem:
- Columnar Storage: Built on the Lance columnar format for efficient storage and analytics.
- Seamless Integration: Python, Node.js, Rust, and REST APIs for easy integration. Native Python and JavaScript/TypeScript support.
- Rich Ecosystem: Integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache Arrow, Pandas, Polars, DuckDB and more on the way.
How to Install:
Follow the Quickstart doc to set up LanceDB locally.
API & SDK: We also support Python, TypeScript and Rust SDKs.
| Interface | Documentation |
|---|---|
| Python SDK | https://lancedb.github.io/lancedb/python/python/ |
| TypeScript SDK | https://lancedb.github.io/lancedb/js/globals/ |
| Rust SDK | https://docs.rs/lancedb/latest/lancedb/index.html |
| REST API | https://docs.lancedb.com/api-reference/rest |
Join Us and Contribute
We welcome contributions from everyone, whether you're a developer, a researcher, or just someone who wants to help out.
If you have any suggestions or feature requests, please feel free to open an issue on GitHub or discuss it on our Discord server.
Check out the GitHub Issues if you would like to work on the features that are planned for the future.
