mirror of
https://github.com/lancedb/lancedb.git
synced 2026-04-02 22:10:41 +00:00
## Summary
Adds progress reporting for `table.add()` so users can track large write
operations. The progress callback is available in Rust, Python (sync and
async), and through the PyO3 bindings.
### Usage
Pass `progress=True` to get an automatic tqdm bar:
```python
table.add(data, progress=True)
# 100%|██████████| 1000000/1000000 [00:12<00:00, 82345 rows/s, 45.2 MB/s | 4/4 workers]
```
Or pass a tqdm bar for more control:
```python
from tqdm import tqdm
with tqdm(unit=" rows") as pbar:
    table.add(data, progress=pbar)
```
Or use a callback for custom progress handling:
```python
def on_progress(p):
    print(f"{p['output_rows']}/{p['total_rows']} rows, "
          f"{p['active_tasks']}/{p['total_tasks']} workers, "
          f"done={p['done']}")
table.add(data, progress=on_progress)
```
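A callback can also be a callable object, which is handy when progress should be recorded rather than printed. The following is a minimal sketch, not part of this change: `ProgressLog` is a hypothetical helper, it reads only the dictionary keys used in the callback example, and it assumes `total_rows` may be `None` when the total is unknown. The snapshots are simulated so the block runs without a table.

```python
from typing import Any, Dict, List, Optional


class ProgressLog:
    """Hypothetical helper: stores one formatted status line per progress update."""

    def __init__(self) -> None:
        self.lines: List[str] = []
        self.finished = False

    def __call__(self, p: Dict[str, Any]) -> None:
        rows = p["output_rows"]
        # Assumption: total_rows can be missing or None when the total is unknown.
        total: Optional[int] = p.get("total_rows")
        if total:
            status = f"{rows}/{total} rows ({100.0 * rows / total:.1f}%)"
        else:
            status = f"{rows} rows"
        status += f", {p['active_tasks']}/{p['total_tasks']} workers"
        self.lines.append(status)
        if p["done"]:
            self.finished = True


# Simulated snapshots standing in for what table.add() would deliver.
log = ProgressLog()
log({"output_rows": 250, "total_rows": 1000,
     "active_tasks": 4, "total_tasks": 4, "done": False})
log({"output_rows": 1000, "total_rows": 1000,
     "active_tasks": 0, "total_tasks": 4, "done": True})
```

With a real table, the same object would be passed as `table.add(data, progress=log)`.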
In Rust:
```rust
table.add(data)
    .progress(|p| println!("{}/{:?} rows", p.output_rows(), p.total_rows()))
    .execute()
    .await?;
```
### Details
- `WriteProgress` struct in Rust with getters for `elapsed`,
`output_rows`, `output_bytes`, `total_rows`, `active_tasks`,
`total_tasks`, and `done`. Fields are private behind getters so new
fields can be added without breaking changes.
- `WriteProgressTracker` tracks progress across parallel write tasks
using a mutex for row/byte counts and atomics for active task counts.
- Active task tracking uses an RAII guard pattern (`ActiveTaskGuard`)
that increments on creation and decrements on drop.
- For remote writes, `output_bytes` reflects IPC wire bytes rather than
in-memory Arrow size. For local writes it uses in-memory Arrow size as a
proxy (see TODO below).
- tqdm postfix displays throughput (MB/s) and worker utilization
(active/total).
- The `done` callback always fires, even on error (via `FinishOnDrop`),
so progress bars are always finalized.
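The guard and finish-on-drop behavior described above can be illustrated with a rough Python analogy. This is a sketch under stated assumptions, not the Rust implementation: `ProgressTracker`, `active_task`, and `finish_on_exit` are hypothetical names, context managers stand in for Rust's `Drop`, and a single lock replaces the mutex-plus-atomics split.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Dict, List


class ProgressTracker:
    """Hypothetical Python analogue of the tracker; not the actual Rust code."""

    def __init__(self, total_tasks: int, callback: Callable[[Dict], None]) -> None:
        self._lock = threading.Lock()
        self._callback = callback
        self._rows = 0
        self._bytes = 0
        self._active = 0
        self._total_tasks = total_tasks

    def _snapshot(self, done: bool) -> Dict:
        # Must be called while holding the lock.
        return {
            "output_rows": self._rows,
            "output_bytes": self._bytes,
            "active_tasks": self._active,
            "total_tasks": self._total_tasks,
            "done": done,
        }

    def record(self, rows: int, nbytes: int) -> None:
        """Add written rows/bytes and report an intermediate snapshot."""
        with self._lock:
            self._rows += rows
            self._bytes += nbytes
            snap = self._snapshot(done=False)
        self._callback(snap)

    @contextmanager
    def active_task(self):
        """Guard analogue: count a task as active until the block exits."""
        with self._lock:
            self._active += 1
        try:
            yield
        finally:
            with self._lock:
                self._active -= 1

    @contextmanager
    def finish_on_exit(self):
        """Finish-on-drop analogue: always emit a final done=True snapshot."""
        try:
            yield self
        finally:
            with self._lock:
                snap = self._snapshot(done=True)
            self._callback(snap)


# Simulate a write that fails partway through.
events: List[Dict] = []
tracker = ProgressTracker(total_tasks=2, callback=events.append)
try:
    with tracker.finish_on_exit():
        with tracker.active_task():
            tracker.record(rows=100, nbytes=800)
            raise RuntimeError("simulated write failure")
except RuntimeError:
    pass
```

Even though the simulated write raises, the active count returns to zero and the final `done` snapshot is still delivered, which is the property that lets a progress bar close cleanly.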
### TODO
- Track actual bytes written to disk for local tables. This requires
Lance to expose a progress callback from its write path. See
lance-format/lance#6247.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
49 lines
1.3 KiB
TOML
[package]
name = "lancedb-python"
version = "0.30.1"
edition.workspace = true
description = "Python bindings for LanceDB"
license.workspace = true
repository.workspace = true
keywords.workspace = true
categories.workspace = true
rust-version = "1.91.0"

[lib]
name = "_lancedb"
crate-type = ["cdylib"]

[dependencies]
arrow = { version = "57.2", features = ["pyarrow"] }
async-trait = "0.1"
bytes = "1"
lancedb = { path = "../rust/lancedb", default-features = false }
lance-core.workspace = true
lance-namespace.workspace = true
lance-namespace-impls.workspace = true
lance-io.workspace = true
env_logger.workspace = true
log.workspace = true
pyo3 = { version = "0.26", features = ["extension-module", "abi3-py39"] }
pyo3-async-runtimes = { version = "0.26", features = [
    "attributes",
    "tokio-runtime",
] }
pin-project = "1.1.5"
futures.workspace = true
serde = "1"
serde_json = "1"
snafu.workspace = true
tokio = { version = "1.40", features = ["sync"] }

[build-dependencies]
pyo3-build-config = { version = "0.26", features = [
    "extension-module",
    "abi3-py39",
] }

[features]
default = ["remote", "lancedb/aws", "lancedb/gcs", "lancedb/azure", "lancedb/dynamodb", "lancedb/oss", "lancedb/huggingface"]
fp16kernels = ["lancedb/fp16kernels"]
remote = ["lancedb/remote"]