lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-05-14 10:30:40 +00:00

Author	SHA1	Message	Date
Weston Pace	a17c241e86	feat(python): make Permutation fork-safe for PyTorch DataLoader workers (#3339 ) ## Summary PyTorch's `DataLoader` uses fork-based multiprocessing by default on Linux, but threads do not survive `fork()`. LanceDB's Python bindings drive async work through two threaded layers, both of which become inert in a forked child: - `BackgroundEventLoop` runs an asyncio loop on a Python `threading.Thread`. - `pyo3-async-runtimes::tokio` holds a global multi-threaded tokio runtime whose worker threads also die on fork — and its runtime lives in a `OnceLock` that cannot be replaced after first use. As a result, any `Permutation` (or other async API) used inside a fork-based `DataLoader` worker hangs indefinitely. This PR makes both layers fork-safe so `Permutation` works as a `torch.utils.data.Dataset` with `num_workers > 0`. ## Approach ### Rust — new `python/src/runtime.rs` Mirrors the pattern used in [Lance's Python bindings](`456198cd6f/python/src/lib.rs (L139)`), adapted for the async-bridge use case. - `LanceRuntime` implements `pyo3_async_runtimes::generic::Runtime + ContextExt`, backed by an `AtomicPtr<tokio::runtime::Runtime>` we own (sidestepping `pyo3-async-runtimes`'s frozen `OnceLock` global). - A `pthread_atfork(after_in_child)` handler nulls the pointer; the next `spawn` rebuilds the runtime in the child. The previous runtime is intentionally leaked — calling `Drop` would try to join now-dead worker threads and hang. - `runtime::future_into_py` is a drop-in for `pyo3_async_runtimes::tokio::future_into_py`. All ~80 call sites in `arrow.rs` / `connection.rs` / `permutation.rs` / `query.rs` / `table.rs` are updated to route through it. - `python/Cargo.toml` adds `libc = "0.2"` and the tokio `rt-multi-thread` feature. ### Python — `lancedb/background_loop.py` - Refactors `BackgroundEventLoop.__init__` to a reusable `_start()` method. 
- An `os.register_at_fork(after_in_child=…)` hook calls `LOOP._start()` to give the singleton a fresh asyncio loop and thread in place. This matters because the rest of the codebase imports `LOOP` via `from .background_loop import LOOP` — rebinding the module attribute would leave those references holding the dead loop. ### Python — `lancedb/__init__.py` Removes the `__warn_on_fork` pre-fork warning (and the now-unused `import warnings`). Fork is supported. ## Test plan - [x] New `test_permutation_dataloader_fork_workers` in `python/tests/test_torch.py`: runs a `Permutation` through `torch.utils.data.DataLoader(num_workers=2, multiprocessing_context="fork")` inside a spawn-isolated child with a 30s hang detector. Pre-fix: timed out at 36s. Post-fix: passes in ~3.6s. - [x] New `test_remote_connection_after_fork` in `python/tests/test_remote_db.py`: forks a child that creates a fresh `lancedb.connect(...)` against a mock HTTP server and calls `table_names()`; passes in <1s, validates the runtime reset is sufficient for fresh remote clients. - [x] All 62 tests in `test_torch.py` + `test_permutation.py` pass. - [x] All 35 tests in `test_remote_db.py` pass. - [x] `test_table.py` (87) + `test_db.py` + `test_query.py` (157, minus one unrelated `sentence_transformers` import skip) — 244 passing. - [x] `cargo clippy -p lancedb-python --tests` clean. - [x] `cargo fmt`, `ruff check`, `ruff format` all clean. ## Known limitation (follow-up) This PR makes a freshly-built `lancedb.connect(...)` work in a forked child. An inherited `Connection` from the parent still carries an inherited `reqwest::Client` whose hyper connection pool references socket FDs and TCP/TLS state shared with the parent — using it from the child after fork is unsafe (especially with HTTP/1.1 keep-alive). The recommended pattern for fork-based `DataLoader` workers that hit a remote DB is to construct a new connection inside the worker. 
Auto-clearing inherited HTTP client pools on fork would require tracking live `Connection` instances in `lancedb` core and is left for a follow-up PR. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:44:10 -07:00
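The Python half of the fork-safety change above (restarting the singleton loop in place from an `os.register_at_fork` hook) can be sketched roughly as follows. This is a minimal standalone illustration, not the code from `lancedb/background_loop.py`: the class body, the `run` helper, and the `answer` coroutine are assumptions made for the example. Only `_start()`, `LOOP`, and the `after_in_child` hook come from the commit message.

```python
import asyncio
import os
import threading


class BackgroundEventLoop:
    """Illustrative stand-in: runs an asyncio loop on a daemon thread."""

    def __init__(self):
        self._start()

    def _start(self):
        # Build a fresh loop and thread. Because this mutates the existing
        # object, it can be called again in a forked child, where the
        # parent's thread no longer exists.
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()

    def run(self, coro):
        # Hypothetical helper: submit a coroutine to the background loop
        # and block until it finishes.
        return asyncio.run_coroutine_threadsafe(coro, self.loop).result()


LOOP = BackgroundEventLoop()

# Restart the singleton in place after fork. Rebinding the module attribute
# instead would leave `from module import LOOP` references on the dead loop.
if hasattr(os, "register_at_fork"):  # not available on Windows
    os.register_at_fork(after_in_child=LOOP._start)


async def answer():
    await asyncio.sleep(0)
    return 42
```

Calling `LOOP.run(answer())` works both before and after `LOOP._start()` is invoked again, which is what the fork hook relies on. The old loop and thread are simply abandoned in this sketch.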
Lance Release	a2aea7b4e5	Bump version: 0.31.0-beta.10 → 0.31.0-beta.11	2026-04-29 17:53:22 +00:00
LanceDB Robot	4a5341edb1	chore: update lance dependency to v6.0.0-beta.7 (#3334 ) ## Summary - Update Lance Rust dependencies to `6.0.0-beta.7` using `ci/set_lance_version.py`. - Update Java `lance-core.version` to `6.0.0-beta.7`. - Align Arrow/DataFusion/PyO3 dependency versions and apply required compatibility fixes for the Lance upgrade. Triggering tag: [v6.0.0-beta.7](https://github.com/lance-format/lance/releases/tag/v6.0.0-beta.7) ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all`	2026-04-29 10:52:25 -07:00
Lance Release	2e36cd9dad	Bump version: 0.31.0-beta.9 → 0.31.0-beta.10	2026-04-28 13:29:00 +00:00
Will Jones	d135c18db6	ci: add cargo-deny configuration and CI check (#3307 ) Adds a `deny.toml` at the workspace root and a `deny` CI job that runs `cargo deny check` on every PR. Catches yanked crates, license drift, banned or wildcard dependencies, unapproved sources, and new RUSTSEC advisories. As part of wiring this up: - Updated `aws-lc-rs` 1.13.0 → 1.16.3 / `aws-lc-sys` 0.28.0 → 0.40.0 to clear four 2026 AWS-LC advisories (timing side-channel, PKCS7 bypass, CRL scope). Removed the `=0.28.0` workaround pin; the original build failure no longer reproduces. - Updated `bytes`, `zlib-rs`, `rand`, `rustls-webpki`, `lz4_flex` to clear their current advisories. - Marked `lancedb-nodejs` and `lancedb-python` as `publish = false` and pinned `lzma-sys` from `*` to `0.1` so `bans.wildcards = "deny"` can be enforced. 10 remaining advisories have no safe upgrade available (transitive via opendal, lance, datafusion, async-openai, aws-sdk on the legacy rustls 0.21 chain). Each is ignored in `deny.toml` with a per-entry rationale and a link to the RUSTSEC advisory. New advisories still fail CI. Fixes #3297 --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 20:53:15 -07:00
Lance Release	2a886141f7	Bump version: 0.31.0-beta.8 → 0.31.0-beta.9	2026-04-19 20:39:04 +00:00
Lance Release	9ad2dfe601	Bump version: 0.31.0-beta.7 → 0.31.0-beta.8	2026-04-19 04:18:45 +00:00
Lance Release	5ce3d8d141	Bump version: 0.31.0-beta.6 → 0.31.0-beta.7	2026-04-17 08:12:03 +00:00
Lance Release	2ed5452e1c	Bump version: 0.31.0-beta.5 → 0.31.0-beta.6	2026-04-16 18:57:05 +00:00
Lance Release	13d2759356	Bump version: 0.31.0-beta.4 → 0.31.0-beta.5	2026-04-12 23:50:50 +00:00
Lance Release	231f0655ce	Bump version: 0.31.0-beta.3 → 0.31.0-beta.4	2026-04-12 03:57:35 +00:00
Lance Release	1f1726369d	Bump version: 0.31.0-beta.2 → 0.31.0-beta.3	2026-04-11 22:44:25 +00:00
Lance Release	11bc674548	Bump version: 0.31.0-beta.1 → 0.31.0-beta.2	2026-04-11 07:05:36 +00:00
Lance Release	0ac59de5f1	Bump version: 0.31.0-beta.0 → 0.31.0-beta.1	2026-04-05 02:50:52 +00:00
Lance Release	590c0c1e77	Bump version: 0.30.2 → 0.31.0-beta.0	2026-04-03 08:45:29 +00:00
Lance Release	5d550124bd	Bump version: 0.30.2-beta.2 → 0.30.2	2026-03-31 21:25:04 +00:00
Lance Release	c57cb310a2	Bump version: 0.30.2-beta.1 → 0.30.2-beta.2	2026-03-31 21:25:02 +00:00
Lance Release	76429730c0	Bump version: 0.30.2-beta.0 → 0.30.2-beta.1	2026-03-25 16:21:26 +00:00
Lance Release	f4d613565e	Bump version: 0.30.1 → 0.30.2-beta.0	2026-03-25 03:22:55 +00:00
Will Jones	1d6e00b902	feat: progress bar for `add()` (#3067) ## Summary Adds progress reporting for `table.add()` so users can track large write operations. The progress callback is available in Rust, Python (sync and async), and through the PyO3 bindings. ### Usage Pass `progress=True` to get an automatic tqdm bar: ```python table.add(data, progress=True) # 100%|██████████| 1000000/1000000 [00:12<00:00, 82345 rows/s, 45.2 MB/s | 4/4 workers] ``` Or pass a tqdm bar for more control: ```python from tqdm import tqdm with tqdm(unit=" rows") as pbar: table.add(data, progress=pbar) ``` Or use a callback for custom progress handling: ```python def on_progress(p): print(f"{p['output_rows']}/{p['total_rows']} rows, " f"{p['active_tasks']}/{p['total_tasks']} workers, " f"done={p['done']}") table.add(data, progress=on_progress) ``` In Rust: ```rust table.add(data) .progress(|p| println!("{}/{:?} rows", p.output_rows(), p.total_rows())) .execute() .await?; ``` ### Details - `WriteProgress` struct in Rust with getters for `elapsed`, `output_rows`, `output_bytes`, `total_rows`, `active_tasks`, `total_tasks`, and `done`. Fields are private behind getters so new fields can be added without breaking changes. - `WriteProgressTracker` tracks progress across parallel write tasks using a mutex for row/byte counts and atomics for active task counts. - Active task tracking uses an RAII guard pattern (`ActiveTaskGuard`) that increments on creation and decrements on drop. - For remote writes, `output_bytes` reflects IPC wire bytes rather than in-memory Arrow size. For local writes it uses in-memory Arrow size as a proxy (see TODO below). - tqdm postfix displays throughput (MB/s) and worker utilization (active/total). - The `done` callback always fires, even on error (via `FinishOnDrop`), so progress bars are always finalized. ### TODO - Track actual bytes written to disk for local tables. This requires Lance to expose a progress callback from its write path. 
See lance-format/lance#6247. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 16:14:13 -07:00
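The guard and always-fire-`done` behaviour described in the Details list above can be approximated in Python with context managers. The sketch below is a hypothetical analogue written for illustration only. It is not the Rust `WriteProgressTracker`, and `write_batches`, `record_rows`, `active_task`, and `finish_on_exit` are invented names. The dictionary keys mirror the ones shown in the callback example from the commit message.

```python
import threading
from contextlib import contextmanager


class WriteProgressTracker:
    """Hypothetical Python analogue; not LanceDB's actual implementation."""

    def __init__(self, total_tasks, callback):
        self._lock = threading.Lock()
        self._callback = callback
        self._total_tasks = total_tasks
        self._active_tasks = 0
        self._output_rows = 0

    def _emit(self, done):
        # Take a snapshot under the lock, then invoke the callback outside it.
        with self._lock:
            snapshot = {
                "output_rows": self._output_rows,
                "active_tasks": self._active_tasks,
                "total_tasks": self._total_tasks,
                "done": done,
            }
        self._callback(snapshot)

    @contextmanager
    def active_task(self):
        # Guard: increment on entry, always decrement on exit.
        with self._lock:
            self._active_tasks += 1
        try:
            yield
        finally:
            with self._lock:
                self._active_tasks -= 1

    def record_rows(self, n):
        with self._lock:
            self._output_rows += n
        self._emit(done=False)

    @contextmanager
    def finish_on_exit(self):
        # The final `done` event fires even if the body raises.
        try:
            yield self
        finally:
            self._emit(done=True)


def write_batches(batches, callback):
    # Illustrative single-task writer; a `None` batch simulates a failure.
    tracker = WriteProgressTracker(total_tasks=1, callback=callback)
    with tracker.finish_on_exit():
        for batch in batches:
            with tracker.active_task():
                if batch is None:
                    raise ValueError("bad batch")
                tracker.record_rows(len(batch))
```

If a batch fails midway, the exception still propagates to the caller, but the callback receives one last event with `done` set and `active_tasks` back at zero, so a progress bar driven by it can be closed cleanly.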
Lance Release	f5b21c0aa4	Bump version: 0.30.1-beta.0 → 0.30.1	2026-03-20 00:35:03 +00:00
Lance Release	e927924d26	Bump version: 0.30.0 → 0.30.1-beta.0	2026-03-20 00:35:02 +00:00
Lance Release	c89240b16c	Bump version: 0.30.0-beta.6 → 0.30.0	2026-03-16 22:46:19 +00:00
Lance Release	099ff355a4	Bump version: 0.30.0-beta.5 → 0.30.0-beta.6	2026-03-16 22:46:17 +00:00
Lance Release	6de8f42dcd	Bump version: 0.30.0-beta.4 → 0.30.0-beta.5	2026-03-09 19:56:15 +00:00
Lance Release	f31561c5bb	Bump version: 0.30.0-beta.3 → 0.30.0-beta.4	2026-03-09 08:45:25 +00:00
Jack Ye	e0c5ceac03	fix: propagate managed versioning for namespace connection (#3111) Without this fix, if a user works with the native table directly for operations like `add_columns`, managed versioning is not actually propagated, even when the table is configured to use a namespace db connection. The fix brings lancedb's Python binding up to date with an implementation similar to https://github.com/lance-format/lance/pull/5968 and makes sure the namespace is fully propagated through all the related calls. --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-09 01:44:31 -07:00
Lance Release	aeb1c3ee6a	Bump version: 0.30.0-beta.2 → 0.30.0-beta.3	2026-02-28 01:29:53 +00:00
Lance Release	e253f5d9b6	Bump version: 0.30.0-beta.1 → 0.30.0-beta.2	2026-02-25 07:46:06 +00:00
Lance Release	1ea22ee5ef	Bump version: 0.30.0-beta.0 → 0.30.0-beta.1	2026-02-23 18:33:28 +00:00
LanceDB Robot	8cef8806e9	chore: update lance dependency to v3.0.0-beta.5 (#3058 ) ## Summary - Bump Lance Rust dependencies and Java `lance-core` to v3.0.0-beta.5 (refs/tags/v3.0.0-beta.5). - Update workspace toolchain and dependency defaults needed for the new Lance release. - Resolve new clippy lint defaults introduced by the toolchain update. ## Validation - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all` --------- Co-authored-by: Jack Ye <yezhaoqin@gmail.com>	2026-02-23 00:39:30 -08:00
Lance Release	d9e2d51f51	Bump version: 0.29.2 → 0.30.0-beta.0	2026-02-17 00:27:45 +00:00
Lance Release	027d53500b	Bump version: 0.29.2-beta.0 → 0.29.2	2026-02-09 06:05:42 +00:00
Lance Release	9098f47e73	Bump version: 0.29.1 → 0.29.2-beta.0	2026-02-09 06:05:40 +00:00
Lance Release	5cdb15feef	Bump version: 0.29.1-beta.0 → 0.29.1	2026-02-07 00:32:44 +00:00
Lance Release	7a3eea927f	Bump version: 0.29.0 → 0.29.1-beta.0	2026-02-07 00:32:42 +00:00
Lance Release	071f467571	Bump version: 0.29.0-beta.0 → 0.29.0	2026-02-06 18:07:49 +00:00
Lance Release	f83aa25119	Bump version: 0.28.0-beta.0 → 0.29.0-beta.0	2026-02-06 18:07:48 +00:00
Jack Ye	0a8fe4d026	ci: fix python version for latest release (#2989 ) It was accidentally corrupted in https://github.com/lancedb/lancedb/pull/2972	2026-02-06 10:07:03 -08:00
Jack Ye	bd2c6d0763	chore: update lance dependency to v2.0.0-rc.4 (#2972 )	2026-02-03 14:38:39 -08:00
Lance Release	972c682857	Bump version: 0.27.1 → 0.28.0-beta.0	2026-02-03 04:47:20 +00:00
Lei Xu	357197bacc	chore!: change support python version from 3.10 to 3.13 (#2955) Python 3.9 has been EOL since Oct 2025, and the last two pyarrow builds were against Python 3.10-3.13. * This PR is contributed by codex-gpt5.2	2026-01-30 01:47:50 +08:00
Lance Release	cc5f8070d7	Bump version: 0.27.1-beta.0 → 0.27.1	2026-01-26 23:38:24 +00:00
Lance Release	dc0fb01f6b	Bump version: 0.27.0 → 0.27.1-beta.0	2026-01-26 23:38:23 +00:00
Jack Ye	e4552e577a	chore(revert): revert update lance dependency to v2.0.0-rc.1 (#2936 ) (#2941 ) This reverts commit `bd84bba14d`, so that we can bump version to 1.0.4-rc.1	2026-01-26 11:13:59 -08:00
Will Jones	f979a902ad	ci(rust): fix MSRV check (#2940 ) Realized our MSRV check was inert because `rust-toolchain.toml` was overriding the Rust version. We set the `RUSTUP_TOOLCHAIN` environment variable, which overrides that. Also needed to update to MSRV 1.88 (due to dependencies like Lance and DataFusion) and fix some clippy warnings.	2026-01-23 15:57:09 -08:00
LanceDB Robot	bd84bba14d	chore: update lance dependency to v2.0.0-rc.1 (#2936 ) ## Summary - bump Lance dependencies to v2.0.0-rc.1 (git tag) - align Arrow/DataFusion/PyO3 versions for the new Lance release - update Python bindings for PyO3 0.26 (attach API + Py<PyAny>) ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all` ## Reference - https://github.com/lance-format/lance/releases/tag/v2.0.0-rc.1 --------- Co-authored-by: Jack Ye <yezhaoqin@gmail.com> Co-authored-by: Will Jones <willjones127@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: BubbleCal <bubble_cal@outlook.com>	2026-01-22 13:14:38 -08:00
Lance Release	042bc22468	Bump version: 0.27.0-beta.1 → 0.27.0	2026-01-22 01:09:32 +00:00
Lance Release	68569906c6	Bump version: 0.27.0-beta.0 → 0.27.0-beta.1	2026-01-22 01:09:31 +00:00
Jack Ye	4e65748abf	chore: update lance dependency to v1.0.3-rc.1 (#2927) Supersedes https://github.com/lancedb/lancedb/pull/2925 We accidentally upgraded lance to 2.0.0-beta.8. This PR reverts that first and then bumps to 1.0.3-rc.1 --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-21 11:52:07 -08:00

234 Commits