lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-07-03 19:10:41 +00:00

Author	SHA1	Message	Date
Weston Pace	a17c241e86	feat(python): make Permutation fork-safe for PyTorch DataLoader workers (#3339 ) ## Summary PyTorch's `DataLoader` uses fork-based multiprocessing by default on Linux, but threads do not survive `fork()`. LanceDB's Python bindings drive async work through two threaded layers, both of which become inert in a forked child: - `BackgroundEventLoop` runs an asyncio loop on a Python `threading.Thread`. - `pyo3-async-runtimes::tokio` holds a global multi-threaded tokio runtime whose worker threads also die on fork — and its runtime lives in a `OnceLock` that cannot be replaced after first use. As a result, any `Permutation` (or other async API) used inside a fork-based `DataLoader` worker hangs indefinitely. This PR makes both layers fork-safe so `Permutation` works as a `torch.utils.data.Dataset` with `num_workers > 0`. ## Approach ### Rust — new `python/src/runtime.rs` Mirrors the pattern used in [Lance's Python bindings](`456198cd6f/python/src/lib.rs (L139)`), adapted for the async-bridge use case. - `LanceRuntime` implements `pyo3_async_runtimes::generic::Runtime + ContextExt`, backed by an `AtomicPtr<tokio::runtime::Runtime>` we own (sidestepping `pyo3-async-runtimes`'s frozen `OnceLock` global). - A `pthread_atfork(after_in_child)` handler nulls the pointer; the next `spawn` rebuilds the runtime in the child. The previous runtime is intentionally leaked — calling `Drop` would try to join now-dead worker threads and hang. - `runtime::future_into_py` is a drop-in for `pyo3_async_runtimes::tokio::future_into_py`. All ~80 call sites in `arrow.rs` / `connection.rs` / `permutation.rs` / `query.rs` / `table.rs` are updated to route through it. - `python/Cargo.toml` adds `libc = "0.2"` and the tokio `rt-multi-thread` feature. ### Python — `lancedb/background_loop.py` - Refactors `BackgroundEventLoop.__init__` to a reusable `_start()` method. - An `os.register_at_fork(after_in_child=…)` hook calls `LOOP._start()` to give the singleton a fresh asyncio loop and thread in place. This matters because the rest of the codebase imports `LOOP` via `from .background_loop import LOOP` — rebinding the module attribute would leave those references holding the dead loop. ### Python — `lancedb/__init__.py` Removes the `__warn_on_fork` pre-fork warning (and the now-unused `import warnings`). Fork is supported. ## Test plan - [x] New `test_permutation_dataloader_fork_workers` in `python/tests/test_torch.py`: runs a `Permutation` through `torch.utils.data.DataLoader(num_workers=2, multiprocessing_context="fork")` inside a spawn-isolated child with a 30s hang detector. Pre-fix: timed out at 36s. Post-fix: passes in ~3.6s. - [x] New `test_remote_connection_after_fork` in `python/tests/test_remote_db.py`: forks a child that creates a fresh `lancedb.connect(...)` against a mock HTTP server and calls `table_names()`; passes in <1s, validates the runtime reset is sufficient for fresh remote clients. - [x] All 62 tests in `test_torch.py` + `test_permutation.py` pass. - [x] All 35 tests in `test_remote_db.py` pass. - [x] `test_table.py` (87) + `test_db.py` + `test_query.py` (157, minus one unrelated `sentence_transformers` import skip) — 244 passing. - [x] `cargo clippy -p lancedb-python --tests` clean. - [x] `cargo fmt`, `ruff check`, `ruff format` all clean. ## Known limitation (follow-up) This PR makes a freshly-built `lancedb.connect(...)` work in a forked child. An inherited `Connection` from the parent still carries an inherited `reqwest::Client` whose hyper connection pool references socket FDs and TCP/TLS state shared with the parent — using it from the child after fork is unsafe (especially with HTTP/1.1 keep-alive). The recommended pattern for fork-based `DataLoader` workers that hit a remote DB is to construct a new connection inside the worker. Auto-clearing inherited HTTP client pools on fork would require tracking live `Connection` instances in `lancedb` core and is left for a follow-up PR. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:44:10 -07:00
Weston Pace	1fc23e5473	fix(python): make Permutation picklable for PyTorch multiprocessing (#3335 ) ## Summary When pytorch is used with multiprocessing and the mp mode is spawn then the Permutation needs to be pickled. It could not be pickled because `Table` and `Connection` are not serializable. This PR adds pickle support to Permutation without adding general pickle support to `Table` or `Connection`. To add general support we probably need to start by adding serialization in the namespace client. In the meantime this PR enable pickling by adding special cases for: * In-memory tables (just serialize as Arrow IPC) * Native tables (serialize the URI) If a user is not using one of the above cases (e.g. using a remote connection) then they will need to provide a connection factory that can be pickled. ## Breaking change `PermutationBuilder.persist(...)` is removed from the Python bindings; the permutation table is now always in-memory. The underlying Rust `PermutationBuilder::persist` API is untouched and can be re-exposed later if needed. It probably won't make sense to do that until we have a way to serialize `Table` and `Connection`. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 21:37:58 -07:00
Nitesh Yadav	59db036118	fix(python): add missing space in hybrid query error message (#3340 ) Hi, the hybrid query error message looks like it can use a space, just added it. ```python def _validate_query(self, query, vector=None, text=None): if query is not None and (vector is not None or text is not None): raise ValueError( "You can either provide a string query in search() method" "or set `vector()` and `text()` explicitly for hybrid search." "But not both." ) ```	2026-05-02 15:51:00 -07:00
Lance Release	a2aea7b4e5	Bump version: 0.31.0-beta.10 → 0.31.0-beta.11	2026-04-29 17:53:22 +00:00
LanceDB Robot	4a5341edb1	chore: update lance dependency to v6.0.0-beta.7 (#3334 ) ## Summary - Update Lance Rust dependencies to `6.0.0-beta.7` using `ci/set_lance_version.py`. - Update Java `lance-core.version` to `6.0.0-beta.7`. - Align Arrow/DataFusion/PyO3 dependency versions and apply required compatibility fixes for the Lance upgrade. Triggering tag: [v6.0.0-beta.7](https://github.com/lance-format/lance/releases/tag/v6.0.0-beta.7) ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all`	2026-04-29 10:52:25 -07:00
Jack Ye	25dfe2cfd4	feat: add manifest-enabled directory namespace mode (#3332 ) Adds manifest_enabled for local/native connections so directory namespace manifests can be the source of truth, including migration from directory listing and Azure credential vending feature wiring. Also exposes the option through Rust, Python, and Node bindings with focused validation.	2026-04-29 09:22:06 -07:00
Lance Release	2e36cd9dad	Bump version: 0.31.0-beta.9 → 0.31.0-beta.10	2026-04-28 13:29:00 +00:00
Will Jones	d135c18db6	ci: add cargo-deny configuration and CI check (#3307 ) Adds a `deny.toml` at the workspace root and a `deny` CI job that runs `cargo deny check` on every PR. Catches yanked crates, license drift, banned or wildcard dependencies, unapproved sources, and new RUSTSEC advisories. As part of wiring this up: - Updated `aws-lc-rs` 1.13.0 → 1.16.3 / `aws-lc-sys` 0.28.0 → 0.40.0 to clear four 2026 AWS-LC advisories (timing side-channel, PKCS7 bypass, CRL scope). Removed the `=0.28.0` workaround pin; the original build failure no longer reproduces. - Updated `bytes`, `zlib-rs`, `rand`, `rustls-webpki`, `lz4_flex` to clear their current advisories. - Marked `lancedb-nodejs` and `lancedb-python` as `publish = false` and pinned `lzma-sys` from `*` to `0.1` so `bans.wildcards = "deny"` can be enforced. 10 remaining advisories have no safe upgrade available (transitive via opendal, lance, datafusion, async-openai, aws-sdk on the legacy rustls 0.21 chain). Each is ignored in `deny.toml` with a per-entry rationale and a link to the RUSTSEC advisory. New advisories still fail CI. Fixes #3297 --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 20:53:15 -07:00
Xuanwo	c54888a83a	refactor(python): remove legacy tantivy FTS support (#3282 ) This follows the Rust-side Tantivy removal by deleting the remaining Python Tantivy runtime, tests, and packaging references. It also turns the legacy Python-only Tantivy parameters into explicit errors and stops reading legacy `_indices/fts` directories so Python FTS is fully native-only.	2026-04-20 09:28:45 +08:00
Lance Release	2a886141f7	Bump version: 0.31.0-beta.8 → 0.31.0-beta.9	2026-04-19 20:39:04 +00:00
Lance Release	9ad2dfe601	Bump version: 0.31.0-beta.7 → 0.31.0-beta.8	2026-04-19 04:18:45 +00:00
Jack Ye	f909df3e87	fix(python): use namespace-backed rust connection for namespace tables (#3286 ) So far, I have been using a hacky approach that creates and opens namespace-backed table, by getting its location and use a temporary lancedb connection to create or open it. This was working for features like credentials vending but is no longer fully working for the managed versioning feature, recently geneva tests have been failing here and there and various patches are not addressing the root cause. This PR fully fixes this and implements proper rust binding for it. Specifically: - build a real Rust namespace-backed connection from the Python namespace client - route namespace table create/open through that connection instead of resolved-location temp connections - keep namespace client naming consistent in the Rust bridge and preserve federated namespace + DuckDB behavior	2026-04-18 21:17:52 -07:00
Lance Release	5ce3d8d141	Bump version: 0.31.0-beta.6 → 0.31.0-beta.7	2026-04-17 08:12:03 +00:00
Jack Ye	5eaac178b1	fix(python): pass namespace client on schema-only table create (#3283 ) ## Summary - pass `namespace_client` through the Python create-table path - ensure schema-only namespace table creation uses the namespace-aware empty-table flow - fix reopening namespace tables created without initial data	2026-04-17 01:11:18 -07:00
Lance Release	2ed5452e1c	Bump version: 0.31.0-beta.5 → 0.31.0-beta.6	2026-04-16 18:57:05 +00:00
Jack Ye	97a4b38f19	feat(rust): support nested namespace ops in listing db (#3279 ) ## Summary - delegate child-namespace `ListingDatabase` operations through an eagerly initialized `LanceNamespaceDatabase` - support nested namespace create/open/list/drop flows without requiring callers to inject explicit locations - add `namespace_client_properties` plumbing for local and namespace connections so directory namespace settings like `table_version_tracking_enabled` can be configured - add regression tests for nested namespace ops and namespace client property propagation	2026-04-16 10:12:28 -07:00
Gezi-lzq	10879d99b8	docs: fix broken documentation links (#3278 )	2026-04-15 20:56:59 +08:00
Lance Release	13d2759356	Bump version: 0.31.0-beta.4 → 0.31.0-beta.5	2026-04-12 23:50:50 +00:00
Jack Ye	7f52ec8c36	feat(python): support child namepsace operations and json serialization for LanceDBConnection (#3265 ) ## Summary Add connection serialization and child namespace support to `LanceDBConnection`. - `DBConnection.serialize()` / `lancedb.deserialize()` for connection reconstruction in remote workers - Cache `namespace_client()` in `LanceDBConnection` to avoid repeated DirectoryNamespace builds - `LanceDBConnection` transparently delegates child namespace operations (open_table, create_table, list_tables, drop_table, create_namespace, etc.) to `LanceNamespaceDBConnection` via `_namespace_conn()` - Root namespace operations still go through the original Rust path - Generic worker property override mechanism: any `namespace_client_properties` key prefixed with `_lancedb_worker_` has the prefix stripped and overrides the corresponding property when `deserialize(data, for_worker=True)` - `LanceNamespaceDBConnection` stores `namespace_client_impl`/`namespace_client_properties` for serialization roundtrip --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:49:45 -07:00
Lance Release	231f0655ce	Bump version: 0.31.0-beta.3 → 0.31.0-beta.4	2026-04-12 03:57:35 +00:00
Lance Release	1f1726369d	Bump version: 0.31.0-beta.2 → 0.31.0-beta.3	2026-04-11 22:44:25 +00:00
Lance Release	11bc674548	Bump version: 0.31.0-beta.1 → 0.31.0-beta.2	2026-04-11 07:05:36 +00:00
Dhruv Garg	4761fa9bcb	fix(python): migrate gemini-text provider to google-genai sdk (#3250 ) ## Summary - migrate gemini-text embedding provider from deprecated google.generativeai to google.genai - update Python embedding extra dependency to google-genai - update default model name to gemini-embedding-001 - adapt embed calls to Client().models.embed_content(...) - apply lint fixes from CI ## Related - Closes #3191	2026-04-09 15:28:34 -07:00
lennylxx	4c2939d66e	fix(python): guard against None before .decode() on split_names metadata key (#3229 ) `.get(b"split_names", None).decode()` was called unconditionally in both Permutations.__init__ and Permutation.from_tables(), crashing with AttributeError when schema metadata existed but lacked the split_names key. Guard the decode behind a None check and add regression tests.	2026-04-08 16:04:13 -07:00
yaommen	a813ce2f71	fix(python): sanitize bad vectors before Arrow cast (#3158 ) ## Problem `on_bad_vectors="drop"` is supposed to remove invalid vector rows before write, but for some schema-defined vector columns it can still fail later during Arrow cast instead of dropping the bad row. Repro: ```python class MySchema(LanceModel): text: str embedding: Vector(16) table = db.create_table("test", schema=MySchema) table.add( [ {"text": "hello", "embedding": []}, {"text": "bar", "embedding": [0.1] * 16}, ], on_bad_vectors="drop", ) ``` Before: ``` RuntimeError Arrow error: C Data interface error: Invalid: ListType can only be casted to FixedSizeListType if the lists are all the expected size. ``` After: ``` rows 1 texts ['bar'] ``` ## Solution Make bad-vector sanitization use schema dimensions before cast, while keeping the handling scoped to vector columns identified by schema metadata or existing vector-name heuristics. This also preserves existing integer vector inputs and avoids applying on_bad_vectors to unrelated fixed-size float columns. Fixes #1670 Signed-off-by: yaommen <myanstu@163.com>	2026-04-08 09:09:41 -07:00
Jack Ye	a898dc81c2	feat: add user_id field to ClientConfig for user identification (#3240 ) ## Summary - Add a `user_id` field to `ClientConfig` that allows users to identify themselves to LanceDB Cloud/Enterprise - The user_id is sent as the `x-lancedb-user-id` HTTP header in all requests - Supports three configuration methods: - Direct assignment via `ClientConfig.user_id` - Environment variable `LANCEDB_USER_ID` - Indirect env var lookup via `LANCEDB_USER_ID_ENV_KEY` Closes #3230 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-04-06 11:20:10 -07:00
Lance Release	0ac59de5f1	Bump version: 0.31.0-beta.0 → 0.31.0-beta.1	2026-04-05 02:50:52 +00:00
LanceDB Robot	d082c2d2ac	chore: update lance dependency to v5.0.0-beta.5 (#3237 ) ## Summary - update Rust Lance workspace dependencies to `v5.0.0-beta.5` using `ci/set_lance_version.py` - update Java `lance-core` dependency property to `5.0.0-beta.5` - refresh Cargo lockfile to the new Lance tag ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all` ## Upstream Tag - https://github.com/lance-format/lance/releases/tag/v5.0.0-beta.5 --------- Co-authored-by: Jack Ye <yezhaoqin@gmail.com>	2026-04-04 19:49:51 -07:00
Zelys	9d8699f99e	feat(python): support Enum types in Pydantic to Arrow schema conversion (#3232 ) ## Summary Fixes #1846. Python `Enum` fields raised `TypeError: Converting Pydantic type to Arrow Type: unsupported type <enum 'SomethingTypes'>` when converting a Pydantic model to an Arrow schema. The fix adds Enum detection in `_pydantic_type_to_arrow_type`. When an Enum subclass is encountered, the value type of its members is inspected and mapped to the appropriate Arrow type: - `str`-valued enums (e.g. `class Status(str, Enum)`) → `pa.utf8()` - `int`-valued enums (e.g. `class Priority(int, Enum)`) → `pa.int64()` - Other homogeneous value types → the Arrow type for that Python type - Mixed-value or empty enums → `pa.utf8()` (safe fallback) This covers the common `(str, Enum)` and `(int, Enum)` mixin patterns used in practice. ## Changes - `python/python/lancedb/pydantic.py`: add Enum branch in `_pydantic_type_to_arrow_type` - `python/python/tests/test_pydantic.py`: add `test_enum_types` covering `str`, `int`, and `Optional` Enum fields ## Note on #2395 PR #2395 handles `StrEnum` (Python 3.11+) specifically, using a dictionary-encoded type. This PR handles the broader `(str, Enum)` / `(int, Enum)` mixin pattern that works across all Python versions and stores values as their natural Arrow type. AI assistance was used in developing this fix.	2026-04-03 10:40:49 -07:00
Lance Release	590c0c1e77	Bump version: 0.30.2 → 0.31.0-beta.0	2026-04-03 08:45:29 +00:00
Jack Ye	e26b22bcca	refactor!: consolidate namespace related naming and enterprise integration (#3205 ) 1. Refactored every client (Rust core, Python, Node/TypeScript) so “namespace” usage is explicit: code now keeps namespace paths (namespace_path) separate from namespace clients (namespace_client). Connections propagate the client, table creation routes through it, and managed versioning defaults are resolved from namespace metadata. Python gained LanceNamespaceDBConnection/async counterparts, and the namespace-focused tests were rewritten to match the clarified API surface. 2. Synchronized the workspace with Lance 5.0.0-beta.3 (see https://github.com/lance-format/lance/pull/6186 for the upstream namespace refactor), updating Cargo/uv lockfiles and ensuring all bindings align with the new namespace semantics. 3. Added a namespace-backed code path to lancedb.connect() via new keyword arguments (namespace_client_impl, namespace_client_properties, plus the existing pushdown-ops flag). When those kwargs are supplied, connect() delegates to connect_namespace, so users can opt into namespace clients without changing APIs. (The async helper will gain parity in a later change)	2026-04-03 00:09:03 -07:00
Lance Release	5d550124bd	Bump version: 0.30.2-beta.2 → 0.30.2	2026-03-31 21:25:04 +00:00
Lance Release	c57cb310a2	Bump version: 0.30.2-beta.1 → 0.30.2-beta.2	2026-03-31 21:25:02 +00:00
Dan Tasse	97754f5123	fix: change _client reference to _conn (#3188 ) This code previously referenced `self._client`, which does not exist. This change makes it correctly call `self._conn.close()`	2026-03-31 13:29:17 -07:00
Pratik Dey	7b1c063848	feat(python): add type-safe expression builder API (#3150 ) Introduces col(), lit(), func(), and Expr class as alternatives to raw SQL strings in .where() and .select(). Expressions are backed by DataFusion's Expr AST and serialized to SQL for remote table compat. Resolves: - https://github.com/lancedb/lancedb/issues/3044 (python api's) - https://github.com/lancedb/lancedb/issues/3043 (support for filter) - https://github.com/lancedb/lancedb/issues/3045 (support for projection) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-31 11:32:49 -07:00
Will Jones	e3d53dd185	fix(python): skip test_url_retrieve_downloads_image when PIL not installed (#3208 ) The test added in #3190 unconditionally imports `PIL`, which is an optional dependency. This causes CI failures in environments where Pillow isn't installed (`ModuleNotFoundError: No module named 'PIL'`). Use `pytest.importorskip` to skip gracefully when Pillow is unavailable. Fixes CI failure on main. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 14:48:49 -07:00
Will Jones	66804e99fc	fix(python): use correct exception types in namespace tests (#3206 ) ## Summary - Namespace tests expected `RuntimeError` for table-not-found and namespace-not-empty cases, but `lance_namespace` raises `TableNotFoundError` and `NamespaceNotEmptyError` which inherit from `Exception`, not `RuntimeError`. - Updated `pytest.raises` to use the correct exception types. ## Test plan - [x] CI passes on `test_namespace.py` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 12:55:54 -07:00
lennylxx	9f85d4c639	fix(embeddings): add missing urllib.request import in url_retrieve (#3190 ) url_retrieve() calls urllib.request.urlopen() but only urllib.error was imported, causing AttributeError for any HTTP URL input. This affects open-clip, siglip, and jinaai embedding functions when processing image URLs. The bug has existed since the embeddings API refactor (#580) but was masked because most users pass local file paths or bytes rather than HTTP URLs.	2026-03-30 12:03:44 -07:00
lif	4c44587af0	fix: table.add(mode='overwrite') infers vector column types (#3184 ) Fixes #3183 ## Summary When `table.add(mode='overwrite')` is called, PyArrow infers input data types (e.g. `list<double>`) which differ from the original table schema (e.g. `fixed_size_list<float32>`). Previously, overwrite mode bypassed `cast_to_table_schema()` entirely, so the inferred types replaced the original schema, breaking vector search. This fix builds a merged target schema for overwrite: columns present in the existing table schema keep their original types, while columns unique to the input pass through as-is. This way `cast_to_table_schema()` is applied unconditionally, preserving vector column types without blocking schema evolution. ## Changes - `rust/lancedb/src/table/add_data.rs`: For overwrite mode, construct a target schema by matching input columns against the existing table schema, then cast. Non-overwrite (append) path is unchanged. - Added `test_add_overwrite_preserves_vector_type` test that creates a table with `fixed_size_list<float32>`, overwrites with `list<double>` input, and asserts the original type is preserved. ## Test Plan - `cargo test --features remote -p lancedb -- test_add_overwrite` — all 4 overwrite tests pass - Full suite: 454 passed, 2 failed (pre-existing `remote::retry` flakes unrelated to this change) --------- Signed-off-by: majiayu000 <1835304752@qq.com>	2026-03-30 10:57:33 -07:00
lennylxx	1d1cafb59c	fix(python): don't assign dict.update() return value in _sanitize_data (#3198 ) dict.update() mutates in place and returns None. Assigning its result caused with_metadata(None) to strip all schema metadata when embedding metadata was merged during create_table with embedding_functions.	2026-03-30 10:15:45 -07:00
Dan Tasse	cca6a7c989	fix: raise instead of return ValueError (#3189 ) These couple of cases used to return ValueError; should raise it instead.	2026-03-25 18:49:29 -07:00
Lance Release	76429730c0	Bump version: 0.30.2-beta.0 → 0.30.2-beta.1	2026-03-25 16:21:26 +00:00
Lance Release	f4d613565e	Bump version: 0.30.1 → 0.30.2-beta.0	2026-03-25 03:22:55 +00:00
Will Jones	1d6e00b902	feat: progress bar for `add()` (#3067 ) ## Summary Adds progress reporting for `table.add()` so users can track large write operations. The progress callback is available in Rust, Python (sync and async), and through the PyO3 bindings. ### Usage Pass `progress=True` to get an automatic tqdm bar: ```python table.add(data, progress=True) # 100%\|██████████\| 1000000/1000000 [00:12<00:00, 82345 rows/s, 45.2 MB/s \| 4/4 workers] ``` Or pass a tqdm bar for more control: ```python from tqdm import tqdm with tqdm(unit=" rows") as pbar: table.add(data, progress=pbar) ``` Or use a callback for custom progress handling: ```python def on_progress(p): print(f"{p['output_rows']}/{p['total_rows']} rows, " f"{p['active_tasks']}/{p['total_tasks']} workers, " f"done={p['done']}") table.add(data, progress=on_progress) ``` In Rust: ```rust table.add(data) .progress(\|p\| println!("{}/{:?} rows", p.output_rows(), p.total_rows())) .execute() .await?; ``` ### Details - `WriteProgress` struct in Rust with getters for `elapsed`, `output_rows`, `output_bytes`, `total_rows`, `active_tasks`, `total_tasks`, and `done`. Fields are private behind getters so new fields can be added without breaking changes. - `WriteProgressTracker` tracks progress across parallel write tasks using a mutex for row/byte counts and atomics for active task counts. - Active task tracking uses an RAII guard pattern (`ActiveTaskGuard`) that increments on creation and decrements on drop. - For remote writes, `output_bytes` reflects IPC wire bytes rather than in-memory Arrow size. For local writes it uses in-memory Arrow size as a proxy (see TODO below). - tqdm postfix displays throughput (MB/s) and worker utilization (active/total). - The `done` callback always fires, even on error (via `FinishOnDrop`), so progress bars are always finalized. ### TODO - Track actual bytes written to disk for local tables. This requires Lance to expose a progress callback from its write path. See lance-format/lance#6247. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 16:14:13 -07:00
Prashanth Rao	ed7e01a58b	docs: fix rendering issues with missing index types in API docs (#3143 ) ## Problem The generated Python API docs for `lancedb.table.IndexStatistics.index_type` were misleading because mkdocstrings renders that field’s type annotation directly, and the existing `Literal[...]` listed only a subset of the actual canonical SDK index type strings. Current (missing index types): <img width="823" height="83" alt="image" src="https://github.com/user-attachments/assets/f6f29fe3-4c16-4d00-a4e9-28a7cd6e19ec" /> ## Fix - Update the `IndexStatistics.index_type` annotation in `python/python/lancedb/table.py` to include the full supported set of canonical values, so the generated docs show all valid index_type strings inline. - Add a small regression test in `python/python/tests/test_index.py` to ensure the docs-facing annotation does not drift silently again in case we add a new index/quantization type in the future. - Bumps mkdocs and material theme versions to mkdocs 1.6 to allow access to more features like hooks After fix (all index types are included and tested for in the annotations): <img width="1017" height="93" alt="image" src="https://github.com/user-attachments/assets/66c74d5c-34b3-4b44-8173-3ee23e3648ac" />	2026-03-20 09:34:42 -07:00
Lance Release	f5b21c0aa4	Bump version: 0.30.1-beta.0 → 0.30.1	2026-03-20 00:35:03 +00:00
Lance Release	e927924d26	Bump version: 0.30.0 → 0.30.1-beta.0	2026-03-20 00:35:02 +00:00
marca116	3a200d77ef	fix: pre-filtering on hybrid search (#3096 ) When using hybrid search with a where filter, the prefilter argument is silently inverted. Passing prefilter=True actually performs post-filtering, and prefilter=False actually performs pre-filtering.	2026-03-16 21:48:42 -07:00
Lance Release	c89240b16c	Bump version: 0.30.0-beta.6 → 0.30.0	2026-03-16 22:46:19 +00:00
Lance Release	099ff355a4	Bump version: 0.30.0-beta.5 → 0.30.0-beta.6	2026-03-16 22:46:17 +00:00

1 2 3 4 5 ...

980 Commits