lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-07-03 19:10:41 +00:00

Author	SHA1	Message	Date
Jack Ye	f54f5600ad	refactor(python): use namespace-backed rust connection	2026-04-18 17:09:13 -07:00
Jack Ye	e34fe84c7f	refactor(rust): align namespace client naming	2026-04-18 00:49:04 -07:00
Jack Ye	5b1f248257	fix(python): use rust namespace connection directly	2026-04-18 00:28:23 -07:00
Jack Ye	95e34d47b9	refactor(python): decouple namespace tables from temp connections	2026-04-17 20:30:52 -07:00
Jack Ye	a0defd448f	fix(python): handle namespace tables via resolved locations	2026-04-17 15:39:43 -07:00
Jack Ye	0fadb65153	refactor(python): document temporary namespace connections	2026-04-17 14:44:39 -07:00
Jack Ye	15fbcf61fc	fix(python): use declared location for namespace table open	2026-04-17 09:58:00 -07:00
Lance Release	d715bbb588	Bump version: 0.28.0-beta.6 → 0.28.0-beta.7	2026-04-17 08:12:27 +00:00
Lance Release	5ce3d8d141	Bump version: 0.31.0-beta.6 → 0.31.0-beta.7 python-v0.31.0-beta.7	2026-04-17 08:12:03 +00:00
Jack Ye	5eaac178b1	fix(python): pass namespace client on schema-only table create (#3283 ) ## Summary - pass `namespace_client` through the Python create-table path - ensure schema-only namespace table creation uses the namespace-aware empty-table flow - fix reopening namespace tables created without initial data	2026-04-17 01:11:18 -07:00
Lance Release	11af763fcd	Bump version: 0.28.0-beta.5 → 0.28.0-beta.6	2026-04-16 18:57:28 +00:00
Lance Release	2ed5452e1c	Bump version: 0.31.0-beta.5 → 0.31.0-beta.6 python-v0.31.0-beta.6	2026-04-16 18:57:05 +00:00
Xuanwo	b7c0b5987c	chore: upgrade lance to 6.0.0-beta.1 (#3281)	2026-04-17 02:51:58 +08:00
Jack Ye	97a4b38f19	feat(rust): support nested namespace ops in listing db (#3279 ) ## Summary - delegate child-namespace `ListingDatabase` operations through an eagerly initialized `LanceNamespaceDatabase` - support nested namespace create/open/list/drop flows without requiring callers to inject explicit locations - add `namespace_client_properties` plumbing for local and namespace connections so directory namespace settings like `table_version_tracking_enabled` can be configured - add regression tests for nested namespace ops and namespace client property propagation	2026-04-16 10:12:28 -07:00
Gezi-lzq	10879d99b8	docs: fix broken documentation links (#3278)	2026-04-15 20:56:59 +08:00
Lance Release	4e6a1d5dce	Bump version: 0.28.0-beta.4 → 0.28.0-beta.5	2026-04-12 23:51:14 +00:00
Lance Release	13d2759356	Bump version: 0.31.0-beta.4 → 0.31.0-beta.5 python-v0.31.0-beta.5	2026-04-12 23:50:50 +00:00
Jack Ye	7f52ec8c36	feat(python): support child namespace operations and json serialization for LanceDBConnection (#3265) ## Summary Add connection serialization and child namespace support to `LanceDBConnection`. - `DBConnection.serialize()` / `lancedb.deserialize()` for connection reconstruction in remote workers - Cache `namespace_client()` in `LanceDBConnection` to avoid repeated DirectoryNamespace builds - `LanceDBConnection` transparently delegates child namespace operations (open_table, create_table, list_tables, drop_table, create_namespace, etc.) to `LanceNamespaceDBConnection` via `_namespace_conn()` - Root namespace operations still go through the original Rust path - Generic worker property override mechanism: any `namespace_client_properties` key prefixed with `_lancedb_worker_` has the prefix stripped and overrides the corresponding property when `deserialize(data, for_worker=True)` - `LanceNamespaceDBConnection` stores `namespace_client_impl`/`namespace_client_properties` for serialization roundtrip --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:49:45 -07:00
Lance Release	c6ae0de3ee	Bump version: 0.28.0-beta.3 → 0.28.0-beta.4	2026-04-12 03:57:58 +00:00
Lance Release	231f0655ce	Bump version: 0.31.0-beta.3 → 0.31.0-beta.4 python-v0.31.0-beta.4	2026-04-12 03:57:35 +00:00
LanceDB Robot	8c52977c59	chore: update lance dependency to v5.1.0-beta.3 (#3266 ) ## Summary - Bump Rust Lance dependencies to `v5.1.0-beta.3` using `ci/set_lance_version.py`. - Update Java `lance-core.version` to `5.1.0-beta.3` in `java/pom.xml`. - Refresh `Cargo.lock` metadata to the `v5.1.0-beta.3` Lance git tag. ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all` ## Upstream Tag - https://github.com/lance-format/lance/releases/tag/v5.1.0-beta.3	2026-04-11 20:56:49 -07:00
Lance Release	359710a0bf	Bump version: 0.28.0-beta.2 → 0.28.0-beta.3	2026-04-11 22:44:52 +00:00
Lance Release	1f1726369d	Bump version: 0.31.0-beta.2 → 0.31.0-beta.3 python-v0.31.0-beta.3	2026-04-11 22:44:25 +00:00
Lance Release	df354abae4	Bump version: 0.28.0-beta.1 → 0.28.0-beta.2	2026-04-11 07:06:00 +00:00
Lance Release	11bc674548	Bump version: 0.31.0-beta.1 → 0.31.0-beta.2 python-v0.31.0-beta.2	2026-04-11 07:05:36 +00:00
LanceDB Robot	5593460823	chore: update lance dependency to v5.1.0-beta.2 (#3263 ) ## Summary - Bump Lance Rust workspace dependencies from `5.0.0-beta.5` to `5.1.0-beta.2` using `ci/set_lance_version.py`. - Update Java `lance-core.version` in `java/pom.xml` to `5.1.0-beta.2`. - Refresh `Cargo.lock` to match the new Lance tag. ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` (passes) - `cargo fmt --all` (passes) ## Triggering Tag - https://github.com/lance-format/lance/releases/tag/v5.1.0-beta.2	2026-04-11 00:04:43 -07:00
Will Jones	2807ad6854	chore: bump Rust toolchain from 1.91.0 to 1.94.0 (#3257 ) Bumps the Rust toolchain to 1.94.0 (latest installed) to unblock CI failures caused by the AWS SDK's MSRV requirement. No lint fixes were needed. --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 07:57:47 -07:00
Dhruv Garg	4761fa9bcb	fix(python): migrate gemini-text provider to google-genai sdk (#3250 ) ## Summary - migrate gemini-text embedding provider from deprecated google.generativeai to google.genai - update Python embedding extra dependency to google-genai - update default model name to gemini-embedding-001 - adapt embed calls to Client().models.embed_content(...) - apply lint fixes from CI ## Related - Closes #3191	2026-04-09 15:28:34 -07:00
lennylxx	4c2939d66e	fix(python): guard against None before .decode() on split_names metadata key (#3229 ) `.get(b"split_names", None).decode()` was called unconditionally in both Permutations.__init__ and Permutation.from_tables(), crashing with AttributeError when schema metadata existed but lacked the split_names key. Guard the decode behind a None check and add regression tests.	2026-04-08 16:04:13 -07:00
yaommen	a813ce2f71	fix(python): sanitize bad vectors before Arrow cast (#3158 ) ## Problem `on_bad_vectors="drop"` is supposed to remove invalid vector rows before write, but for some schema-defined vector columns it can still fail later during Arrow cast instead of dropping the bad row. Repro: ```python class MySchema(LanceModel): text: str embedding: Vector(16) table = db.create_table("test", schema=MySchema) table.add( [ {"text": "hello", "embedding": []}, {"text": "bar", "embedding": [0.1] * 16}, ], on_bad_vectors="drop", ) ``` Before: ``` RuntimeError Arrow error: C Data interface error: Invalid: ListType can only be casted to FixedSizeListType if the lists are all the expected size. ``` After: ``` rows 1 texts ['bar'] ``` ## Solution Make bad-vector sanitization use schema dimensions before cast, while keeping the handling scoped to vector columns identified by schema metadata or existing vector-name heuristics. This also preserves existing integer vector inputs and avoids applying on_bad_vectors to unrelated fixed-size float columns. Fixes #1670 Signed-off-by: yaommen <myanstu@163.com>	2026-04-08 09:09:41 -07:00
Jack Ye	a898dc81c2	feat: add user_id field to ClientConfig for user identification (#3240 ) ## Summary - Add a `user_id` field to `ClientConfig` that allows users to identify themselves to LanceDB Cloud/Enterprise - The user_id is sent as the `x-lancedb-user-id` HTTP header in all requests - Supports three configuration methods: - Direct assignment via `ClientConfig.user_id` - Environment variable `LANCEDB_USER_ID` - Indirect env var lookup via `LANCEDB_USER_ID_ENV_KEY` Closes #3230 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-04-06 11:20:10 -07:00
Lance Release	de3f8097e7	Bump version: 0.28.0-beta.0 → 0.28.0-beta.1	2026-04-05 02:51:18 +00:00
Lance Release	0ac59de5f1	Bump version: 0.31.0-beta.0 → 0.31.0-beta.1 python-v0.31.0-beta.1	2026-04-05 02:50:52 +00:00
LanceDB Robot	d082c2d2ac	chore: update lance dependency to v5.0.0-beta.5 (#3237 ) ## Summary - update Rust Lance workspace dependencies to `v5.0.0-beta.5` using `ci/set_lance_version.py` - update Java `lance-core` dependency property to `5.0.0-beta.5` - refresh Cargo lockfile to the new Lance tag ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all` ## Upstream Tag - https://github.com/lance-format/lance/releases/tag/v5.0.0-beta.5 --------- Co-authored-by: Jack Ye <yezhaoqin@gmail.com>	2026-04-04 19:49:51 -07:00
Zelys	9d8699f99e	feat(python): support Enum types in Pydantic to Arrow schema conversion (#3232 ) ## Summary Fixes #1846. Python `Enum` fields raised `TypeError: Converting Pydantic type to Arrow Type: unsupported type <enum 'SomethingTypes'>` when converting a Pydantic model to an Arrow schema. The fix adds Enum detection in `_pydantic_type_to_arrow_type`. When an Enum subclass is encountered, the value type of its members is inspected and mapped to the appropriate Arrow type: - `str`-valued enums (e.g. `class Status(str, Enum)`) → `pa.utf8()` - `int`-valued enums (e.g. `class Priority(int, Enum)`) → `pa.int64()` - Other homogeneous value types → the Arrow type for that Python type - Mixed-value or empty enums → `pa.utf8()` (safe fallback) This covers the common `(str, Enum)` and `(int, Enum)` mixin patterns used in practice. ## Changes - `python/python/lancedb/pydantic.py`: add Enum branch in `_pydantic_type_to_arrow_type` - `python/python/tests/test_pydantic.py`: add `test_enum_types` covering `str`, `int`, and `Optional` Enum fields ## Note on #2395 PR #2395 handles `StrEnum` (Python 3.11+) specifically, using a dictionary-encoded type. This PR handles the broader `(str, Enum)` / `(int, Enum)` mixin pattern that works across all Python versions and stores values as their natural Arrow type. AI assistance was used in developing this fix.	2026-04-03 10:40:49 -07:00
Lance Release	aa2c7b3591	Bump version: 0.27.2 → 0.28.0-beta.0	2026-04-03 08:45:56 +00:00
Lance Release	590c0c1e77	Bump version: 0.30.2 → 0.31.0-beta.0 python-v0.31.0-beta.0	2026-04-03 08:45:29 +00:00
LanceDB Robot	382ecd65e3	chore: update lance dependency to v5.0.0-beta.4 (#3234 ) ## Summary - Update Rust Lance workspace dependencies to `v5.0.0-beta.4` using `ci/set_lance_version.py` (including lockfile refresh). - Update Java `lance-core` dependency property to `5.0.0-beta.4` in `java/pom.xml`. ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all` ## Triggering tag - https://github.com/lance-format/lance/releases/tag/v5.0.0-beta.4	2026-04-03 01:33:36 -07:00
Jack Ye	e26b22bcca	refactor!: consolidate namespace related naming and enterprise integration (#3205 ) 1. Refactored every client (Rust core, Python, Node/TypeScript) so “namespace” usage is explicit: code now keeps namespace paths (namespace_path) separate from namespace clients (namespace_client). Connections propagate the client, table creation routes through it, and managed versioning defaults are resolved from namespace metadata. Python gained LanceNamespaceDBConnection/async counterparts, and the namespace-focused tests were rewritten to match the clarified API surface. 2. Synchronized the workspace with Lance 5.0.0-beta.3 (see https://github.com/lance-format/lance/pull/6186 for the upstream namespace refactor), updating Cargo/uv lockfiles and ensuring all bindings align with the new namespace semantics. 3. Added a namespace-backed code path to lancedb.connect() via new keyword arguments (namespace_client_impl, namespace_client_properties, plus the existing pushdown-ops flag). When those kwargs are supplied, connect() delegates to connect_namespace, so users can opt into namespace clients without changing APIs. (The async helper will gain parity in a later change)	2026-04-03 00:09:03 -07:00
Lance Release	3ba46135a5	Bump version: 0.27.2-beta.2 → 0.27.2	2026-03-31 21:26:04 +00:00
Lance Release	f903d07887	Bump version: 0.27.2-beta.1 → 0.27.2-beta.2	2026-03-31 21:25:36 +00:00
Lance Release	5d550124bd	Bump version: 0.30.2-beta.2 → 0.30.2 python-v0.30.2	2026-03-31 21:25:04 +00:00
Lance Release	c57cb310a2	Bump version: 0.30.2-beta.1 → 0.30.2-beta.2	2026-03-31 21:25:02 +00:00
Dan Tasse	97754f5123	fix: change _client reference to _conn (#3188) This code previously referenced `self._client`, which does not exist. This change makes it correctly call `self._conn.close()`.	2026-03-31 13:29:17 -07:00
Pratik Dey	7b1c063848	feat(python): add type-safe expression builder API (#3150 ) Introduces col(), lit(), func(), and Expr class as alternatives to raw SQL strings in .where() and .select(). Expressions are backed by DataFusion's Expr AST and serialized to SQL for remote table compat. Resolves: - https://github.com/lancedb/lancedb/issues/3044 (python api's) - https://github.com/lancedb/lancedb/issues/3043 (support for filter) - https://github.com/lancedb/lancedb/issues/3045 (support for projection) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-31 11:32:49 -07:00
Will Jones	c7f189f27b	chore: upgrade lance to stable 4.0.0 (#3207 ) Bumps all lance-* workspace dependencies from `4.0.0-rc.3` (git source) to the stable `4.0.0` release on crates.io, removing the `git`/`tag` overrides. No code changes were required — compiles and passes clippy cleanly. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 17:05:45 -07:00
yaommen	a0a2942ad5	fix: respect max_batch_length for Rust vector and hybrid queries (#3172) Fixes #1540 I could not reproduce this on current `main` from Python, but I could still reproduce it from the Rust SDK. Python no longer reproduces because the current Python vector/hybrid query paths re-chunk results into a `pyarrow.Table` before returning batches. Rust still reproduced because `max_batch_length` was passed into planning/scanning, but vector search could still emit larger `RecordBatch`es later in execution (for example after KNN / TopK), so it was not enforced on the final Rust output stream. This PR enforces `max_batch_length` on the final Rust query output stream and adds Rust regression coverage. Before the fix, the Rust repro produced: `num_batches=2, max_batch=8192, min_batch=1808, all_le_100=false` After the fix, the same repro produces batches `<= 100`. ## Runnable Rust repro Before this fix, current `main` could still return batches like `[8192, 1808]` here even with `max_batch_length = 100`: ```rust use std::sync::Arc; use arrow_array::{ types::Float32Type, FixedSizeListArray, RecordBatch, RecordBatchReader, StringArray, }; use arrow_schema::{DataType, Field, Schema}; use futures::TryStreamExt; use lancedb::query::{ExecutableQuery, QueryBase, QueryExecutionOptions}; #[tokio::main] async fn main() -> Result<(), Box<dyn std::error::Error>> { let tmp = tempfile::tempdir()?; let uri = tmp.path().to_str().unwrap(); let rows = 10_000; let schema = Arc::new(Schema::new(vec![ Field::new("id", DataType::Utf8, false), Field::new( "vector", DataType::FixedSizeList(Arc::new(Field::new("item", DataType::Float32, true)), 4), false, ), ])); let ids = StringArray::from_iter_values((0..rows).map(|i| format!("row-{i}"))); let vectors = FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>( (0..rows).map(|i| Some(vec![Some(i as f32), Some(1.0), Some(2.0), Some(3.0)])), 4, ); let batch = RecordBatch::try_new(schema.clone(), vec![Arc::new(ids), Arc::new(vectors)])?; let reader: Box<dyn RecordBatchReader + Send> = Box::new( arrow_array::RecordBatchIterator::new(vec![Ok(batch)].into_iter(), schema), ); let db = lancedb::connect(uri).execute().await?; let table = db.create_table("test", reader).execute().await?; let mut opts = QueryExecutionOptions::default(); opts.max_batch_length = 100; let mut stream = table .query() .nearest_to(vec![0.0, 1.0, 2.0, 3.0])? .limit(rows) .execute_with_options(opts) .await?; let mut sizes = Vec::new(); while let Some(batch) = stream.try_next().await? { sizes.push(batch.num_rows()); } println!("{sizes:?}"); Ok(()) } ``` Signed-off-by: yaommen <myanstu@163.com>	2026-03-30 15:43:58 -07:00
Will Jones	e3d53dd185	fix(python): skip test_url_retrieve_downloads_image when PIL not installed (#3208 ) The test added in #3190 unconditionally imports `PIL`, which is an optional dependency. This causes CI failures in environments where Pillow isn't installed (`ModuleNotFoundError: No module named 'PIL'`). Use `pytest.importorskip` to skip gracefully when Pillow is unavailable. Fixes CI failure on main. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 14:48:49 -07:00
Will Jones	66804e99fc	fix(python): use correct exception types in namespace tests (#3206 ) ## Summary - Namespace tests expected `RuntimeError` for table-not-found and namespace-not-empty cases, but `lance_namespace` raises `TableNotFoundError` and `NamespaceNotEmptyError` which inherit from `Exception`, not `RuntimeError`. - Updated `pytest.raises` to use the correct exception types. ## Test plan - [x] CI passes on `test_namespace.py` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 12:55:54 -07:00
lennylxx	9f85d4c639	fix(embeddings): add missing urllib.request import in url_retrieve (#3190 ) url_retrieve() calls urllib.request.urlopen() but only urllib.error was imported, causing AttributeError for any HTTP URL input. This affects open-clip, siglip, and jinaai embedding functions when processing image URLs. The bug has existed since the embeddings API refactor (#580) but was masked because most users pass local file paths or bytes rather than HTTP URLs.	2026-03-30 12:03:44 -07:00
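
Illustration for commit 9d8699f99e (#3232): the Enum-to-Arrow rules listed in that commit message can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code in `python/python/lancedb/pydantic.py`. `enum_arrow_type_name` is a hypothetical helper that returns Arrow type names as strings so that it runs without pyarrow. The `float` and `bool` rows, and the `utf8` fallback for value types not in the table, are assumptions that the commit message does not spell out.

```python
from enum import Enum

# Only the str and int rows come from the commit message; the float and bool
# rows are assumptions standing in for "other homogeneous value types".
_ARROW_NAME_BY_PY_TYPE = {
    str: "utf8",
    int: "int64",
    float: "float64",
    bool: "bool",
}


def enum_arrow_type_name(enum_cls):
    """Hypothetical helper: name the Arrow type an Enum class would map to."""
    # Collect the Python type of every member's value.
    value_types = {type(member.value) for member in enum_cls}
    # Mixed-value or empty enums fall back to utf8, as the commit describes.
    if len(value_types) != 1:
        return "utf8"
    (value_type,) = value_types
    # Unlisted value types also fall back to utf8 (an assumption of this sketch).
    return _ARROW_NAME_BY_PY_TYPE.get(value_type, "utf8")


class Status(str, Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"


class Priority(int, Enum):
    LOW = 1
    HIGH = 2


class Mixed(Enum):
    NUMBER = 1
    LABEL = "one"


for cls in (Status, Priority, Mixed):
    print(cls.__name__, enum_arrow_type_name(cls))
```

Inspecting the member values rather than the class's base types means a plain `Enum` whose values all happen to be strings is treated the same way as a `(str, Enum)` mixin.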

2450 Commits