lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-07-05 20:10:39 +00:00

Author	SHA1	Message	Date
Xuanwo	57dbaf00a8	fix(python): dedupe language support notes on py310	2026-04-20 11:30:37 +08:00
Xuanwo	2c77e8b75f	fix(python): make lindera test fixture cwd-independent	2026-04-20 10:23:23 +08:00
Xuanwo	f54842ccaf	feat(python): support model-backed native FTS tokenizers	2026-04-20 10:03:16 +08:00
Xuanwo	c54888a83a	refactor(python): remove legacy tantivy FTS support (#3282) This follows the Rust-side Tantivy removal by deleting the remaining Python Tantivy runtime, tests, and packaging references. It also turns the legacy Python-only Tantivy parameters into explicit errors and stops reading legacy `_indices/fts` directories so Python FTS is fully native-only.	2026-04-20 09:28:45 +08:00
Will Jones	ba6c44abc9	ci: add top-level permissions to GHA workflows (#3255 ) Adds `permissions: contents: read` to the 10 workflows that had no top-level permissions block. Workflows that already declared permissions, or individual jobs that need elevated permissions (`issues: write`, `pull-requests: write`, `contents: write`), are left unchanged. Affected workflows: `dev.yml`, `java-publish.yml`, `java.yml`, `license-header-check.yml`, `nodejs.yml`, `pypi-publish.yml`, `python.yml`, `rust.yml`, `update_package_lock_run.yml`, `update_package_lock_run_nodejs.yml`	2026-04-20 09:22:27 +08:00
Lance Release	75b0a8e0a3	Bump version: 0.28.0-beta.8 → 0.28.0-beta.9	2026-04-19 20:39:29 +00:00
Lance Release	2a886141f7	Bump version: 0.31.0-beta.8 → 0.31.0-beta.9 python-v0.31.0-beta.9	2026-04-19 20:39:04 +00:00
Jack Ye	2a1df8edcf	fix(rust): materialize declared namespace tables on create (#3288 ) ## Summary - handle `declare_table` already-exists conflicts in the Rust namespace database create path - reuse declared-but-not-materialized table metadata instead of failing create mode - preserve overwrite behavior while allowing declared Geneva system tables to be materialized	2026-04-19 13:25:53 -07:00
C Kaustubh	fd98b845ea	fix(node): prevent reranker from keeping process alive (#3270 ) Fixes #3269. ## What I observed Using a reranker in a hybrid query could keep the Node.js process alive even after `table.close()` and `db.close()`. ## Root cause The reranker callback bridge used a `ThreadsafeFunction` in referenced mode, which can keep the event loop alive longer than intended. ## Minimal fix - In `nodejs/src/rerankers.rs`, create the reranker callback TSFN in weak mode (`.weak::<true>()`). - Add a regression test in `nodejs/__test__/rerankers.test.ts` that spawns a child process, runs a rerank query, and asserts the process exits naturally. ## Validation - Built Node bindings successfully. - Ran targeted tests: `rerankers.test.ts` passes (including new regression test). - Pre-commit checks for changed files were run and clean.	2026-04-19 14:02:23 +08:00
Lance Release	be48ada352	Bump version: 0.28.0-beta.7 → 0.28.0-beta.8	2026-04-19 04:19:10 +00:00
Lance Release	9ad2dfe601	Bump version: 0.31.0-beta.7 → 0.31.0-beta.8 python-v0.31.0-beta.8	2026-04-19 04:18:45 +00:00
Jack Ye	f909df3e87	fix(python): use namespace-backed rust connection for namespace tables (#3286) So far, I have been using a hacky approach that creates and opens a namespace-backed table by getting its location and using a temporary lancedb connection to create or open it. This worked for features like credentials vending but no longer fully works for the managed versioning feature: recently geneva tests have been failing here and there, and various patches have not addressed the root cause. This PR fully fixes this and implements a proper Rust binding for it. Specifically: - build a real Rust namespace-backed connection from the Python namespace client - route namespace table create/open through that connection instead of resolved-location temp connections - keep namespace client naming consistent in the Rust bridge and preserve federated namespace + DuckDB behavior	2026-04-18 21:17:52 -07:00
Lance Release	d715bbb588	Bump version: 0.28.0-beta.6 → 0.28.0-beta.7	2026-04-17 08:12:27 +00:00
Lance Release	5ce3d8d141	Bump version: 0.31.0-beta.6 → 0.31.0-beta.7 python-v0.31.0-beta.7	2026-04-17 08:12:03 +00:00
Jack Ye	5eaac178b1	fix(python): pass namespace client on schema-only table create (#3283 ) ## Summary - pass `namespace_client` through the Python create-table path - ensure schema-only namespace table creation uses the namespace-aware empty-table flow - fix reopening namespace tables created without initial data	2026-04-17 01:11:18 -07:00
Lance Release	11af763fcd	Bump version: 0.28.0-beta.5 → 0.28.0-beta.6	2026-04-16 18:57:28 +00:00
Lance Release	2ed5452e1c	Bump version: 0.31.0-beta.5 → 0.31.0-beta.6 python-v0.31.0-beta.6	2026-04-16 18:57:05 +00:00
Xuanwo	b7c0b5987c	chore: upgrade lance to 6.0.0-beta.1 (#3281)	2026-04-17 02:51:58 +08:00
Jack Ye	97a4b38f19	feat(rust): support nested namespace ops in listing db (#3279 ) ## Summary - delegate child-namespace `ListingDatabase` operations through an eagerly initialized `LanceNamespaceDatabase` - support nested namespace create/open/list/drop flows without requiring callers to inject explicit locations - add `namespace_client_properties` plumbing for local and namespace connections so directory namespace settings like `table_version_tracking_enabled` can be configured - add regression tests for nested namespace ops and namespace client property propagation	2026-04-16 10:12:28 -07:00
Gezi-lzq	10879d99b8	docs: fix broken documentation links (#3278)	2026-04-15 20:56:59 +08:00
Lance Release	4e6a1d5dce	Bump version: 0.28.0-beta.4 → 0.28.0-beta.5	2026-04-12 23:51:14 +00:00
Lance Release	13d2759356	Bump version: 0.31.0-beta.4 → 0.31.0-beta.5 python-v0.31.0-beta.5	2026-04-12 23:50:50 +00:00
Jack Ye	7f52ec8c36	feat(python): support child namespace operations and json serialization for LanceDBConnection (#3265) ## Summary Add connection serialization and child namespace support to `LanceDBConnection`. - `DBConnection.serialize()` / `lancedb.deserialize()` for connection reconstruction in remote workers - Cache `namespace_client()` in `LanceDBConnection` to avoid repeated DirectoryNamespace builds - `LanceDBConnection` transparently delegates child namespace operations (open_table, create_table, list_tables, drop_table, create_namespace, etc.) to `LanceNamespaceDBConnection` via `_namespace_conn()` - Root namespace operations still go through the original Rust path - Generic worker property override mechanism: any `namespace_client_properties` key prefixed with `_lancedb_worker_` has the prefix stripped and overrides the corresponding property when `deserialize(data, for_worker=True)` is called - `LanceNamespaceDBConnection` stores `namespace_client_impl`/`namespace_client_properties` for serialization roundtrip --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:49:45 -07:00
Lance Release	c6ae0de3ee	Bump version: 0.28.0-beta.3 → 0.28.0-beta.4	2026-04-12 03:57:58 +00:00
Lance Release	231f0655ce	Bump version: 0.31.0-beta.3 → 0.31.0-beta.4 python-v0.31.0-beta.4	2026-04-12 03:57:35 +00:00
LanceDB Robot	8c52977c59	chore: update lance dependency to v5.1.0-beta.3 (#3266 ) ## Summary - Bump Rust Lance dependencies to `v5.1.0-beta.3` using `ci/set_lance_version.py`. - Update Java `lance-core.version` to `5.1.0-beta.3` in `java/pom.xml`. - Refresh `Cargo.lock` metadata to the `v5.1.0-beta.3` Lance git tag. ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all` ## Upstream Tag - https://github.com/lance-format/lance/releases/tag/v5.1.0-beta.3	2026-04-11 20:56:49 -07:00
Lance Release	359710a0bf	Bump version: 0.28.0-beta.2 → 0.28.0-beta.3	2026-04-11 22:44:52 +00:00
Lance Release	1f1726369d	Bump version: 0.31.0-beta.2 → 0.31.0-beta.3 python-v0.31.0-beta.3	2026-04-11 22:44:25 +00:00
Lance Release	df354abae4	Bump version: 0.28.0-beta.1 → 0.28.0-beta.2	2026-04-11 07:06:00 +00:00
Lance Release	11bc674548	Bump version: 0.31.0-beta.1 → 0.31.0-beta.2 python-v0.31.0-beta.2	2026-04-11 07:05:36 +00:00
LanceDB Robot	5593460823	chore: update lance dependency to v5.1.0-beta.2 (#3263 ) ## Summary - Bump Lance Rust workspace dependencies from `5.0.0-beta.5` to `5.1.0-beta.2` using `ci/set_lance_version.py`. - Update Java `lance-core.version` in `java/pom.xml` to `5.1.0-beta.2`. - Refresh `Cargo.lock` to match the new Lance tag. ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` (passes) - `cargo fmt --all` (passes) ## Triggering Tag - https://github.com/lance-format/lance/releases/tag/v5.1.0-beta.2	2026-04-11 00:04:43 -07:00
Will Jones	2807ad6854	chore: bump Rust toolchain from 1.91.0 to 1.94.0 (#3257 ) Bumps the Rust toolchain to 1.94.0 (latest installed) to unblock CI failures caused by the AWS SDK's MSRV requirement. No lint fixes were needed. --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 07:57:47 -07:00
Dhruv Garg	4761fa9bcb	fix(python): migrate gemini-text provider to google-genai sdk (#3250 ) ## Summary - migrate gemini-text embedding provider from deprecated google.generativeai to google.genai - update Python embedding extra dependency to google-genai - update default model name to gemini-embedding-001 - adapt embed calls to Client().models.embed_content(...) - apply lint fixes from CI ## Related - Closes #3191	2026-04-09 15:28:34 -07:00
lennylxx	4c2939d66e	fix(python): guard against None before .decode() on split_names metadata key (#3229 ) `.get(b"split_names", None).decode()` was called unconditionally in both Permutations.__init__ and Permutation.from_tables(), crashing with AttributeError when schema metadata existed but lacked the split_names key. Guard the decode behind a None check and add regression tests.	2026-04-08 16:04:13 -07:00
yaommen	a813ce2f71	fix(python): sanitize bad vectors before Arrow cast (#3158 ) ## Problem `on_bad_vectors="drop"` is supposed to remove invalid vector rows before write, but for some schema-defined vector columns it can still fail later during Arrow cast instead of dropping the bad row. Repro: ```python class MySchema(LanceModel): text: str embedding: Vector(16) table = db.create_table("test", schema=MySchema) table.add( [ {"text": "hello", "embedding": []}, {"text": "bar", "embedding": [0.1] * 16}, ], on_bad_vectors="drop", ) ``` Before: ``` RuntimeError Arrow error: C Data interface error: Invalid: ListType can only be casted to FixedSizeListType if the lists are all the expected size. ``` After: ``` rows 1 texts ['bar'] ``` ## Solution Make bad-vector sanitization use schema dimensions before cast, while keeping the handling scoped to vector columns identified by schema metadata or existing vector-name heuristics. This also preserves existing integer vector inputs and avoids applying on_bad_vectors to unrelated fixed-size float columns. Fixes #1670 Signed-off-by: yaommen <myanstu@163.com>	2026-04-08 09:09:41 -07:00
Jack Ye	a898dc81c2	feat: add user_id field to ClientConfig for user identification (#3240 ) ## Summary - Add a `user_id` field to `ClientConfig` that allows users to identify themselves to LanceDB Cloud/Enterprise - The user_id is sent as the `x-lancedb-user-id` HTTP header in all requests - Supports three configuration methods: - Direct assignment via `ClientConfig.user_id` - Environment variable `LANCEDB_USER_ID` - Indirect env var lookup via `LANCEDB_USER_ID_ENV_KEY` Closes #3230 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-04-06 11:20:10 -07:00
Lance Release	de3f8097e7	Bump version: 0.28.0-beta.0 → 0.28.0-beta.1	2026-04-05 02:51:18 +00:00
Lance Release	0ac59de5f1	Bump version: 0.31.0-beta.0 → 0.31.0-beta.1 python-v0.31.0-beta.1	2026-04-05 02:50:52 +00:00
LanceDB Robot	d082c2d2ac	chore: update lance dependency to v5.0.0-beta.5 (#3237 ) ## Summary - update Rust Lance workspace dependencies to `v5.0.0-beta.5` using `ci/set_lance_version.py` - update Java `lance-core` dependency property to `5.0.0-beta.5` - refresh Cargo lockfile to the new Lance tag ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all` ## Upstream Tag - https://github.com/lance-format/lance/releases/tag/v5.0.0-beta.5 --------- Co-authored-by: Jack Ye <yezhaoqin@gmail.com>	2026-04-04 19:49:51 -07:00
Zelys	9d8699f99e	feat(python): support Enum types in Pydantic to Arrow schema conversion (#3232 ) ## Summary Fixes #1846. Python `Enum` fields raised `TypeError: Converting Pydantic type to Arrow Type: unsupported type <enum 'SomethingTypes'>` when converting a Pydantic model to an Arrow schema. The fix adds Enum detection in `_pydantic_type_to_arrow_type`. When an Enum subclass is encountered, the value type of its members is inspected and mapped to the appropriate Arrow type: - `str`-valued enums (e.g. `class Status(str, Enum)`) → `pa.utf8()` - `int`-valued enums (e.g. `class Priority(int, Enum)`) → `pa.int64()` - Other homogeneous value types → the Arrow type for that Python type - Mixed-value or empty enums → `pa.utf8()` (safe fallback) This covers the common `(str, Enum)` and `(int, Enum)` mixin patterns used in practice. ## Changes - `python/python/lancedb/pydantic.py`: add Enum branch in `_pydantic_type_to_arrow_type` - `python/python/tests/test_pydantic.py`: add `test_enum_types` covering `str`, `int`, and `Optional` Enum fields ## Note on #2395 PR #2395 handles `StrEnum` (Python 3.11+) specifically, using a dictionary-encoded type. This PR handles the broader `(str, Enum)` / `(int, Enum)` mixin pattern that works across all Python versions and stores values as their natural Arrow type. AI assistance was used in developing this fix.	2026-04-03 10:40:49 -07:00
Lance Release	aa2c7b3591	Bump version: 0.27.2 → 0.28.0-beta.0	2026-04-03 08:45:56 +00:00
Lance Release	590c0c1e77	Bump version: 0.30.2 → 0.31.0-beta.0 python-v0.31.0-beta.0	2026-04-03 08:45:29 +00:00
LanceDB Robot	382ecd65e3	chore: update lance dependency to v5.0.0-beta.4 (#3234 ) ## Summary - Update Rust Lance workspace dependencies to `v5.0.0-beta.4` using `ci/set_lance_version.py` (including lockfile refresh). - Update Java `lance-core` dependency property to `5.0.0-beta.4` in `java/pom.xml`. ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all` ## Triggering tag - https://github.com/lance-format/lance/releases/tag/v5.0.0-beta.4	2026-04-03 01:33:36 -07:00
Jack Ye	e26b22bcca	refactor!: consolidate namespace related naming and enterprise integration (#3205 ) 1. Refactored every client (Rust core, Python, Node/TypeScript) so “namespace” usage is explicit: code now keeps namespace paths (namespace_path) separate from namespace clients (namespace_client). Connections propagate the client, table creation routes through it, and managed versioning defaults are resolved from namespace metadata. Python gained LanceNamespaceDBConnection/async counterparts, and the namespace-focused tests were rewritten to match the clarified API surface. 2. Synchronized the workspace with Lance 5.0.0-beta.3 (see https://github.com/lance-format/lance/pull/6186 for the upstream namespace refactor), updating Cargo/uv lockfiles and ensuring all bindings align with the new namespace semantics. 3. Added a namespace-backed code path to lancedb.connect() via new keyword arguments (namespace_client_impl, namespace_client_properties, plus the existing pushdown-ops flag). When those kwargs are supplied, connect() delegates to connect_namespace, so users can opt into namespace clients without changing APIs. (The async helper will gain parity in a later change)	2026-04-03 00:09:03 -07:00
Lance Release	3ba46135a5	Bump version: 0.27.2-beta.2 → 0.27.2	2026-03-31 21:26:04 +00:00
Lance Release	f903d07887	Bump version: 0.27.2-beta.1 → 0.27.2-beta.2	2026-03-31 21:25:36 +00:00
Lance Release	5d550124bd	Bump version: 0.30.2-beta.2 → 0.30.2 python-v0.30.2	2026-03-31 21:25:04 +00:00
Lance Release	c57cb310a2	Bump version: 0.30.2-beta.1 → 0.30.2-beta.2	2026-03-31 21:25:02 +00:00
Dan Tasse	97754f5123	fix: change _client reference to _conn (#3188) This code previously referenced `self._client`, which does not exist. This change makes it correctly call `self._conn.close()`.	2026-03-31 13:29:17 -07:00
Pratik Dey	7b1c063848	feat(python): add type-safe expression builder API (#3150 ) Introduces col(), lit(), func(), and Expr class as alternatives to raw SQL strings in .where() and .select(). Expressions are backed by DataFusion's Expr AST and serialized to SQL for remote table compat. Resolves: - https://github.com/lancedb/lancedb/issues/3044 (python api's) - https://github.com/lancedb/lancedb/issues/3043 (support for filter) - https://github.com/lancedb/lancedb/issues/3045 (support for projection) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-31 11:32:49 -07:00
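
Commit 4c2939d66e (#3229) describes guarding a `.decode()` call behind a None check when the `split_names` metadata key is missing. The following is a minimal sketch of that pattern, not the code from the commit; the helper name `read_split_names` is hypothetical, and the metadata is assumed to be a plain dict with bytes keys and values.

```python
from typing import Optional


def read_split_names(metadata: Optional[dict]) -> Optional[str]:
    """Return the decoded `split_names` value, or None when it is absent.

    Hypothetical helper: illustrates the None guard only.
    """
    if not metadata:
        # No schema metadata at all.
        return None
    raw = metadata.get(b"split_names")
    if raw is None:
        # Metadata exists but lacks the key: do not call .decode() on None.
        return None
    return raw.decode()
```

Calling `read_split_names({b"other": b"x"})` returns None instead of raising AttributeError, which is the failure mode the commit message describes.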

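
Commit 7f52ec8c36 (#3265) mentions a worker property override: keys prefixed with `_lancedb_worker_` have the prefix stripped and override the corresponding property when deserializing for a worker. A rough sketch of that rule over a plain dict might look like the following. The function name `apply_worker_overrides` is hypothetical, and two behaviors are assumptions not stated in the commit: prefixed keys are dropped from the result in worker mode, and properties are returned unchanged otherwise.

```python
WORKER_PREFIX = "_lancedb_worker_"


def apply_worker_overrides(properties: dict, for_worker: bool) -> dict:
    """Resolve properties, applying prefixed overrides in worker mode.

    Hypothetical helper, not the LanceDB implementation.
    """
    if not for_worker:
        # Assumption: outside a worker the properties pass through untouched.
        return dict(properties)
    # Start from the unprefixed properties (assumption: prefixed keys are dropped).
    resolved = {
        key: value
        for key, value in properties.items()
        if not key.startswith(WORKER_PREFIX)
    }
    # Strip the prefix and let the worker value override the base value.
    for key, value in properties.items():
        if key.startswith(WORKER_PREFIX):
            resolved[key[len(WORKER_PREFIX):]] = value
    return resolved
```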

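
Commit a898dc81c2 (#3240) lists three ways to configure `user_id`: direct assignment, the `LANCEDB_USER_ID` environment variable, and an indirect lookup through `LANCEDB_USER_ID_ENV_KEY`, with the value sent as the `x-lancedb-user-id` header. The sketch below shows one way such a lookup could be ordered. The function names are hypothetical and the precedence (explicit value, then direct variable, then indirect variable) is an assumption; the commit message does not state the order.

```python
from typing import Mapping, Optional


def resolve_user_id(explicit: Optional[str], env: Mapping[str, str]) -> Optional[str]:
    """Pick a user id from an explicit value or an environment mapping.

    Hypothetical helper; the precedence order here is assumed.
    """
    if explicit:
        return explicit
    direct = env.get("LANCEDB_USER_ID")
    if direct:
        return direct
    # Indirect lookup: this variable names another variable holding the id.
    indirect_key = env.get("LANCEDB_USER_ID_ENV_KEY")
    if indirect_key:
        return env.get(indirect_key) or None
    return None


def user_id_headers(user_id: Optional[str]) -> dict:
    """Build the request header only when a user id is available."""
    return {"x-lancedb-user-id": user_id} if user_id else {}
```

Passing `os.environ` as `env` would read the real environment; taking a mapping as a parameter keeps the sketch easy to test.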
2455 Commits
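
Commit 9d8699f99e (#3232) describes inspecting the value type of an Enum's members: `str`-valued enums map to `pa.utf8()`, `int`-valued enums to `pa.int64()`, and mixed-value or empty enums fall back to `pa.utf8()`. The sketch below illustrates only the value-type inspection step in plain Python, without importing pyarrow. The function name `enum_value_type` is hypothetical, and returning `str` stands in for the `pa.utf8()` fallback.

```python
from enum import Enum


def enum_value_type(enum_cls: type) -> type:
    """Return the shared Python type of an Enum's member values.

    Hypothetical helper: mixed-value or empty enums fall back to `str`,
    mirroring the utf8 fallback described in the commit message.
    """
    value_types = {type(member.value) for member in enum_cls}
    if len(value_types) == 1:
        return value_types.pop()
    return str


class Status(str, Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"


class Priority(int, Enum):
    LOW = 1
    HIGH = 2


class Mixed(Enum):
    A = 1
    B = "b"
```

Here `enum_value_type(Status)` is `str` and `enum_value_type(Priority)` is `int`; a caller would then map those Python types to Arrow types.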