Adds a HuggingFace-style `map(fn)` method that applies `fn` to each row dict.
This complements `with_transform` (which operates on `pa.RecordBatch`) by
offering the more familiar per-row API for AI engineers.
Closes lancedb/lancedb#3246
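
To illustrate the per-row semantics described above, here is a minimal pure-Python sketch. It is not the implementation in this PR: `iter_rows` and `map_rows` are hypothetical helpers, and a plain dict of column lists stands in for a `pa.RecordBatch`.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]
Columns = Dict[str, List[Any]]  # stand-in for a columnar batch (assumption)


def iter_rows(columns: Columns) -> Iterator[Row]:
    """Yield one dict per row from a column-oriented batch."""
    names = list(columns)
    for values in zip(*(columns[name] for name in names)):
        yield dict(zip(names, values))


def map_rows(columns: Columns, fn: Callable[[Row], Row]) -> Columns:
    """Apply fn to every row dict and collect the results back into columns."""
    out: Columns = {}
    for row in iter_rows(columns):
        for key, value in fn(row).items():
            out.setdefault(key, []).append(value)
    return out


batch = {"text": ["hello", "world"], "n": [1, 2]}
mapped = map_rows(batch, lambda row: {**row, "text": row["text"].upper()})
```

The trade-off is the usual one: a per-row function is easier to write, while a batch-level transform avoids the per-row Python overhead.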
Adds `manifest_enabled` for local/native connections so directory
namespace manifests can be the source of truth. This includes migration
from directory listing and feature wiring for Azure credential vending.
The option is also exposed through the Rust, Python, and Node bindings
with focused validation.
## Summary
- Update `rustls-webpki` 0.103.10 → 0.103.13 to fix RUSTSEC-2026-0104
(reachable panic in CRL parsing)
- Add advisory ignore for the legacy `rustls-webpki` 0.101.7 copy pinned
to the aws-smithy/rustls 0.21 chain (same chain already exempted for
RUSTSEC-2026-0098/0099)
Fixes the `deny` CI job failure seen in #3325.
## Test plan
- [x] `cargo deny check advisories` passes locally
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a `deny.toml` at the workspace root and a `deny` CI job that runs
`cargo deny check` on every PR. Catches yanked crates, license drift,
banned or wildcard dependencies, unapproved sources, and new RUSTSEC
advisories.
As part of wiring this up:
- Updated `aws-lc-rs` 1.13.0 → 1.16.3 / `aws-lc-sys` 0.28.0 → 0.40.0 to
clear four 2026 AWS-LC advisories (timing side-channel, PKCS7 bypass,
CRL scope). Removed the `=0.28.0` workaround pin; the original build
failure no longer reproduces.
- Updated `bytes`, `zlib-rs`, `rand`, `rustls-webpki`, `lz4_flex` to
clear their current advisories.
- Marked `lancedb-nodejs` and `lancedb-python` as `publish = false` and
pinned `lzma-sys` from `*` to `0.1` so `bans.wildcards = "deny"` can
be enforced.
10 remaining advisories have no safe upgrade available (transitive via
opendal, lance, datafusion, async-openai, aws-sdk on the legacy rustls
0.21 chain). Each is ignored in `deny.toml` with a per-entry rationale
and a link to the RUSTSEC advisory. New advisories still fail CI.
Fixes #3297
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
- Replaces `LANCEDB_PYPI_API_TOKEN` (long-lived token) with OIDC trusted
publishing via `pypa/gh-action-pypi-publish`
- Adds `id-token: write` permission to linux/mac/windows jobs
- Removes `twine`-based upload and the `pypi_token` input from
`upload_wheel` composite action
- Enables PEP 740 Sigstore attestations on published wheels as a bonus
After merging, rotate/revoke the `LANCEDB_PYPI_API_TOKEN` secret.
Closes #3294

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds `.github/dependabot.yml` enabling weekly cargo update PRs for the
root workspace, which produces the Rust binaries we ship: the Node.js
and Python native extensions. The `rust/lancedb` library crate shares
the same lockfile — its consumers pick versions themselves, but bumping
transitive deps here keeps the shipped binaries current.
Also removes the misleading `exclude = ["python"]` line from the root
`Cargo.toml`: `python` is listed in `members`, and `cargo metadata`
confirms it's a workspace member, so the exclude was dead code that
implied the opposite.
Minor/patch updates are grouped to reduce PR noise.
Part of #3292. Only covers the cargo ecosystem; pip, npm, and
github-actions can follow.
## Summary
- make `TlsConfig::default()` enable hostname verification by default
- align the Rust default with the documented Python and Node behavior
- update the Rust unit test to lock in the safe default
This follows the Rust-side Tantivy removal by deleting the remaining
Python Tantivy runtime, tests, and packaging references.
It also turns the legacy Python-only Tantivy parameters into explicit
errors and stops reading legacy `_indices/fts` directories so Python FTS
is fully native-only.
Adds `permissions: contents: read` to the 10 workflows that had no
top-level permissions block. Workflows that already declared
permissions, or individual jobs that need elevated permissions (`issues:
write`, `pull-requests: write`, `contents: write`), are left unchanged.
Affected workflows: `dev.yml`, `java-publish.yml`, `java.yml`,
`license-header-check.yml`, `nodejs.yml`, `pypi-publish.yml`,
`python.yml`, `rust.yml`, `update_package_lock_run.yml`,
`update_package_lock_run_nodejs.yml`
Fixes #3269.
## What I observed
Using a reranker in a hybrid query could keep the Node.js process alive
even after `table.close()` and `db.close()`.
## Root cause
The reranker callback bridge used a `ThreadsafeFunction` in referenced
mode, which can keep the event loop alive longer than intended.
## Minimal fix
- In `nodejs/src/rerankers.rs`, create the reranker callback TSFN in
weak mode (`.weak::<true>()`).
- Add a regression test in `nodejs/__test__/rerankers.test.ts` that
spawns a child process, runs a rerank query, and asserts the process
exits naturally.
## Validation
- Built Node bindings successfully.
- Ran targeted tests: `rerankers.test.ts` passes (including new
regression test).
- Pre-commit checks for changed files were run and clean.
Until now, namespace-backed tables were created and opened through a
workaround: resolve the table's location, then use a temporary LanceDB
connection to create or open it. This worked for features like
credential vending, but it no longer fully works for the managed
versioning feature. Geneva tests have recently been failing
intermittently, and the various patches have not addressed the root
cause. This PR fixes the root cause by implementing a proper Rust
binding for namespace-backed connections.
Specifically:
- build a real Rust namespace-backed connection from the Python
namespace client
- route namespace table create/open through that connection instead of
resolved-location temp connections
- keep namespace client naming consistent in the Rust bridge and
preserve federated namespace + DuckDB behavior
## Summary
- pass `namespace_client` through the Python create-table path
- ensure schema-only namespace table creation uses the namespace-aware
empty-table flow
- fix reopening namespace tables created without initial data
## Summary
- delegate child-namespace `ListingDatabase` operations through an
eagerly initialized `LanceNamespaceDatabase`
- support nested namespace create/open/list/drop flows without requiring
callers to inject explicit locations
- add `namespace_client_properties` plumbing for local and namespace
connections so directory namespace settings like
`table_version_tracking_enabled` can be configured
- add regression tests for nested namespace ops and namespace client
property propagation
## Summary
Add connection serialization and child namespace support to
`LanceDBConnection`.
- `DBConnection.serialize()` / `lancedb.deserialize()` for connection
reconstruction in remote workers
- Cache `namespace_client()` in `LanceDBConnection` to avoid repeated
DirectoryNamespace builds
- `LanceDBConnection` transparently delegates child namespace operations
(open_table, create_table, list_tables, drop_table, create_namespace,
etc.) to `LanceNamespaceDBConnection` via `_namespace_conn()`
- Root namespace operations still go through the original Rust path
- Generic worker property override mechanism: any
`namespace_client_properties` key prefixed with `_lancedb_worker_` has
the prefix stripped and overrides the corresponding property when
`deserialize(data, for_worker=True)`
- `LanceNamespaceDBConnection` stores
`namespace_client_impl`/`namespace_client_properties` for serialization
roundtrip
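
The worker override rule above can be sketched as follows. This is an illustration only, not the code in this PR: `apply_worker_overrides` and the `root` property key are hypothetical, and leaving the properties untouched when `for_worker` is false is an assumption.

```python
from typing import Dict

WORKER_PREFIX = "_lancedb_worker_"


def apply_worker_overrides(
    properties: Dict[str, str], for_worker: bool
) -> Dict[str, str]:
    """Resolve namespace client properties for the driver or a worker."""
    if not for_worker:
        # Assumption: outside workers the properties are passed through as-is.
        return dict(properties)
    # Start from the unprefixed properties.
    resolved = {
        key: value
        for key, value in properties.items()
        if not key.startswith(WORKER_PREFIX)
    }
    # Strip the prefix and let the worker value override the base value.
    for key, value in properties.items():
        if key.startswith(WORKER_PREFIX):
            resolved[key[len(WORKER_PREFIX):]] = value
    return resolved


props = {
    "root": "/driver/path",  # hypothetical property key
    "_lancedb_worker_root": "/worker/path",
    "other": "x",
}
worker_props = apply_worker_overrides(props, for_worker=True)
driver_props = apply_worker_overrides(props, for_worker=False)
```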
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bumps the Rust toolchain to 1.94.0 (latest installed) to unblock CI
failures caused by the AWS SDK's MSRV requirement. No lint fixes were
needed.
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
- migrate the `gemini-text` embedding provider from the deprecated
`google.generativeai` to `google.genai`
- update the Python embedding extra dependency to `google-genai`
- update the default model name to `gemini-embedding-001`
- adapt embed calls to `Client().models.embed_content(...)`
- apply lint fixes from CI
## Related
- Closes #3191
`.get(b"split_names", None).decode()` was called unconditionally in both
`Permutations.__init__` and `Permutation.from_tables()`, crashing with
`AttributeError` when schema metadata existed but lacked the `split_names`
key. Guard the decode behind a `None` check and add regression tests.
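
A minimal sketch of the guard, assuming the schema metadata is a bytes-to-bytes mapping (as Arrow schema metadata is). `read_split_names` is a hypothetical helper name, not a function from the codebase.

```python
from typing import Mapping, Optional


def read_split_names(metadata: Optional[Mapping[bytes, bytes]]) -> Optional[str]:
    """Return the decoded split_names value, or None when it is absent."""
    if metadata is None:
        return None
    raw = metadata.get(b"split_names")
    if raw is None:
        # Metadata exists but has no split_names key: nothing to decode.
        return None
    return raw.decode()


missing = read_split_names({b"other": b"x"})
present = read_split_names({b"split_names": b'["train", "test"]'})
```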
## Problem
`on_bad_vectors="drop"` is supposed to remove invalid vector rows before
write, but for some schema-defined vector columns it can still fail
later during Arrow cast instead of dropping the bad row.
Repro:
```python
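from lancedb.pydantic import LanceModel, Vector

# `db` is an existing LanceDB connection.
class MySchema(LanceModel):
    text: str
    embedding: Vector(16)

table = db.create_table("test", schema=MySchema)
table.add(
    [
        {"text": "hello", "embedding": []},  # wrong length: should be dropped
        {"text": "bar", "embedding": [0.1] * 16},
    ],
    on_bad_vectors="drop",
)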
```
Before:
```
RuntimeError
Arrow error: C Data interface error: Invalid: ListType can only be casted to FixedSizeListType if the lists are all the expected size.
```
After:
```
rows 1
texts ['bar']
```
## Solution
Make bad-vector sanitization use schema dimensions before cast, while
keeping the handling scoped to vector columns identified by schema
metadata or existing vector-name heuristics.
This also preserves existing integer vector inputs and avoids applying
on_bad_vectors to unrelated fixed-size float columns.
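
The idea of checking against the schema dimension before the cast can be sketched in plain Python. This is an illustration under stated assumptions, not the code in this PR: `is_bad_vector` and `drop_bad_vectors` are hypothetical helpers, rows are plain dicts, and a missing vector is treated as bad.

```python
from typing import Any, Dict, List


def is_bad_vector(vector: Any, dim: int) -> bool:
    """A vector is bad here if it is not a sequence of exactly `dim` items."""
    return not isinstance(vector, (list, tuple)) or len(vector) != dim


def drop_bad_vectors(
    rows: List[Dict[str, Any]], column: str, dim: int
) -> List[Dict[str, Any]]:
    """Keep only rows whose vector column matches the schema dimension."""
    return [row for row in rows if not is_bad_vector(row.get(column), dim)]


rows = [
    {"text": "hello", "embedding": []},
    {"text": "bar", "embedding": [0.1] * 16},
]
kept = drop_bad_vectors(rows, column="embedding", dim=16)
texts = [row["text"] for row in kept]
```

Filtering before the cast means the fixed-size-list conversion only ever sees lists of the expected length.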
Fixes #1670
Signed-off-by: yaommen <myanstu@163.com>
## Summary
- Add a `user_id` field to `ClientConfig` that allows users to identify
themselves to LanceDB Cloud/Enterprise
- The user_id is sent as the `x-lancedb-user-id` HTTP header in all
requests
- Supports three configuration methods:
- Direct assignment via `ClientConfig.user_id`
- Environment variable `LANCEDB_USER_ID`
- Indirect env var lookup via `LANCEDB_USER_ID_ENV_KEY`
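
The three configuration methods could resolve roughly as sketched below. The precedence order (explicit value, then `LANCEDB_USER_ID`, then the indirect lookup) is an assumption that the description above does not state, and `resolve_user_id` and `user_id_headers` are hypothetical helpers rather than the actual client code.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_user_id(
    configured: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick a user id from config or environment (assumed precedence)."""
    env = os.environ if env is None else env
    if configured:
        return configured  # direct assignment
    direct = env.get("LANCEDB_USER_ID")
    if direct:
        return direct  # environment variable
    key = env.get("LANCEDB_USER_ID_ENV_KEY")
    if key:
        return env.get(key)  # indirect lookup: the value names another variable
    return None


def user_id_headers(user_id: Optional[str]) -> Dict[str, str]:
    """Build the request header only when a user id is available."""
    return {"x-lancedb-user-id": user_id} if user_id else {}


indirect_env = {"LANCEDB_USER_ID_ENV_KEY": "MY_USER", "MY_USER": "carol"}
indirect = resolve_user_id(env=indirect_env)
```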
Closes #3230

🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
Fixes #1846.
Python `Enum` fields raised `TypeError: Converting Pydantic type to
Arrow Type: unsupported type <enum 'SomethingTypes'>` when converting a
Pydantic model to an Arrow schema.
The fix adds Enum detection in `_pydantic_type_to_arrow_type`. When an
Enum subclass is encountered, the value type of its members is inspected
and mapped to the appropriate Arrow type:
- `str`-valued enums (e.g. `class Status(str, Enum)`) → `pa.utf8()`
- `int`-valued enums (e.g. `class Priority(int, Enum)`) → `pa.int64()`
- Other homogeneous value types → the Arrow type for that Python type
- Mixed-value or empty enums → `pa.utf8()` (safe fallback)
This covers the common `(str, Enum)` and `(int, Enum)` mixin patterns
used in practice.
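
The mapping rules above can be sketched without pyarrow by returning the Arrow type name as a string. This is a simplified illustration, not the branch added to `_pydantic_type_to_arrow_type`: `enum_arrow_type_name` and the `PY_TO_ARROW` table are hypothetical, and falling back to `"utf8"` for unlisted value types is an assumption.

```python
from enum import Enum
from typing import Type

# Stand-in for the Python-type to Arrow-type mapping (names only).
PY_TO_ARROW = {str: "utf8", int: "int64", float: "float64", bool: "bool"}


def enum_arrow_type_name(enum_cls: Type[Enum]) -> str:
    """Choose an Arrow type name from the value types of the enum members."""
    value_types = {type(member.value) for member in enum_cls}
    if len(value_types) != 1:
        # Mixed-value or empty enums fall back to strings.
        return "utf8"
    return PY_TO_ARROW.get(value_types.pop(), "utf8")


class Status(str, Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"


class Priority(int, Enum):
    LOW = 1
    HIGH = 2


class Mixed(Enum):
    A = 1
    B = "b"


names = [enum_arrow_type_name(cls) for cls in (Status, Priority, Mixed)]
```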
## Changes
- `python/python/lancedb/pydantic.py`: add Enum branch in
`_pydantic_type_to_arrow_type`
- `python/python/tests/test_pydantic.py`: add `test_enum_types` covering
`str`, `int`, and `Optional` Enum fields
## Note on #2395
PR #2395 handles `StrEnum` (Python 3.11+) specifically, using a
dictionary-encoded type. This PR handles the broader `(str, Enum)` /
`(int, Enum)` mixin pattern that works across all Python versions and
stores values as their natural Arrow type.
AI assistance was used in developing this fix.