chore: update lance dependency to v8.0.0-beta.4

fix(python): run AsyncTable.search embeddings on a dedicated executor (#3459)
## Summary

`AsyncTable.search()` computes the query embedding with `loop.run_in_executor(None, ...)`, which uses asyncio's **default** `ThreadPoolExecutor`. That pool is shared with all other `run_in_executor(None, ...)` work, so a slow embedding call (a heavy local model or an HTTP request to an embeddings API) ties up those threads and starves unrelated async I/O under concurrent load.

This moves the (potentially blocking) embedding call onto a **dedicated executor**, isolating it from the default pool.

Closes #3310.

## Problem

`python/lancedb/table.py`, `AsyncTable.search()`:

```python
return (
    await loop.run_in_executor(
        None,  # asyncio's default executor, shared with other blocking I/O
        embedding.function.compute_query_embeddings_with_retry,
        query,
    )
)[0]
```

Under load, concurrent searches whose embeddings block (or any other code using the default executor) contend for the same small thread pool.

## Change

- Add a dedicated `ThreadPoolExecutor(thread_name_prefix="lancedb-embedding")` in `background_loop.py`, exposed via `embedding_executor()`.
- Use it in `AsyncTable.search()`'s `make_embedding` instead of the default executor.
- Reset the executor in the existing `_reset_after_fork` hook: its worker threads don't survive `fork()`, same as the background event loop. It's recreated lazily, so this is cheap.

## Design notes

The issue asked whether maintainers preferred a configurable executor, a dedicated internal one, or another approach (no response in the thread). I went with a **dedicated internal executor**: it fixes the starvation with no public API change and stays consistent with the existing `LOOP` singleton. Making the pool size configurable would be an easy follow-up if preferred.

Scope is limited to `search()`. The broader "embedding functions need real async support" (including `add()`) is tracked separately in #3268.

## Testing

- Added `test_async_search_runs_embedding_on_dedicated_executor`: patches the embedding function to record the executing thread during an async search and asserts it runs on a `lancedb-embedding` thread. Verified it **fails** against the previous `run_in_executor(None, ...)` and passes with the fix.
- `ruff format`, `ruff check`, and `pyright` pass on the changed files.
2026-06-05 13:20:39 +00:00 · 2026-06-05 12:01:29 +00:00 · 2026-06-04 21:57:16 -07:00 · 2026-06-04 21:47:52 -05:00 · 2026-06-04 12:26:04 -05:00 · 2026-06-04 09:29:15 -07:00
58 changed files with 4833 additions and 2537 deletions
--- a/.agents/skills/README.md
+++ b/.agents/skills/README.md
@@ -0,0 +1,7 @@
+# Agent Skills
+
+This directory contains repo-scoped code agent skills for the LanceDB project.
+
+Each skill is a folder that contains a required `SKILL.md` and optional bundled resources.
+
+Codex discovers skills from `.agents/skills` in the current working directory and parent directories.
--- a/.agents/skills/lancedb-update-lance-dependency/SKILL.md
+++ b/.agents/skills/lancedb-update-lance-dependency/SKILL.md
@@ -0,0 +1,98 @@
+---
+name: lancedb-update-lance-dependency
+description: Update LanceDB to a specific Lance release or tag. Use when bumping Lance dependencies in the lancedb repository, including Rust workspace Lance crates, Java lance-core, validation, branch creation, commit, push, and PR creation when requested.
+---
+
+# LanceDB Update Lance Dependency
+
+## Scope
+
+Use this skill in the `lancedb/lancedb` repository when updating the Lance dependency to a specific Lance version or tag.
+
+Inputs can be a version (`7.2.0-beta.1`), a tag (`v7.2.0-beta.1`), a tag ref (`refs/tags/v7.2.0-beta.1`), or `latest`.
+
+## Workflow
+
+1. Confirm the worktree status with `git status --short`.
+2. Resolve the target Lance version:
+
+   - If the input is `latest`, empty, or omitted, run:
+
+     ```bash
+     python3 ci/check_lance_release.py
+     ```
+
+     Parse the JSON output. If `needs_update` is not `true`, stop without creating a PR. Otherwise use `latest_tag`.
+
+   - If the input is explicit, use it directly.
+
+3. Compute update metadata without changing files:
+
+   ```bash
+   python3 ci/update_lance_dependency.py "$TAG_OR_VERSION" --metadata-only
+   ```
+
+   Before making changes, check for an existing open PR with the emitted `pr_title`:
+
+   ```bash
+   gh pr list --search "\"$PR_TITLE\" in:title" --state open --limit 1 --json number,url,title
+   ```
+
+   If a matching open PR exists, stop and report it instead of creating a duplicate.
+
+4. Run the deterministic update entrypoint:
+
+   ```bash
+   python3 ci/update_lance_dependency.py "$TAG_OR_VERSION"
+   ```
+
+   This updates the Rust workspace Lance dependencies through `ci/set_lance_version.py`, updates `java/pom.xml`, refreshes Cargo metadata, and prints JSON metadata containing `branch_name`, `commit_message`, and `pr_title`.
+
+5. Run validation:
+
+   ```bash
+   cargo clippy --quiet --workspace --tests --all-features -- -D warnings
+   cargo fmt --all --quiet
+   ```
+
+   Fix real diagnostics and rerun clippy until it succeeds. Do not skip warnings.
+
+6. Inspect `git status --short` and `git diff` to ensure only the Lance dependency update and required compatibility fixes are present.
+
+7. If the task only asks to prepare local changes, stop here and report the changed files and validation result.
+
+8. If the task asks to publish the update, create a branch using the printed `branch_name`, stage all relevant files, and commit using the printed `commit_message`. Do not amend or rewrite existing commits.
+
+9. Push to `origin`. Before creating the PR, check that the current token has push permission:
+
+   ```bash
+   gh api repos/lancedb/lancedb --jq .permissions.push
+   ```
+
+   If the remote branch already exists for the same generated branch name, delete the remote ref with `gh api -X DELETE repos/lancedb/lancedb/git/refs/heads/$BRANCH_NAME`, then push. Do not force-push.
+
+10. Create a PR targeting `main` with the printed `pr_title`. If there is no PR template, keep the body to two or three concise sentences: state the Lance dependency bump, note any required compatibility fixes, and link the triggering Lance tag or release.
+
+11. Read back the remote PR title after creation. If it is not a Conventional Commit title, fix it immediately.
+
+12. When running in GitHub Actions after creating the LanceDB PR, trigger the Sophon dependency update:
+
+    ```bash
+    gh workflow run codex-bump-lancedb-lance.yml \
+      --repo lancedb/sophon \
+      -f lance_ref="$LANCE_TAG" \
+      -f lancedb_ref="$BRANCH_NAME"
+    gh run list --repo lancedb/sophon --workflow codex-bump-lancedb-lance.yml --limit 1 --json databaseId,url,displayTitle
+    ```
+
+    Use the emitted metadata `tag` value as `LANCE_TAG`. Do this only after a new LanceDB PR has been created. If the update was skipped because no update is needed or an open PR already exists, do not trigger Sophon.
+
+## GitHub Actions
+
+When this skill is used from GitHub Actions, `TAG`, `GH_TOKEN`, and `GITHUB_TOKEN` may already be set. Resolve `latest` first when `TAG` is empty. Once an explicit tag or version is known, use:
+
+```bash
+python3 ci/update_lance_dependency.py "$TAG" --github-output "$GITHUB_OUTPUT"
+```
+
+Then use the emitted `branch_name`, `commit_message`, and `pr_title` values for branch, commit, and PR creation.
--- a/.bumpversion.toml
+++ b/.bumpversion.toml
@@ -1,5 +1,5 @@
 [tool.bumpversion]
-current_version = "0.30.1-beta.0"
+current_version = "0.30.1-beta.2"
 parse = """(?x)
    (?P<major>0|[1-9]\\d*)\\.
    (?P<minor>0|[1-9]\\d*)\\.
--- a/.github/dependabot.yml
+++ b/.github/dependabot.yml
@@ -21,3 +21,14 @@ updates:
        update-types:
          - minor
          - patch
+
+  - package-ecosystem: pip
+    directory: /python
+    schedule:
+      interval: weekly
+    # Only update uv.lock, never widen version requirements in pyproject.toml.
+    versioning-strategy: lockfile-only
+    groups:
+      python-deps:
+        patterns:
+          - "*"
--- a/.github/workflows/codex-update-lance-dependency.yml
+++ b/.github/workflows/codex-update-lance-dependency.yml
@@ -4,14 +4,16 @@ on:
  workflow_call:
    inputs:
      tag:
-        description: "Tag name from Lance"
-        required: true
+        description: "Tag name from Lance. If omitted, the skill will use the latest Lance release that needs an update."
+        required: false
+        default: ""
        type: string
  workflow_dispatch:
    inputs:
      tag:
-        description: "Tag name from Lance"
-        required: true
+        description: "Tag name from Lance. Leave empty to use the latest Lance release that needs an update."
+        required: false
+        default: ""
        type: string

 permissions:
@@ -25,7 +27,7 @@ jobs:
    steps:
      - name: Show inputs
        run: |
-          echo "tag = ${{ inputs.tag }}"
+          echo "tag = ${{ inputs.tag || 'latest' }}"

      - name: Checkout Repo LanceDB
        uses: actions/checkout@v4
@@ -71,65 +73,21 @@ jobs:
          OPENAI_API_KEY: ${{ secrets.CODEX_TOKEN }}
        run: |
          set -euo pipefail
-          VERSION="${TAG#refs/tags/}"
-          VERSION="${VERSION#v}"
-          BRANCH_NAME="codex/update-lance-${VERSION//[^a-zA-Z0-9]/-}"
-
-          # Use "chore" for beta/rc versions, "feat" for stable releases
-          if [[ "${VERSION}" == *beta* ]] || [[ "${VERSION}" == *rc* ]]; then
-            COMMIT_TYPE="chore"
-          else
-            COMMIT_TYPE="feat"
-          fi
+          TARGET_TAG="${TAG:-latest}"

          cat <<EOF >/tmp/codex-prompt.txt
-          You are running inside the lancedb repository on a GitHub Actions runner. Update the Lance dependency to version ${VERSION} and prepare a pull request for maintainers to review.
+          You are running inside the lancedb repository on a GitHub Actions runner.

-          Follow these steps exactly:
-          1. Use script "ci/set_lance_version.py" to update Lance Rust dependencies. The script already refreshes Cargo metadata, so allow it to finish even if it takes time.
-          2. Update the Java lance-core dependency version in "java/pom.xml": change the "<lance-core.version>...</lance-core.version>" property to "${VERSION}".
-          3. Run "cargo clippy --workspace --tests --all-features -- -D warnings". If diagnostics appear, fix them yourself and rerun clippy until it exits cleanly. Do not skip any warnings.
-          4. After clippy succeeds, run "cargo fmt --all" to format the workspace.
-          5. Ensure the repository is clean except for intentional changes. Inspect "git status --short" and "git diff" to confirm the dependency update and any required fixes.
-          6. Create and switch to a new branch named "${BRANCH_NAME}" (replace any duplicated hyphens if necessary).
-          7. Stage all relevant files with "git add -A". Commit using the message "${COMMIT_TYPE}: update lance dependency to v${VERSION}".
-          8. Push the branch to origin. If the remote branch already exists, delete it first with "gh api -X DELETE repos/lancedb/lancedb/git/refs/heads/${BRANCH_NAME}" then push with "git push origin ${BRANCH_NAME}". Do NOT use "git push --force" or "git push -f".
-          9. env "GH_TOKEN" is available, use "gh" tools for github related operations like creating pull request.
-          10. Create a pull request targeting "main" with title "${COMMIT_TYPE}: update lance dependency to v${VERSION}". First, write the PR body to /tmp/pr-body.md using a heredoc (cat <<'EOF' > /tmp/pr-body.md). The body should summarize the dependency bump, clippy/fmt verification, and link the triggering tag (${TAG}). Then run "gh pr create --body-file /tmp/pr-body.md".
-          11. After creating the PR, display the PR URL, "git status --short", and a concise summary of the commands run and their results.
+          Use \$lancedb-update-lance-dependency with target "${TARGET_TAG}".

          Constraints:
-          - Use bash commands; avoid modifying GitHub workflow files other than through the scripted task above.
-          - Do not merge the PR.
-          - If any command fails, diagnose and fix the issue instead of aborting.
+          - Use env "GH_TOKEN" for GitHub operations.
+          - Do not merge the pull request.
+          - Do not force-push.
+          - Do not create a duplicate pull request if an open PR already exists for the target Lance version.
+          - If any command fails, diagnose and fix the root cause instead of aborting.
+          - After creating the PR, display the PR URL, "git status --short", and a concise summary of the commands run and their results.
          EOF

          printenv OPENAI_API_KEY | codex login --with-api-key
          codex --config shell_environment_policy.ignore_default_excludes=true exec --dangerously-bypass-approvals-and-sandbox "$(cat /tmp/codex-prompt.txt)"
-
-      - name: Trigger sophon dependency update
-        env:
-          TAG: ${{ inputs.tag }}
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          set -euo pipefail
-          VERSION="${TAG#refs/tags/}"
-          VERSION="${VERSION#v}"
-          LANCEDB_BRANCH="codex/update-lance-${VERSION//[^a-zA-Z0-9]/-}"
-
-          echo "Triggering sophon workflow with:"
-          echo "  lance_ref: ${TAG#refs/tags/}"
-          echo "  lancedb_ref: ${LANCEDB_BRANCH}"
-
-          gh workflow run codex-bump-lancedb-lance.yml \
-            --repo lancedb/sophon \
-            -f lance_ref="${TAG#refs/tags/}" \
-            -f lancedb_ref="${LANCEDB_BRANCH}"
-
-      - name: Show latest sophon workflow run
-        env:
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          set -euo pipefail
-          echo "Latest sophon workflow run:"
-          gh run list --repo lancedb/sophon --workflow codex-bump-lancedb-lance.yml --limit 1 --json databaseId,url,displayTitle
--- a/.github/workflows/lance-release-timer.yml
+++ b/.github/workflows/lance-release-timer.yml
@@ -1,62 +0,0 @@
-name: Lance Release Timer
-
-on:
-  schedule:
-    - cron: "*/10 * * * *"
-  workflow_dispatch:
-
-permissions:
-  contents: read
-  actions: write
-
-concurrency:
-  group: lance-release-timer
-  cancel-in-progress: false
-
-jobs:
-  trigger-update:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v4
-
-      - name: Check for new Lance tag
-        id: check
-        env:
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          python3 ci/check_lance_release.py --github-output "$GITHUB_OUTPUT"
-
-      - name: Look for existing PR
-        if: steps.check.outputs.needs_update == 'true'
-        id: pr
-        env:
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          set -euo pipefail
-          TITLE="chore: update lance dependency to v${{ steps.check.outputs.latest_version }}"
-          COUNT=$(gh pr list --search "\"$TITLE\" in:title" --state open --limit 1 --json number --jq 'length')
-          if [ "$COUNT" -gt 0 ]; then
-            echo "Open PR already exists for $TITLE"
-            echo "pr_exists=true" >> "$GITHUB_OUTPUT"
-          else
-            echo "No existing PR for $TITLE"
-            echo "pr_exists=false" >> "$GITHUB_OUTPUT"
-          fi
-
-      - name: Trigger codex update workflow
-        if: steps.check.outputs.needs_update == 'true' && steps.pr.outputs.pr_exists != 'true'
-        env:
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          set -euo pipefail
-          TAG=${{ steps.check.outputs.latest_tag }}
-          gh workflow run codex-update-lance-dependency.yml -f tag=refs/tags/$TAG
-
-      - name: Show latest codex workflow run
-        if: steps.check.outputs.needs_update == 'true' && steps.pr.outputs.pr_exists != 'true'
-        env:
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          set -euo pipefail
-          gh run list --workflow codex-update-lance-dependency.yml --limit 1 --json databaseId,url,displayTitle
--- a/Cargo.lock
+++ b/Cargo.lock
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -13,20 +13,20 @@ categories = ["database-implementations"]
 rust-version = "1.91.0"

 [workspace.dependencies]
-lance = { "version" = "=7.2.0-beta.3", default-features = false, "tag" = "v7.2.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
-lance-core = { "version" = "=7.2.0-beta.3", "tag" = "v7.2.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
-lance-datagen = { "version" = "=7.2.0-beta.3", "tag" = "v7.2.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
-lance-file = { "version" = "=7.2.0-beta.3", "tag" = "v7.2.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
-lance-io = { "version" = "=7.2.0-beta.3", default-features = false, "tag" = "v7.2.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
-lance-index = { "version" = "=7.2.0-beta.3", "tag" = "v7.2.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
-lance-linalg = { "version" = "=7.2.0-beta.3", "tag" = "v7.2.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
-lance-namespace = { "version" = "=7.2.0-beta.3", "tag" = "v7.2.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
-lance-namespace-impls = { "version" = "=7.2.0-beta.3", default-features = false, "tag" = "v7.2.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
-lance-table = { "version" = "=7.2.0-beta.3", "tag" = "v7.2.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
-lance-testing = { "version" = "=7.2.0-beta.3", "tag" = "v7.2.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
-lance-datafusion = { "version" = "=7.2.0-beta.3", "tag" = "v7.2.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
-lance-encoding = { "version" = "=7.2.0-beta.3", "tag" = "v7.2.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
-lance-arrow = { "version" = "=7.2.0-beta.3", "tag" = "v7.2.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
+lance = { "version" = "=8.0.0-beta.4", default-features = false, "tag" = "v8.0.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
+lance-core = { "version" = "=8.0.0-beta.4", "tag" = "v8.0.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
+lance-datagen = { "version" = "=8.0.0-beta.4", "tag" = "v8.0.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
+lance-file = { "version" = "=8.0.0-beta.4", "tag" = "v8.0.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
+lance-io = { "version" = "=8.0.0-beta.4", default-features = false, "tag" = "v8.0.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
+lance-index = { "version" = "=8.0.0-beta.4", "tag" = "v8.0.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
+lance-linalg = { "version" = "=8.0.0-beta.4", "tag" = "v8.0.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
+lance-namespace = { "version" = "=8.0.0-beta.4", "tag" = "v8.0.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
+lance-namespace-impls = { "version" = "=8.0.0-beta.4", default-features = false, "tag" = "v8.0.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
+lance-table = { "version" = "=8.0.0-beta.4", "tag" = "v8.0.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
+lance-testing = { "version" = "=8.0.0-beta.4", "tag" = "v8.0.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
+lance-datafusion = { "version" = "=8.0.0-beta.4", "tag" = "v8.0.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
+lance-encoding = { "version" = "=8.0.0-beta.4", "tag" = "v8.0.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
+lance-arrow = { "version" = "=8.0.0-beta.4", "tag" = "v8.0.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
 ahash = "0.8"
 # Note that this one does not include pyarrow
 arrow = { version = "58.0.0", optional = false }
--- a/REVIEW.md
+++ b/REVIEW.md
@@ -0,0 +1,26 @@
+# Code review guidelines
+
+Repo-specific guidance for automated PR reviews.
+
+## Cross-SDK parity
+
+LanceDB exposes the same core (`rust/lancedb`) through Python, TypeScript (`nodejs`),
+and Java bindings. Behavioral drift between SDKs is a recurring problem, so watch for
+parity gaps when reviewing — but only flag real ones:
+
+* If the change adds or modifies user-facing API or behavior in the shared core
+  (`rust/lancedb`), check whether each binding that should expose it (`python`,
+  `nodejs`) does. A core change with no corresponding binding update is worth a note.
+* If the change adds or modifies a public API in one SDK but not the other, open the
+  sibling SDK's corresponding module and state whether an equivalent exists. If not,
+  note it as a possible parity gap and suggest a follow-up issue.
+* For bug fixes, first read the sibling SDK's analogous code path to check whether the
+  same bug exists there. Only raise parity if it actually does. Do not ask to "port" a
+  fix for a bug that only ever existed in one binding.
+* Stay silent on internal-only refactors, tests, docs, and changes with no cross-SDK
+  surface.
+* Parity expectations apply to the Python and TypeScript (`nodejs`) SDKs. Java currently
+  implements only the remote table, not the local/embedded backend, so it is expected to
+  be partial — do not flag Java for missing local-only functionality.
+* Keep parity feedback to a short, clearly-labeled note (e.g. "Possible SDK parity
+  gap: …"). It is advisory, not a merge blocker.
--- a/ci/update_lance_dependency.py
+++ b/ci/update_lance_dependency.py
@@ -0,0 +1,126 @@
+#!/usr/bin/env python3
+"""Prepare a Lance dependency update for LanceDB."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import re
+import subprocess
+import sys
+from pathlib import Path
+from typing import Sequence
+
+try:
+    from check_lance_release import parse_semver
+except ModuleNotFoundError:
+    # Supports importing as ci.update_lance_dependency from tests or ad hoc checks.
+    from ci.check_lance_release import parse_semver  # type: ignore
+
+
+def normalize_version(raw: str) -> str:
+    value = raw.strip()
+    value = value.removeprefix("refs/tags/")
+    value = value.removeprefix("v")
+    try:
+        parse_semver(value)
+    except ValueError:
+        raise ValueError(f"Unsupported Lance version or tag: {raw}")
+    return value
+
+
+def normalized_tag(version: str) -> str:
+    return f"v{version}"
+
+
+def branch_name(version: str) -> str:
+    suffix = re.sub(r"[^a-zA-Z0-9]+", "-", version).strip("-")
+    suffix = re.sub(r"-+", "-", suffix)
+    return f"codex/update-lance-{suffix}"
+
+
+def commit_type(version: str) -> str:
+    prerelease = version.split("-", maxsplit=1)[1] if "-" in version else ""
+    return "chore" if "beta" in prerelease or "rc" in prerelease else "feat"
+
+
+def metadata_for(version: str) -> dict[str, str]:
+    kind = commit_type(version)
+    message = f"{kind}: update lance dependency to v{version}"
+    return {
+        "version": version,
+        "tag": normalized_tag(version),
+        "branch_name": branch_name(version),
+        "commit_type": kind,
+        "commit_message": message,
+        "pr_title": message,
+    }
+
+
+def run_command(cmd: Sequence[str], *, cwd: Path) -> None:
+    subprocess.run(cmd, cwd=cwd, check=True)
+
+
+def update_java_lance_core_version(repo_root: Path, version: str) -> None:
+    pom_path = repo_root / "java" / "pom.xml"
+    contents = pom_path.read_text(encoding="utf-8")
+    updated, count = re.subn(
+        r"(<lance-core\.version>)[^<]+(</lance-core\.version>)",
+        rf"\g<1>{version}\g<2>",
+        contents,
+        count=1,
+    )
+    if count != 1:
+        raise RuntimeError(
+            "Expected exactly one <lance-core.version> entry in java/pom.xml"
+        )
+    pom_path.write_text(updated, encoding="utf-8")
+
+
+def write_github_outputs(path: str | None, payload: dict[str, str]) -> None:
+    if not path:
+        return
+    with open(path, "a", encoding="utf-8") as output:
+        for key, value in payload.items():
+            output.write(f"{key}={value}\n")
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "tag_or_version",
+        help="Lance tag or version, for example refs/tags/v7.2.0-beta.1 or 7.2.0",
+    )
+    parser.add_argument(
+        "--repo-root",
+        type=Path,
+        default=Path(__file__).resolve().parents[1],
+        help="Path to the lancedb repository root",
+    )
+    parser.add_argument(
+        "--github-output",
+        default=None,
+        help="Optional GitHub Actions output file to receive metadata fields",
+    )
+    parser.add_argument(
+        "--metadata-only",
+        action="store_true",
+        help="Only print derived metadata; do not modify dependency files",
+    )
+    args = parser.parse_args(argv)
+
+    repo_root = args.repo_root.resolve()
+    version = normalize_version(args.tag_or_version)
+    payload = metadata_for(version)
+
+    if not args.metadata_only:
+        run_command([sys.executable, "ci/set_lance_version.py", version], cwd=repo_root)
+        update_java_lance_core_version(repo_root, version)
+
+    write_github_outputs(args.github_output, payload)
+    print(json.dumps(payload, sort_keys=True))
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
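
As a worked example of the metadata this script derives, the sketch below copies the `branch_name` and `commit_type` logic from the file above into a standalone snippet and applies it to the version in this commit's title. It is an illustration only and omits the semver validation done by `normalize_version`.

```python
import re


def branch_name(version: str) -> str:
    # Collapse every run of non-alphanumeric characters into one hyphen.
    suffix = re.sub(r"[^a-zA-Z0-9]+", "-", version).strip("-")
    return f"codex/update-lance-{suffix}"


def commit_type(version: str) -> str:
    # Only the prerelease part (after the first hyphen) decides the type.
    prerelease = version.split("-", maxsplit=1)[1] if "-" in version else ""
    return "chore" if "beta" in prerelease or "rc" in prerelease else "feat"


version = "8.0.0-beta.4"
title = f"{commit_type(version)}: update lance dependency to v{version}"
print(branch_name(version))  # codex/update-lance-8-0-0-beta-4
print(title)                 # chore: update lance dependency to v8.0.0-beta.4
```

A stable version such as `8.0.0` has no prerelease part, so the same logic yields a `feat` commit type.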
--- a/deny.toml
+++ b/deny.toml
@@ -147,6 +147,14 @@ allow = [
    "CDLA-Permissive-2.0",
 ]
 confidence-threshold = 0.8
+# Per-crate license exceptions: allow a license for a specific crate only,
+# rather than globally via the `allow` list above.
+exceptions = [
+    # CDDL-1.0 (copyleft) is pulled in only as a dev/profiling dependency via
+    # `inferno` -> `pprof` -> `lance-testing`; it is a test dependency that we
+    # do not distribute, so scope the allowance to `inferno` alone.
+    { allow = ["CDDL-1.0"], crate = "inferno" },
+]
 # Crates whose license cannot be determined from Cargo metadata but whose
 # license we've manually confirmed from upstream. Keep this list minimal.
 [[licenses.clarify]]
--- a/docs/src/java/java.md
+++ b/docs/src/java/java.md
@@ -14,7 +14,7 @@ Add the following dependency to your `pom.xml`:
 <dependency>
    <groupId>com.lancedb</groupId>
    <artifactId>lancedb-core</artifactId>
-    <version>0.30.1-beta.0</version>
+    <version>0.30.1-beta.2</version>
 </dependency>
 ```

--- a/docs/src/js/classes/Table.md
+++ b/docs/src/js/classes/Table.md
@@ -994,6 +994,29 @@ based on the row being updated (e.g. "my_col + 1")

 ***

+### updateFieldMetadata()
+
+```ts
+abstract updateFieldMetadata(updates): Promise<UpdateFieldMetadataResult>
+```
+
+Update per-field (column) metadata.
+
+#### Parameters
+
+* **updates**: [`FieldMetadataUpdate`](../interfaces/FieldMetadataUpdate.md)[]
+    One or more per-field updates. Each
+    update's metadata is merged into the field's existing metadata by default;
+    a value of `null` deletes that key, and `replace: true` swaps the whole map.
+
+#### Returns
+
+`Promise`&lt;[`UpdateFieldMetadataResult`](../interfaces/UpdateFieldMetadataResult.md)&gt;
+
+resolves to the new table version.
+
+***
+
 ### vectorSearch()

 ```ts
--- a/docs/src/js/globals.md
+++ b/docs/src/js/globals.md
@@ -65,6 +65,7 @@
 - [DropNamespaceOptions](interfaces/DropNamespaceOptions.md)
 - [DropNamespaceResponse](interfaces/DropNamespaceResponse.md)
 - [ExecutableQuery](interfaces/ExecutableQuery.md)
+- [FieldMetadataUpdate](interfaces/FieldMetadataUpdate.md)
 - [FragmentStatistics](interfaces/FragmentStatistics.md)
 - [FragmentSummaryStats](interfaces/FragmentSummaryStats.md)
 - [FtsOptions](interfaces/FtsOptions.md)
@@ -101,6 +102,7 @@
 - [TimeoutConfig](interfaces/TimeoutConfig.md)
 - [TlsConfig](interfaces/TlsConfig.md)
 - [TokenResponse](interfaces/TokenResponse.md)
+- [UpdateFieldMetadataResult](interfaces/UpdateFieldMetadataResult.md)
 - [UpdateOptions](interfaces/UpdateOptions.md)
 - [UpdateResult](interfaces/UpdateResult.md)
 - [Version](interfaces/Version.md)
--- a/docs/src/js/interfaces/FieldMetadataUpdate.md
+++ b/docs/src/js/interfaces/FieldMetadataUpdate.md
@@ -0,0 +1,41 @@
+[**@lancedb/lancedb**](../README.md) • **Docs**
+
+***
+
+[@lancedb/lancedb](../globals.md) / FieldMetadataUpdate
+
+# Interface: FieldMetadataUpdate
+
+A per-field metadata update, addressed by dot-path.
+
+## Properties
+
+### metadata
+
+```ts
+metadata: Record<string, null | string>;
+```
+
+Metadata key/value pairs. Merged into the field's existing metadata by
+default; a value of `null` deletes that key.
+
+***
+
+### path
+
+```ts
+path: string;
+```
+
+Dot-separated path to the field. For a top-level column this is just its
+name; for a nested field it's the path, e.g. "a.b.c".
+
+***
+
+### replace?
+
+```ts
+optional replace: boolean;
+```
+
+If true, replace the field's entire metadata map instead of merging.
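
The merge, delete, and replace rules documented above can be modeled in a few lines of plain Python. This is a sketch of the documented semantics only, not LanceDB code: `apply_field_metadata_update` is a hypothetical helper, Python's `None` stands in for the TypeScript `null`, and treating `null` values as deletions when `replace` is true is an assumption.

```python
from typing import Dict, Optional


def apply_field_metadata_update(
    existing: Dict[str, str],
    metadata: Dict[str, Optional[str]],
    replace: bool = False,
) -> Dict[str, str]:
    # replace=True starts from an empty map; otherwise merge into a copy.
    result: Dict[str, str] = {} if replace else dict(existing)
    for key, value in metadata.items():
        if value is None:
            result.pop(key, None)  # a null value deletes that key
        else:
            result[key] = value
    return result


step1 = apply_field_metadata_update({}, {"unit": "label", "pii": "false"})
step2 = apply_field_metadata_update(step1, {"source": "import", "pii": None})
step3 = apply_field_metadata_update(step2, {"owner": "data"}, replace=True)
```

Here `step2` keeps `unit`, gains `source`, and loses `pii`, while `step3` contains only `owner` because the whole map was swapped.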
--- a/docs/src/js/interfaces/UpdateFieldMetadataResult.md
+++ b/docs/src/js/interfaces/UpdateFieldMetadataResult.md
@@ -0,0 +1,15 @@
+[**@lancedb/lancedb**](../README.md) • **Docs**
+
+***
+
+[@lancedb/lancedb](../globals.md) / UpdateFieldMetadataResult
+
+# Interface: UpdateFieldMetadataResult
+
+## Properties
+
+### version
+
+```ts
+version: number;
+```
--- a/java/lancedb-core/pom.xml
+++ b/java/lancedb-core/pom.xml
@@ -8,7 +8,7 @@
    <parent>
      <groupId>com.lancedb</groupId>
      <artifactId>lancedb-parent</artifactId>
-      <version>0.30.1-beta.0</version>
+      <version>0.30.1-beta.2</version>
      <relativePath>../pom.xml</relativePath>
    </parent>

--- a/java/pom.xml
+++ b/java/pom.xml
@@ -6,7 +6,7 @@

    <groupId>com.lancedb</groupId>
    <artifactId>lancedb-parent</artifactId>
-    <version>0.30.1-beta.0</version>
+    <version>0.30.1-beta.2</version>
    <packaging>pom</packaging>
    <name>${project.artifactId}</name>
    <description>LanceDB Java SDK Parent POM</description>
@@ -28,7 +28,7 @@
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <arrow.version>15.0.0</arrow.version>
-        <lance-core.version>7.2.0-beta.1</lance-core.version>
+        <lance-core.version>8.0.0-beta.4</lance-core.version>
        <spotless.skip>false</spotless.skip>
        <spotless.version>2.30.0</spotless.version>
        <spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version>
--- a/nodejs/Cargo.toml
+++ b/nodejs/Cargo.toml
@@ -1,7 +1,7 @@
 [package]
 name = "lancedb-nodejs"
 edition.workspace = true
-version = "0.30.1-beta.0"
+version = "0.30.1-beta.2"
 publish = false
 license.workspace = true
 description.workspace = true
--- a/nodejs/test/table.test.ts
+++ b/nodejs/test/table.test.ts
@@ -1571,6 +1571,33 @@ describe("schema evolution", function () {
    expect(await table.schema()).toEqual(expectedSchema3);
  });

+  it("can update field metadata", async function () {
+    const con = await connect(tmpDir.name);
+    const table = await con.createTable("fm", [
+      { id: 1, category: "a" },
+      { id: 2, category: "b" },
+    ]);
+
+    const res = await table.updateFieldMetadata([
+      { path: "category", metadata: { unit: "label", pii: "false" } },
+    ]);
+    expect(res).toHaveProperty("version");
+    expect(res.version).toBe(2);
+
+    let cat = (await table.schema()).fields.find((f) => f.name === "category");
+    expect(cat?.metadata.get("unit")).toBe("label");
+    expect(cat?.metadata.get("pii")).toBe("false");
+
+    // merge: add a key, delete one via null, keep the rest
+    await table.updateFieldMetadata([
+      { path: "category", metadata: { source: "import", pii: null } },
+    ]);
+    cat = (await table.schema()).fields.find((f) => f.name === "category");
+    expect(cat?.metadata.get("unit")).toBe("label"); // preserved
+    expect(cat?.metadata.get("source")).toBe("import"); // added
+    expect(cat?.metadata.has("pii")).toBe(false); // deleted
+  });
+
  it("can cast to various types", async function () {
    const con = await connect(tmpDir.name);

--- a/nodejs/lancedb/index.ts
+++ b/nodejs/lancedb/index.ts
@@ -42,6 +42,7 @@ export {
  AddResult,
  AddColumnsResult,
  AlterColumnsResult,
+  UpdateFieldMetadataResult,
  DeleteResult,
  DropColumnsResult,
  UpdateResult,
@@ -117,6 +118,7 @@ export {
  WriteProgress,
  LsmWriteSpec,
  ColumnAlteration,
+  FieldMetadataUpdate,
 } from "./table";

 export {
--- a/nodejs/lancedb/table.ts
+++ b/nodejs/lancedb/table.ts
@@ -32,6 +32,7 @@ import {
  OptimizeStats,
  TableStatistics,
  Tags,
+  UpdateFieldMetadataResult,
  UpdateResult,
  Table as _NativeTable,
 } from "./native";
@@ -508,6 +509,18 @@ export abstract class Table {
  abstract alterColumns(
    columnAlterations: ColumnAlteration[],
  ): Promise<AlterColumnsResult>;
+
+  /**
+   * Update per-field (column) metadata.
+   * @param {FieldMetadataUpdate[]} updates One or more per-field updates. Each
+   * update's metadata is merged into the field's existing metadata by default;
+   * a value of `null` deletes that key, and `replace: true` swaps the whole map.
+   * @returns {Promise<UpdateFieldMetadataResult>} resolves to a result object containing the new table version.
+   */
+  abstract updateFieldMetadata(
+    updates: FieldMetadataUpdate[],
+  ): Promise<UpdateFieldMetadataResult>;
+
  /**
   * Drop one or more columns from the dataset
   *
@@ -1037,6 +1050,12 @@ export class LocalTable extends Table {
    return await this.inner.alterColumns(processedAlterations);
  }

+  async updateFieldMetadata(
+    updates: FieldMetadataUpdate[],
+  ): Promise<UpdateFieldMetadataResult> {
+    return await this.inner.updateFieldMetadata(updates);
+  }
+
  async dropColumns(columnNames: string[]): Promise<DropColumnsResult> {
    return await this.inner.dropColumns(columnNames);
  }
@@ -1203,3 +1222,19 @@ export interface ColumnAlteration {
  /** Set the new nullability. Note that a nullable column cannot be made non-nullable. */
  nullable?: boolean;
 }
+
+/** A per-field metadata update, addressed by dot-path. */
+export interface FieldMetadataUpdate {
+  /**
+   * Dot-separated path to the field. For a top-level column this is just its
+   * name; for a nested field it's the path, e.g. "a.b.c".
+   */
+  path: string;
+  /**
+   * Metadata key/value pairs. Merged into the field's existing metadata by
+   * default; a value of `null` deletes that key.
+   */
+  metadata: Record<string, string | null>;
+  /** If true, replace the field's entire metadata map instead of merging. */
+  replace?: boolean;
+}
--- a/nodejs/npm/darwin-arm64/package.json
+++ b/nodejs/npm/darwin-arm64/package.json
@@ -1,6 +1,6 @@
 {
 	"name": "@lancedb/lancedb-darwin-arm64",
-	"version": "0.30.1-beta.0",
+	"version": "0.30.1-beta.2",
 	"os": ["darwin"],
 	"cpu": ["arm64"],
 	"main": "lancedb.darwin-arm64.node",
--- a/nodejs/npm/linux-arm64-gnu/package.json
+++ b/nodejs/npm/linux-arm64-gnu/package.json
@@ -1,6 +1,6 @@
 {
 	"name": "@lancedb/lancedb-linux-arm64-gnu",
-	"version": "0.30.1-beta.0",
+	"version": "0.30.1-beta.2",
 	"os": ["linux"],
 	"cpu": ["arm64"],
 	"main": "lancedb.linux-arm64-gnu.node",
--- a/nodejs/npm/linux-arm64-musl/package.json
+++ b/nodejs/npm/linux-arm64-musl/package.json
@@ -1,6 +1,6 @@
 {
 	"name": "@lancedb/lancedb-linux-arm64-musl",
-	"version": "0.30.1-beta.0",
+	"version": "0.30.1-beta.2",
 	"os": ["linux"],
 	"cpu": ["arm64"],
 	"main": "lancedb.linux-arm64-musl.node",
--- a/nodejs/npm/linux-x64-gnu/package.json
+++ b/nodejs/npm/linux-x64-gnu/package.json
@@ -1,6 +1,6 @@
 {
 	"name": "@lancedb/lancedb-linux-x64-gnu",
-	"version": "0.30.1-beta.0",
+	"version": "0.30.1-beta.2",
 	"os": ["linux"],
 	"cpu": ["x64"],
 	"main": "lancedb.linux-x64-gnu.node",
--- a/nodejs/npm/linux-x64-musl/package.json
+++ b/nodejs/npm/linux-x64-musl/package.json
@@ -1,6 +1,6 @@
 {
 	"name": "@lancedb/lancedb-linux-x64-musl",
-	"version": "0.30.1-beta.0",
+	"version": "0.30.1-beta.2",
 	"os": ["linux"],
 	"cpu": ["x64"],
 	"main": "lancedb.linux-x64-musl.node",
--- a/nodejs/npm/win32-arm64-msvc/package.json
+++ b/nodejs/npm/win32-arm64-msvc/package.json
@@ -1,6 +1,6 @@
 {
  "name": "@lancedb/lancedb-win32-arm64-msvc",
-  "version": "0.30.1-beta.0",
+  "version": "0.30.1-beta.2",
  "os": [
    "win32"
  ],
--- a/nodejs/npm/win32-x64-msvc/package.json
+++ b/nodejs/npm/win32-x64-msvc/package.json
@@ -1,6 +1,6 @@
 {
 	"name": "@lancedb/lancedb-win32-x64-msvc",
-	"version": "0.30.1-beta.0",
+	"version": "0.30.1-beta.2",
 	"os": ["win32"],
 	"cpu": ["x64"],
 	"main": "lancedb.win32-x64-msvc.node",
--- a/nodejs/package-lock.json
+++ b/nodejs/package-lock.json
@@ -1,12 +1,12 @@
 {
  "name": "@lancedb/lancedb",
-  "version": "0.30.1-beta.0",
+  "version": "0.30.1-beta.2",
  "lockfileVersion": 3,
  "requires": true,
  "packages": {
    "": {
      "name": "@lancedb/lancedb",
-      "version": "0.30.1-beta.0",
+      "version": "0.30.1-beta.2",
      "cpu": [
        "x64",
        "arm64"
--- a/nodejs/package.json
+++ b/nodejs/package.json
@@ -11,7 +11,7 @@
    "ann"
  ],
  "private": false,
-  "version": "0.30.1-beta.0",
+  "version": "0.30.1-beta.2",
  "main": "dist/index.js",
  "exports": {
    ".": "./dist/index.js",
--- a/nodejs/src/table.rs
+++ b/nodejs/src/table.rs
@@ -5,8 +5,9 @@ use std::collections::HashMap;

 use lancedb::ipc::{ipc_file_to_batches, ipc_file_to_schema};
 use lancedb::table::{
-    AddDataMode, ColumnAlteration as LanceColumnAlteration, Duration, NewColumnTransform,
-    OptimizeAction, OptimizeOptions, Table as LanceDbTable,
+    AddDataMode, ColumnAlteration as LanceColumnAlteration, Duration,
+    FieldMetadataUpdate as LanceFieldMetadataUpdate, NewColumnTransform, OptimizeAction,
+    OptimizeOptions, Table as LanceDbTable,
 };
 use napi::bindgen_prelude::*;
 use napi::threadsafe_function::{ThreadsafeFunction, ThreadsafeFunctionCallMode};
@@ -355,6 +356,23 @@ impl Table {
        Ok(res.into())
    }

+    #[napi(catch_unwind)]
+    pub async fn update_field_metadata(
+        &self,
+        updates: Vec<FieldMetadataUpdate>,
+    ) -> napi::Result<UpdateFieldMetadataResult> {
+        let updates = updates
+            .into_iter()
+            .map(LanceFieldMetadataUpdate::from)
+            .collect::<Vec<_>>();
+        let res = self
+            .inner_ref()?
+            .update_field_metadata(&updates)
+            .await
+            .default_error()?;
+        Ok(res.into())
+    }
+
    #[napi(catch_unwind)]
    pub async fn drop_columns(&self, columns: Vec<String>) -> napi::Result<DropColumnsResult> {
        let col_refs = columns.iter().map(String::as_str).collect::<Vec<_>>();
@@ -747,6 +765,29 @@ pub struct ColumnAlteration {
    pub nullable: Option<bool>,
 }

+/// A per-field metadata update, addressed by dot-path. Merges into the field's
+/// existing metadata by default; a `null` value deletes a key, and `replace`
+/// swaps the field's entire metadata map.
+#[napi(object)]
+pub struct FieldMetadataUpdate {
+    /// Dot-separated path to the field (e.g. "embedding" or "a.b.c").
+    pub path: String,
+    /// Metadata keys to set; a `null` value deletes that key.
+    pub metadata: HashMap<String, Option<String>>,
+    /// If true, replace the field's entire metadata map instead of merging.
+    pub replace: Option<bool>,
+}
+
+impl From<FieldMetadataUpdate> for LanceFieldMetadataUpdate {
+    fn from(js: FieldMetadataUpdate) -> Self {
+        Self {
+            path: js.path,
+            metadata: js.metadata,
+            replace: js.replace.unwrap_or(false),
+        }
+    }
+}
+
 impl TryFrom<ColumnAlteration> for LanceColumnAlteration {
    type Error = String;
    fn try_from(js: ColumnAlteration) -> std::result::Result<Self, Self::Error> {
@@ -987,6 +1028,19 @@ impl From<lancedb::table::AlterColumnsResult> for AlterColumnsResult {
    }
 }

+#[napi(object)]
+pub struct UpdateFieldMetadataResult {
+    pub version: i64,
+}
+
+impl From<lancedb::table::UpdateFieldMetadataResult> for UpdateFieldMetadataResult {
+    fn from(value: lancedb::table::UpdateFieldMetadataResult) -> Self {
+        Self {
+            version: value.version as i64,
+        }
+    }
+}
+
 #[napi(object)]
 pub struct DropColumnsResult {
    pub version: i64,
--- a/python/.bumpversion.toml
+++ b/python/.bumpversion.toml
@@ -1,5 +1,5 @@
 [tool.bumpversion]
-current_version = "0.33.1-beta.0"
+current_version = "0.33.1-beta.2"
 parse = """(?x)
    (?P<major>0|[1-9]\\d*)\\.
    (?P<minor>0|[1-9]\\d*)\\.
--- a/python/Cargo.toml
+++ b/python/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "lancedb-python"
-version = "0.33.1-beta.0"
+version = "0.33.1-beta.2"
 publish = false
 edition.workspace = true
 description = "Python bindings for LanceDB"
--- a/python/python/lancedb/_lancedb.pyi
+++ b/python/python/lancedb/_lancedb.pyi
@@ -208,6 +208,9 @@ class Table:
    async def alter_columns(
        self, columns: list[dict[str, Any]]
    ) -> AlterColumnsResult: ...
+    async def update_field_metadata(
+        self, updates: list[dict[str, Any]]
+    ) -> UpdateFieldMetadataResult: ...
    async def optimize(
        self,
        *,
@@ -217,7 +220,6 @@ class Table:
    async def uri(self) -> str: ...
    async def initial_storage_options(self) -> Optional[Dict[str, str]]: ...
    async def latest_storage_options(self) -> Optional[Dict[str, str]]: ...
-    async def _table_reopen_state(self) -> Dict[str, Any]: ...
    async def set_unenforced_primary_key(self, columns: List[str]) -> None: ...
    async def set_lsm_write_spec(self, spec: LsmWriteSpec) -> None: ...
    async def unset_lsm_write_spec(self) -> None: ...
@@ -461,6 +463,9 @@ class AddColumnsResult:
 class AlterColumnsResult:
    version: int

+class UpdateFieldMetadataResult:
+    version: int
+
 class DropColumnsResult:
    version: int

--- a/python/python/lancedb/background_loop.py
+++ b/python/python/lancedb/background_loop.py
@@ -2,6 +2,7 @@
 # SPDX-FileCopyrightText: Copyright The LanceDB Authors

 import asyncio
+import concurrent.futures
 import os
 import threading
 import warnings
@@ -37,6 +38,24 @@ class BackgroundEventLoop:

 LOOP = BackgroundEventLoop()

+
+def _new_embedding_executor() -> concurrent.futures.ThreadPoolExecutor:
+    return concurrent.futures.ThreadPoolExecutor(thread_name_prefix="lancedb-embedding")
+
+
+# Embedding functions can block for a long time -- a heavy local model or an
+# HTTP request to a remote embeddings API. Running them on asyncio's default
+# executor lets them starve the unrelated blocking I/O that shares that pool,
+# so they get a dedicated one. See
+# https://github.com/lancedb/lancedb/issues/3310.
+_EMBEDDING_EXECUTOR = _new_embedding_executor()
+
+
+def embedding_executor() -> concurrent.futures.ThreadPoolExecutor:
+    """Return the executor dedicated to running blocking embedding calls."""
+    return _EMBEDDING_EXECUTOR
+
+
 _FORK_WARNED = False


@@ -47,6 +66,12 @@ def _reset_after_fork():
    # the new state. The Rust-side tokio runtime is reset analogously by a
    # pthread_atfork hook installed in the _lancedb extension.
    LOOP._start()
+    # The embedding executor's worker threads are dead in the child as well.
+    # Replace it with a fresh pool (threads are spawned lazily, so this is
+    # cheap); we don't shut down the old one, since joining its dead workers
+    # could hang.
+    global _EMBEDDING_EXECUTOR
+    _EMBEDDING_EXECUTOR = _new_embedding_executor()
    global _FORK_WARNED
    if not _FORK_WARNED:
        _FORK_WARNED = True
--- a/python/python/lancedb/permutation.py
+++ b/python/python/lancedb/permutation.py
@@ -358,28 +358,36 @@ DEFAULT_BATCH_SIZE = 100
 def _table_to_pickle_state(table: Table) -> dict[str, Any]:
    from .remote.table import RemoteTable

-    if isinstance(table, LanceTable) and table._conn.uri.startswith("memory://"):
+    if isinstance(table, RemoteTable):
+        return {
+            "kind": "remote",
+            "table": table,
+        }
+
+    if not isinstance(table, LanceTable):
+        raise ValueError(f"Cannot pickle table of type {type(table)!r}")
+
+    base_uri = table._conn.uri
+    if base_uri.startswith("memory://"):
        return {
            "kind": "memory",
            "name": table.name,
            "data": table.to_arrow(),
        }

-    if isinstance(table, (LanceTable, RemoteTable)):
-        return {
-            "kind": "table",
-            "table": table,
-        }
-
-    raise ValueError(f"Cannot pickle table of type {type(table)!r}")
+    return {
+        "kind": "local",
+        "name": table.name,
+        "uri": base_uri,
+        "namespace": table._namespace_path,
+        "storage_options": table._conn.storage_options,
+    }


 def _table_from_pickle_state(state: dict[str, Any]) -> Table:
    from . import connect

    kind = state["kind"]
-    if kind == "table":
-        return state["table"]
    if kind == "remote":
        return state["table"]
    if kind == "memory":
--- a/python/python/lancedb/query.py
+++ b/python/python/lancedb/query.py
@@ -41,6 +41,14 @@ from .rerankers.rrf import RRFReranker
 from .rerankers.util import check_reranker_result
 from .util import flatten_columns

+BlobMode = Literal["lazy", "bytes", "descriptions"]
+
+_BLOB_MODE_TO_HANDLING = {
+    "lazy": "blobs_descriptions",
+    "bytes": "all_binary",
+    "descriptions": "blobs_descriptions",
+}
+
 if TYPE_CHECKING:
    import sys

@@ -55,7 +63,7 @@ if TYPE_CHECKING:
    from ._lancedb import VectorQuery as LanceVectorQuery
    from .common import VEC
    from .pydantic import LanceModel
-    from .table import Table
+    from .table import AsyncTable, Table

    if sys.version_info >= (3, 11):
        from typing import Self
@@ -65,6 +73,179 @@ if TYPE_CHECKING:
 T = TypeVar("T", bound="LanceModel")


+def _validate_blob_mode(blob_mode: BlobMode) -> None:
+    if blob_mode not in _BLOB_MODE_TO_HANDLING:
+        modes = ", ".join(repr(mode) for mode in _BLOB_MODE_TO_HANDLING)
+        raise ValueError(f"blob_mode must be one of {modes}, got {blob_mode!r}")
+
+
+def _field_is_blob(field: pa.Field) -> bool:
+    metadata = field.metadata or {}
+    return metadata.get(b"lance-encoding:blob") == b"true" or (
+        metadata.get("lance-encoding:blob") == "true"
+    )
+
+
+def _schema_has_blob_field(schema: pa.Schema) -> bool:
+    return any(_field_is_blob(field) for field in schema)
+
+
+def _blob_mode_requires_native_pandas(blob_mode: BlobMode, schema: pa.Schema) -> bool:
+    return blob_mode in _BLOB_MODE_TO_HANDLING and _schema_has_blob_field(schema)
+
+
+def _unsupported_blob_pandas_error(reason: str) -> RuntimeError:
+    return RuntimeError(
+        "blob columns require Lance native scanner conversion for query "
+        f"to_pandas(), but {reason}. Use a plain scan query or remove blob "
+        "columns from the projection."
+    )
+
+
+def _query_is_plain_scan(query: Query) -> bool:
+    return (
+        query.vector is None
+        and query.full_text_query is None
+        and not query.postfilter
+        and not query.order_by
+    )
+
+
+def _filter_to_sql(filter: Optional[Union[str, Expr]]) -> Optional[str]:
+    if filter is None:
+        return None
+    if isinstance(filter, Expr):
+        return filter.to_sql()
+    return filter
+
+
+def _projection_to_scanner_kwargs(
+    columns: Optional[
+        Union[
+            List[str], List[Tuple[str, Union[str, Expr]]], Dict[str, Union[str, Expr]]
+        ]
+    ],
+) -> Dict[str, Any]:
+    if columns is None:
+        return {}
+    if isinstance(columns, list):
+        if all(isinstance(column, str) for column in columns):
+            return {"columns": columns}
+        if all(isinstance(column, tuple) and len(column) == 2 for column in columns):
+            return {
+                "columns": {
+                    name: expr.to_sql() if isinstance(expr, Expr) else expr
+                    for name, expr in columns
+                }
+            }
+        # Let Lance raise the detailed projection validation error.
+        return {"columns": columns}
+
+    projection = {}
+    for name, expr in columns.items():
+        if isinstance(expr, Expr):
+            expr = expr.to_sql()
+        projection[name] = expr
+    return {"columns": projection}
+
+
+def _scanner_kwargs_for_query(
+    query: Query, blob_mode: BlobMode, dataset: Optional[Any] = None
+) -> Dict[str, Any]:
+    fragments = _scanner_fragments_for_query(query, dataset)
+    kwargs = {
+        **_projection_to_scanner_kwargs(query.columns),
+        "filter": _filter_to_sql(query.filter),
+        "limit": query.limit,
+        "offset": query.offset,
+        "with_row_id": query.with_row_id,
+        "with_row_address": query.with_row_address,
+        "fast_search": query.fast_search,
+        "blob_handling": _BLOB_MODE_TO_HANDLING[blob_mode],
+        "fragments": fragments,
+    }
+    return {key: value for key, value in kwargs.items() if value is not None}
+
+
+def _scanner_fragments_for_query(query: Query, dataset: Optional[Any]) -> Optional[Any]:
+    if query.fragments is not None and query.fragment_ids is not None:
+        raise ValueError("fragments and fragment_ids cannot both be set")
+    if query.fragments is not None:
+        return query.fragments
+    if query.fragment_ids is None:
+        return None
+    if dataset is None:
+        raise ValueError("fragment_ids require a Lance dataset")
+
+    requested = set(query.fragment_ids)
+    fragments = [
+        fragment
+        for fragment in dataset.get_fragments()
+        if fragment.fragment_id in requested
+    ]
+    found = {fragment.fragment_id for fragment in fragments}
+    missing = requested - found
+    if missing:
+        missing_ids = ", ".join(str(fragment_id) for fragment_id in sorted(missing))
+        raise ValueError(f"fragment_ids not found in dataset: {missing_ids}")
+    return fragments
+
+
+def _ensure_lazy_blob_frame(
+    df: "pd.DataFrame", schema: pa.Schema, blob_mode: BlobMode
+) -> "pd.DataFrame":
+    if blob_mode != "lazy" or not _schema_has_blob_field(schema) or len(df) == 0:
+        return df
+
+    for field in schema:
+        if not _field_is_blob(field) or field.name not in df.columns:
+            continue
+        value = df[field.name].iloc[0]
+        if value is not None and not hasattr(value, "readall"):
+            raise _unsupported_blob_pandas_error(
+                "the Lance scanner did not return lazy blob files"
+            )
+    return df
+
+
+def _scanner_to_table(scanner: Any) -> pa.Table:
+    if hasattr(scanner, "to_pyarrow"):
+        reader = scanner.to_pyarrow()
+        return reader.read_all()
+    if hasattr(scanner, "to_table"):
+        return scanner.to_table()
+    reader = scanner.to_reader()
+    return reader.read_all()
+
+
+def _scanner_to_pandas(scanner: Any, blob_mode: BlobMode, **kwargs) -> "pd.DataFrame":
+    schema = getattr(scanner, "projected_schema", None)
+    if schema is None:
+        schema = getattr(scanner, "schema", None)
+    if schema is None:
+        schema = getattr(scanner, "dataset_schema", None)
+    if callable(schema):
+        schema = schema()
+    if hasattr(scanner, "to_pandas"):
+        try:
+            df = scanner.to_pandas(blob_mode=blob_mode, **kwargs)
+        except TypeError as err:
+            message = str(err)
+            if "blob_mode" not in message and "unexpected keyword" not in message:
+                raise
+            df = scanner.to_pandas(**kwargs)
+        if schema is not None:
+            return _ensure_lazy_blob_frame(df, schema, blob_mode)
+        return df
+
+    tbl = _scanner_to_table(scanner)
+    if blob_mode == "lazy" and _schema_has_blob_field(tbl.schema):
+        raise _unsupported_blob_pandas_error(
+            "the Lance scanner does not expose to_pandas"
+        )
+    return tbl.to_pandas(**kwargs)
+
+
 # Pydantic validation function for vector queries
 def ensure_vector_query(
    val: Any,
@@ -499,6 +680,13 @@ class Query(pydantic.BaseModel):
    # if true, include the row id in the results
    with_row_id: Optional[bool] = None

+    # if true, include the row address in the results
+    with_row_address: Optional[bool] = None
+
+    # Lance fragments or fragment ids to scan on scanner-backed plain queries
+    fragments: Optional[Any] = None
+    fragment_ids: Optional[List[int]] = None
+
    # offset to start fetching results from
    offset: Optional[int] = None

@@ -691,6 +879,9 @@ class LanceQueryBuilder(ABC):
        self._where = None
        self._postfilter = None
        self._with_row_id = None
+        self._with_row_address = None
+        self._fragments = None
+        self._fragment_ids = None
        self._vector = None
        self._text = None
        self._ef = None
@@ -718,6 +909,7 @@ class LanceQueryBuilder(ABC):
        self,
        flatten: Optional[Union[int, bool]] = None,
        *,
+        blob_mode: BlobMode = "lazy",
        timeout: Optional[timedelta] = None,
        **kwargs,
    ) -> "pd.DataFrame":
@@ -737,11 +929,41 @@ class LanceQueryBuilder(ABC):
        timeout: Optional[timedelta]
            The maximum time to wait for the query to complete.
            If None, wait indefinitely.
+        blob_mode: str, default "lazy"
+            One of "lazy", "bytes", or "descriptions"; controls how blob columns
+            are returned for plain scan queries. Vector, FTS, hybrid, and other
+            non-native query shapes use the Arrow path (blob descriptions only).
        **kwargs
            Forwarded to pyarrow.Table.to_pandas after query execution and
            optional flattening.
        """
+        _validate_blob_mode(blob_mode)
+        output_schema = getattr(self, "output_schema", None)
+        if output_schema is not None:
+            schema = output_schema()
+            if _blob_mode_requires_native_pandas(blob_mode, schema):
+                native_error = None
+                if (flatten is None or blob_mode == "descriptions") and timeout is None:
+                    try:
+                        df = self._plain_scan_to_pandas(
+                            blob_mode, flatten=flatten, **kwargs
+                        )
+                        if df is not None:
+                            return df
+                    except Exception as err:
+                        native_error = err
+                reason = (
+                    "this query shape cannot use Lance native pandas conversion"
+                    if native_error is None
+                    else str(native_error)
+                )
+                raise _unsupported_blob_pandas_error(reason) from native_error
+
        tbl = flatten_columns(self.to_arrow(timeout=timeout), flatten)
+        if _blob_mode_requires_native_pandas(blob_mode, tbl.schema):
+            raise _unsupported_blob_pandas_error(
+                "this query shape cannot use Lance native pandas conversion"
+            )
        return tbl.to_pandas(**kwargs)

    @abstractmethod
@@ -947,6 +1169,32 @@ class LanceQueryBuilder(ABC):
        self._with_row_id = with_row_id
        return self

+    def with_row_address(self, with_row_address: bool = True) -> Self:
+        """Set whether to return row addresses.
+
+        Parameters
+        ----------
+        with_row_address: bool, default True
+            If True, return the _rowaddr column in the results.
+
+        Returns
+        -------
+        LanceQueryBuilder
+            The LanceQueryBuilder object.
+        """
+        self._with_row_address = with_row_address
+        return self
+
+    def with_fragments(self, fragments: Any) -> Self:
+        """Set the Lance fragments to scan for plain scanner-backed queries."""
+        self._fragments = fragments
+        return self
+
+    def fragment_ids(self, fragment_ids: List[int]) -> Self:
+        """Set the Lance fragment ids to scan for plain scanner-backed queries."""
+        self._fragment_ids = fragment_ids
+        return self
+
    def explain_plan(self, verbose: Optional[bool] = False) -> str:
        """Return the execution plan for this query.

@@ -1086,6 +1334,25 @@ class LanceQueryBuilder(ABC):
        """
        raise NotImplementedError

+    def _plain_scan_to_pandas(
+        self,
+        blob_mode: BlobMode,
+        flatten: Optional[Union[int, bool]] = None,
+        **kwargs,
+    ) -> Optional["pd.DataFrame"]:
+        query = self.to_query_object()
+        if not _query_is_plain_scan(query):
+            return None
+
+        dataset = self._table.to_lance()
+        scanner = dataset.scanner(
+            **_scanner_kwargs_for_query(query, blob_mode, dataset)
+        )
+        if flatten is not None:
+            tbl = flatten_columns(_scanner_to_table(scanner), flatten)
+            return tbl.to_pandas(**kwargs)
+        return _scanner_to_pandas(scanner, blob_mode, **kwargs)
+
    @abstractmethod
    def to_query_object(self) -> Query:
        """Return a serializable representation of the query
@@ -1357,6 +1624,9 @@ class LanceVectorQueryBuilder(LanceQueryBuilder):
            refine_factor=self._refine_factor,
            vector_column=self._vector_column,
            with_row_id=self._with_row_id,
+            with_row_address=self._with_row_address,
+            fragments=self._fragments,
+            fragment_ids=self._fragment_ids,
            offset=self._offset,
            fast_search=self._fast_search,
            ef=self._ef,
@@ -1559,6 +1829,9 @@ class LanceFtsQueryBuilder(LanceQueryBuilder):
            limit=self._limit,
            postfilter=self._postfilter,
            with_row_id=self._with_row_id,
+            with_row_address=self._with_row_address,
+            fragments=self._fragments,
+            fragment_ids=self._fragment_ids,
            full_text_query=FullTextSearchQuery(
                query=self._query, columns=self._fts_columns
            ),
@@ -1629,6 +1902,9 @@ class LanceEmptyQueryBuilder(LanceQueryBuilder):
            filter=self._where,
            limit=self._limit,
            with_row_id=self._with_row_id,
+            with_row_address=self._with_row_address,
+            fragments=self._fragments,
+            fragment_ids=self._fragment_ids,
            offset=self._offset,
            order_by=self._order_by,
        )
@@ -2207,7 +2483,11 @@ class AsyncQueryBase(object):
    Base class for all async queries (take, scan, vector, fts, hybrid)
    """

-    def __init__(self, inner: Union[LanceQuery, LanceVectorQuery, LanceTakeQuery]):
+    def __init__(
+        self,
+        inner: Union[LanceQuery, LanceVectorQuery, LanceTakeQuery],
+        table: Optional["AsyncTable"] = None,
+    ):
        """
        Construct an AsyncQueryBase

@@ -2215,6 +2495,10 @@ class AsyncQueryBase(object):
        [AsyncTable.query][lancedb.table.AsyncTable.query] method to create a query.
        """
        self._inner = inner
+        self._table = table
+        self._with_row_address = None
+        self._fragments = None
+        self._fragment_ids = None

    def to_query_object(self) -> Query:
        """
@@ -2223,7 +2507,11 @@ class AsyncQueryBase(object):
        This is currently experimental but can be useful as the query object is pure
        python and more easily serializable.
        """
-        return Query.from_inner(self._inner.to_query_request())
+        query = Query.from_inner(self._inner.to_query_request())
+        query.with_row_address = self._with_row_address
+        query.fragments = self._fragments
+        query.fragment_ids = self._fragment_ids
+        return query

    def select(self, columns: Union[List[str], dict[str, str]]) -> Self:
        """
@@ -2280,6 +2568,27 @@ class AsyncQueryBase(object):
        self._inner.with_row_id()
        return self

+    def with_row_address(self, with_row_address: bool = True) -> Self:
+        """
+        Include the _rowaddr column in scanner-backed plain query results.
+        """
+        self._with_row_address = with_row_address
+        return self
+
+    def with_fragments(self, fragments: Any) -> Self:
+        """
+        Restrict scanner-backed plain query results to the given Lance fragments.
+        """
+        self._fragments = fragments
+        return self
+
+    def fragment_ids(self, fragment_ids: List[int]) -> Self:
+        """
+        Restrict scanner-backed plain query results to the given Lance fragment ids.
+        """
+        self._fragment_ids = fragment_ids
+        return self
+
    async def to_batches(
        self,
        *,
@@ -2357,6 +2666,8 @@ class AsyncQueryBase(object):
        self,
        flatten: Optional[Union[int, bool]] = None,
        timeout: Optional[timedelta] = None,
+        *,
+        blob_mode: BlobMode = "lazy",
        **kwargs,
    ) -> "pd.DataFrame":
        """
@@ -2390,13 +2701,63 @@ class AsyncQueryBase(object):
            The maximum time to wait for the query to complete.
            If not specified, no timeout is applied. If the query does not
            complete within the specified time, an error will be raised.
+        blob_mode: str, default "lazy"
+            One of "lazy", "bytes", or "descriptions"; controls how blob columns
+            are returned for plain scan queries. Vector, FTS, hybrid, and other
+            non-native query shapes use the Arrow path (blob descriptions only).
        **kwargs
            Forwarded to pyarrow.Table.to_pandas after query execution and
            optional flattening.
        """
-        return (
-            flatten_columns(await self.to_arrow(timeout=timeout), flatten)
-        ).to_pandas(**kwargs)
+        _validate_blob_mode(blob_mode)
+        if hasattr(self._inner, "output_schema"):
+            schema = await self.output_schema()
+            if _blob_mode_requires_native_pandas(blob_mode, schema):
+                native_error = None
+                if (flatten is None or blob_mode == "descriptions") and timeout is None:
+                    try:
+                        df = await self._plain_scan_to_pandas(
+                            blob_mode, flatten=flatten, **kwargs
+                        )
+                        if df is not None:
+                            return df
+                    except Exception as err:
+                        native_error = err
+                reason = (
+                    "this query shape cannot use Lance native pandas conversion"
+                    if native_error is None
+                    else str(native_error)
+                )
+                raise _unsupported_blob_pandas_error(reason) from native_error
+
+        tbl = flatten_columns(await self.to_arrow(timeout=timeout), flatten)
+        if _blob_mode_requires_native_pandas(blob_mode, tbl.schema):
+            raise _unsupported_blob_pandas_error(
+                "this query shape cannot use Lance native pandas conversion"
+            )
+        return tbl.to_pandas(**kwargs)
+
+    async def _plain_scan_to_pandas(
+        self,
+        blob_mode: BlobMode,
+        flatten: Optional[Union[int, bool]] = None,
+        **kwargs,
+    ) -> Optional["pd.DataFrame"]:
+        if self._table is None:
+            return None
+
+        query = self.to_query_object()
+        if not _query_is_plain_scan(query):
+            return None
+
+        dataset = await self._table._to_lance()
+        scanner = dataset.scanner(
+            **_scanner_kwargs_for_query(query, blob_mode, dataset)
+        )
+        if flatten is not None:
+            tbl = flatten_columns(_scanner_to_table(scanner), flatten)
+            return tbl.to_pandas(**kwargs)
+        return _scanner_to_pandas(scanner, blob_mode, **kwargs)

    async def to_polars(
        self,
@@ -2503,14 +2864,18 @@ class AsyncStandardQuery(AsyncQueryBase):
    Base class for "standard" async queries (all but take currently)
    """

-    def __init__(self, inner: Union[LanceQuery, LanceVectorQuery]):
+    def __init__(
+        self,
+        inner: Union[LanceQuery, LanceVectorQuery],
+        table: Optional["AsyncTable"] = None,
+    ):
        """
        Construct an AsyncStandardQuery

        This method is not intended to be called directly.  Instead, use the
        [AsyncTable.query][lancedb.table.AsyncTable.query] method to create a query.
        """
-        super().__init__(inner)
+        super().__init__(inner, table)

    def where(self, predicate: Union[str, Expr]) -> Self:
        """
@@ -2616,14 +2981,14 @@ class AsyncStandardQuery(AsyncQueryBase):


 class AsyncQuery(AsyncStandardQuery):
-    def __init__(self, inner: LanceQuery):
+    def __init__(self, inner: LanceQuery, table: Optional["AsyncTable"] = None):
        """
        Construct an AsyncQuery

        This method is not intended to be called directly.  Instead, use the
        [AsyncTable.query][lancedb.table.AsyncTable.query] method to create a query.
        """
-        super().__init__(inner)
+        super().__init__(inner, table)
        self._inner = inner

    @classmethod
@@ -2707,10 +3072,11 @@ class AsyncQuery(AsyncStandardQuery):
            new_self = self._inner.nearest_to(query_vectors[0])
            for v in query_vectors[1:]:
                new_self.add_query_vector(v)
-            return AsyncVectorQuery(new_self)
+            return AsyncVectorQuery(new_self, self._table)
        else:
            return AsyncVectorQuery(
-                self._inner.nearest_to(AsyncQuery._query_vec_to_array(query_vector))
+                self._inner.nearest_to(AsyncQuery._query_vec_to_array(query_vector)),
+                self._table,
            )

    def nearest_to_text(
@@ -2743,17 +3109,18 @@ class AsyncQuery(AsyncStandardQuery):

        if isinstance(query, str):
            return AsyncFTSQuery(
-                self._inner.nearest_to_text({"query": query, "columns": columns})
+                self._inner.nearest_to_text({"query": query, "columns": columns}),
+                self._table,
            )
        # FullTextQuery object
-        return AsyncFTSQuery(self._inner.nearest_to_text({"query": query}))
+        return AsyncFTSQuery(self._inner.nearest_to_text({"query": query}), self._table)


 class AsyncFTSQuery(AsyncStandardQuery):
    """A query for full text search for LanceDB."""

-    def __init__(self, inner: LanceFTSQuery):
-        super().__init__(inner)
+    def __init__(self, inner: LanceFTSQuery, table: Optional["AsyncTable"] = None):
+        super().__init__(inner, table)
        self._inner = inner
        self._reranker = None

@@ -2835,10 +3202,11 @@ class AsyncFTSQuery(AsyncStandardQuery):
            new_self = self._inner.nearest_to(query_vectors[0])
            for v in query_vectors[1:]:
                new_self.add_query_vector(v)
-            return AsyncHybridQuery(new_self)
+            return AsyncHybridQuery(new_self, self._table)
        else:
            return AsyncHybridQuery(
-                self._inner.nearest_to(AsyncQuery._query_vec_to_array(query_vector))
+                self._inner.nearest_to(AsyncQuery._query_vec_to_array(query_vector)),
+                self._table,
            )

    async def to_batches(
@@ -3029,7 +3397,7 @@ class AsyncVectorQueryBase:


 class AsyncVectorQuery(AsyncStandardQuery, AsyncVectorQueryBase):
-    def __init__(self, inner: LanceVectorQuery):
+    def __init__(self, inner: LanceVectorQuery, table: Optional["AsyncTable"] = None):
        """
        Construct an AsyncVectorQuery

@@ -3039,7 +3407,7 @@ class AsyncVectorQuery(AsyncStandardQuery, AsyncVectorQueryBase):
        a vector query.  Or you can use
        [AsyncTable.vector_search][lancedb.table.AsyncTable.vector_search]
        """
-        super().__init__(inner)
+        super().__init__(inner, table)
        self._inner = inner
        self._reranker = None
        self._query_string = None
@@ -3093,10 +3461,13 @@ class AsyncVectorQuery(AsyncStandardQuery, AsyncVectorQueryBase):

        if isinstance(query, str):
            return AsyncHybridQuery(
-                self._inner.nearest_to_text({"query": query, "columns": columns})
+                self._inner.nearest_to_text({"query": query, "columns": columns}),
+                self._table,
            )
        # FullTextQuery object
-        return AsyncHybridQuery(self._inner.nearest_to_text({"query": query}))
+        return AsyncHybridQuery(
+            self._inner.nearest_to_text({"query": query}), self._table
+        )

    async def to_batches(
        self,
@@ -3123,8 +3494,8 @@ class AsyncHybridQuery(AsyncStandardQuery, AsyncVectorQueryBase):
    in the `rerank` method to convert the scores to ranks and then normalize them.
    """

-    def __init__(self, inner: LanceHybridQuery):
-        super().__init__(inner)
+    def __init__(self, inner: LanceHybridQuery, table: Optional["AsyncTable"] = None):
+        super().__init__(inner, table)
        self._inner = inner
        self._norm = "score"
        self._reranker = RRFReranker()
@@ -3165,8 +3536,8 @@ class AsyncHybridQuery(AsyncStandardQuery, AsyncVectorQueryBase):
        max_batch_length: Optional[int] = None,
        timeout: Optional[timedelta] = None,
    ) -> AsyncRecordBatchReader:
-        fts_query = AsyncFTSQuery(self._inner.to_fts_query())
-        vec_query = AsyncVectorQuery(self._inner.to_vector_query())
+        fts_query = AsyncFTSQuery(self._inner.to_fts_query(), self._table)
+        vec_query = AsyncVectorQuery(self._inner.to_vector_query(), self._table)

        # save the row ID choice that was made on the query builder and force it
        # to actually fetch the row ids because we need this for reranking
@@ -3266,8 +3637,16 @@ class AsyncTakeQuery(AsyncQueryBase):
    Builder for parameterizing and executing take queries.
    """

-    def __init__(self, inner: LanceTakeQuery):
-        super().__init__(inner)
+    def __init__(self, inner: LanceTakeQuery, table: Optional["AsyncTable"] = None):
+        super().__init__(inner, table)
+
+    async def _plain_scan_to_pandas(
+        self,
+        blob_mode: BlobMode,
+        flatten: Optional[Union[int, bool]] = None,
+        **kwargs,
+    ) -> Optional["pd.DataFrame"]:
+        return None


 class BaseQueryBuilder(object):
@@ -3319,6 +3698,27 @@ class BaseQueryBuilder(object):
        self._inner.with_row_id()
        return self

+    def with_row_address(self, with_row_address: bool = True) -> Self:
+        """
+        Include the _rowaddr column in scanner-backed plain query results.
+        """
+        self._inner.with_row_address(with_row_address)
+        return self
+
+    def with_fragments(self, fragments: Any) -> Self:
+        """
+        Restrict scanner-backed plain query results to the given Lance fragments.
+        """
+        self._inner.with_fragments(fragments)
+        return self
+
+    def fragment_ids(self, fragment_ids: List[int]) -> Self:
+        """
+        Restrict scanner-backed plain query results to the given Lance fragment ids.
+        """
+        self._inner.fragment_ids(fragment_ids)
+        return self
+
    def output_schema(self) -> pa.Schema:
        """
        Return the output schema for the query
@@ -3400,6 +3800,8 @@ class BaseQueryBuilder(object):
        self,
        flatten: Optional[Union[int, bool]] = None,
        timeout: Optional[timedelta] = None,
+        *,
+        blob_mode: BlobMode = "lazy",
        **kwargs,
    ) -> "pd.DataFrame":
        """
@@ -3433,11 +3835,15 @@ class BaseQueryBuilder(object):
            The maximum time to wait for the query to complete.
            If not specified, no timeout is applied. If the query does not
            complete within the specified time, an error will be raised.
+        blob_mode: {"lazy", "bytes", "descriptions"}, default "lazy"
+            Controls how blob columns are returned for plain scan queries.
        **kwargs
            Forwarded to pyarrow.Table.to_pandas after query execution and
            optional flattening.
        """
-        return LOOP.run(self._inner.to_pandas(flatten, timeout, **kwargs))
+        return LOOP.run(
+            self._inner.to_pandas(flatten, timeout, blob_mode=blob_mode, **kwargs)
+        )

    def to_polars(
        self,
--- a/python/python/lancedb/remote/errors.py
+++ b/python/python/lancedb/remote/errors.py
@@ -27,6 +27,9 @@ class LanceDBClientError(RuntimeError):
        self.request_id = request_id
        self.status_code = status_code

+    def __reduce__(self) -> tuple[type, tuple]:
+        return (self.__class__, (str(self), self.request_id, self.status_code))
+

 class HttpError(LanceDBClientError):
    """An error that occurred during an HTTP request.
@@ -101,3 +104,19 @@ class RetryError(LanceDBClientError):
        self.max_request_failures = max_request_failures
        self.max_connect_failures = max_connect_failures
        self.max_read_failures = max_read_failures
+
+    def __reduce__(self) -> tuple[type, tuple]:
+        return (
+            self.__class__,
+            (
+                str(self),
+                self.request_id,
+                self.request_failures,
+                self.connect_failures,
+                self.read_failures,
+                self.max_request_failures,
+                self.max_connect_failures,
+                self.max_read_failures,
+                self.status_code,
+            ),
+        )
--- a/python/python/lancedb/remote/table.py
+++ b/python/python/lancedb/remote/table.py
@@ -25,6 +25,7 @@ from lancedb._lancedb import (
    AddColumnsResult,
    AddResult,
    AlterColumnsResult,
+    UpdateFieldMetadataResult,
    DeleteResult,
    DropColumnsResult,
    IndexConfig,
@@ -74,7 +75,6 @@ class RemoteTable(Table):
        self._connection_state = connection_state
        self._namespace_path = list(namespace_path or [])
        self._checkout_version: Optional[int] = None
-        self._table_state: Optional[dict[str, Any]] = None
        self._pid = os.getpid()

    def _serialized_connection_state(self) -> str:
@@ -87,16 +87,6 @@ class RemoteTable(Table):
            self._connection_state = self._connection_state()
        return self._connection_state

-    def _reopen_state(self) -> dict[str, Any]:
-        if self._table_state is not None:
-            return self._table_state
-        self._table_state = {
-            "name": self._name,
-            "namespace_path": self._namespace_path,
-            "storage_options": None,
-        }
-        return self._table_state
-
    @property
    def _table(self) -> AsyncTable:
        self._ensure_open()
@@ -107,7 +97,6 @@ class RemoteTable(Table):
    def _table(self, table: AsyncTable) -> None:
        self._table_handle = table
        self._name = table.name
-        self._table_state = None
        self._pid = os.getpid()

    def _ensure_open(self) -> None:
@@ -120,11 +109,7 @@ class RemoteTable(Table):
        from lancedb import deserialize_conn

        db = deserialize_conn(self._serialized_connection_state(), for_worker=True)
-        table_state = self._reopen_state()
-        table = db.open_table(
-            table_state["name"],
-            namespace_path=table_state["namespace_path"] or None,
-        )
+        table = db.open_table(self._name, namespace_path=self._namespace_path)
        if self._checkout_version is not None:
            table.checkout(self._checkout_version)

@@ -136,24 +121,17 @@ class RemoteTable(Table):
        return {
            "connection_state": self._serialized_connection_state(),
            "db_name": self.db_name,
-            "table_state": self._reopen_state(),
+            "name": self.name,
+            "namespace_path": self._namespace_path,
            "checkout_version": self._checkout_version,
        }

    def __setstate__(self, state: dict) -> None:
        self._table_handle = None
-        table_state = state.get("table_state")
-        if table_state is None:
-            table_state = {
-                "name": state["name"],
-                "namespace_path": state["namespace_path"],
-                "storage_options": None,
-            }
-        self._table_state = table_state
-        self._name = table_state["name"]
+        self._name = state["name"]
        self.db_name = state["db_name"]
        self._connection_state = state["connection_state"]
-        self._namespace_path = table_state["namespace_path"]
+        self._namespace_path = state["namespace_path"]
        self._checkout_version = state["checkout_version"]
        self._pid = None

@@ -873,6 +851,11 @@ class RemoteTable(Table):
    ) -> AlterColumnsResult:
        return LOOP.run(self._table.alter_columns(*alterations))

+    def update_field_metadata(
+        self, *updates: dict[str, Any]
+    ) -> UpdateFieldMetadataResult:
+        return LOOP.run(self._table.update_field_metadata(*updates))
+
    def drop_columns(self, columns: Iterable[str]) -> DropColumnsResult:
        return LOOP.run(self._table.drop_columns(columns))

--- a/python/python/lancedb/rerankers/mrr.py
+++ b/python/python/lancedb/rerankers/mrr.py
@@ -125,6 +125,9 @@ class MRRReranker(Reranker):
        This cannot reuse rerank_hybrid because MRR semantics require treating
        each vector result as a separate ranking system.
        """
+        if not vector_results:
+            raise ValueError("vector_results must not be empty")
+
        if not all(isinstance(v, type(vector_results[0])) for v in vector_results):
            raise ValueError(
                "All elements in vector_results should be of the same type"
--- a/python/python/lancedb/rerankers/rrf.py
+++ b/python/python/lancedb/rerankers/rrf.py
@@ -82,6 +82,9 @@ class RRFReranker(Reranker):
        results from multiple vector searches as it doesn't support reranking
        vector results individually.
        """
+        if not vector_results:
+            raise ValueError("vector_results must not be empty")
+
        # Make sure all elements are of the same type
        if not all(isinstance(v, type(vector_results[0])) for v in vector_results):
            raise ValueError(
--- a/python/python/lancedb/table.py
+++ b/python/python/lancedb/table.py
@@ -5,7 +5,6 @@ from __future__ import annotations

 import asyncio
 import inspect
-import os
 import deprecation
 import warnings
 from abc import ABC, abstractmethod
@@ -31,7 +30,7 @@ from lancedb.scannable import _register_optional_converters, to_scannable

 from . import __version__
 from lancedb.arrow import peek_reader
-from lancedb.background_loop import LOOP
+from lancedb.background_loop import LOOP, embedding_executor
 from .dependencies import (
    _check_for_hugging_face,
    _check_for_lance,
@@ -90,6 +89,26 @@ from .index import lang_mapping

 BlobMode = Literal["lazy", "bytes", "descriptions"]

+_VALID_BLOB_MODES = ("lazy", "bytes", "descriptions")
+
+
+def _validate_blob_mode(blob_mode: BlobMode) -> None:
+    if blob_mode not in _VALID_BLOB_MODES:
+        modes = ", ".join(repr(mode) for mode in _VALID_BLOB_MODES)
+        raise ValueError(f"blob_mode must be one of {modes}, got {blob_mode!r}")
+
+
+def _field_is_blob(field: pa.Field) -> bool:
+    metadata = field.metadata or {}
+    return metadata.get(b"lance-encoding:blob") == b"true" or (
+        metadata.get("lance-encoding:blob") == "true"
+    )
+
+
+def _schema_has_blob_field(schema: pa.Schema) -> bool:
+    return any(_field_is_blob(field) for field in schema)
+
+
 _MODEL_BACKED_TOKENIZER_PREFIXES = ("jieba", "lindera")
 _MODEL_BACKED_TOKENIZER_ERRORS = (
    "unknown base tokenizer",
@@ -155,6 +174,7 @@ if TYPE_CHECKING:
        AddColumnsResult,
        AddResult,
        AlterColumnsResult,
+        UpdateFieldMetadataResult,
        DeleteResult,
        DropColumnsResult,
        LsmWriteSpec,
@@ -758,12 +778,8 @@ class Table(ABC):
        """
        raise NotImplementedError

-    def _ensure_open(self) -> None:
-        pass
-
    def __len__(self) -> int:
        """The number of rows in this Table"""
-        self._ensure_open()
        return self.count_rows(None)

    @property
@@ -1409,7 +1425,6 @@ class Table(ABC):
        pa.RecordBatch
            A record batch containing the rows at the given offsets.
        """
-        self._ensure_open()
        # We don't know the order of the results at all.  So we calculate a permutation
        # for ordering the given offsets.  Then we load the data with the _rowoffset
        # column.  Then we sort by _rowoffset and apply the inverse of the permutation
@@ -1805,6 +1820,29 @@ class Table(ABC):
            version: the new version number of the table after the alteration.
        """

+    @abstractmethod
+    def update_field_metadata(
+        self, *updates: dict[str, Any]
+    ) -> UpdateFieldMetadataResult:
+        """
+        Update per-field (column) metadata.
+
+        Parameters
+        ----------
+        updates : dict
+            One or more dicts, each with:
+            - "path": str — dot-path to the field (e.g. "embedding" or "a.b.c").
+            - "metadata": dict[str, str | None] — keys to set; a value of ``None``
+              deletes that key.
+            - "replace": bool, optional — replace the field's whole metadata map
+              instead of merging (default False).
+
+        Returns
+        -------
+        UpdateFieldMetadataResult
+            version: the new table version after the update.
+        """
+
    @abstractmethod
    def drop_columns(self, columns: Iterable[str]) -> DropColumnsResult:
        """
@@ -1968,7 +2006,6 @@ class LanceTable(Table):
        self._location = location  # Store location for use in _dataset_path
        self._namespace_client = namespace_client
        self._pushdown_operations = pushdown_operations or set()
-        self._init_reopen_tracking()
        if _async is not None:
            self._table = _async
        else:
@@ -1984,66 +2021,6 @@ class LanceTable(Table):
                )
            )

-    def _init_reopen_tracking(self) -> None:
-        self._checkout_version: Optional[int] = None
-        self._table_state: Optional[dict[str, Any]] = None
-        self._pid = os.getpid()
-
-    def _reopen_state(self) -> dict[str, Any]:
-        state = LOOP.run(self._table._table_reopen_state())
-        if get_uri_scheme(self._conn.uri) == "memory":
-            raise ValueError(
-                "Cannot pickle an in-memory LanceTable. Use a persisted table "
-                "or provide a worker-side connection factory."
-            )
-        return state
-
-    def _copy_reopened_table(self, table: "LanceTable") -> None:
-        self._conn = table._conn
-        self._namespace_path = table._namespace_path
-        self._location = table._location
-        self._namespace_client = table._namespace_client
-        self._pushdown_operations = table._pushdown_operations
-        self._table = table._table
-        self._pid = os.getpid()
-
-    def _ensure_open(self) -> None:
-        pid = os.getpid()
-        if getattr(self, "_table", None) is not None and self._pid == pid:
-            return
-        if self._table_state is None:
-            self._table_state = self._reopen_state()
-
-        table = self._conn.open_table(
-            self._table_state["name"],
-            namespace_path=self._table_state["namespace_path"] or None,
-            storage_options=self._table_state["storage_options"],
-        )
-        if self._checkout_version is not None:
-            table.checkout(self._checkout_version)
-        self._copy_reopened_table(table)
-
-    def __getstate__(self) -> dict[str, Any]:
-        return {
-            "connection_state": self._conn.serialize(),
-            "table_state": self._reopen_state(),
-            "checkout_version": self._checkout_version,
-        }
-
-    def __setstate__(self, state: dict[str, Any]) -> None:
-        from . import deserialize_conn
-
-        self._conn = deserialize_conn(state["connection_state"], for_worker=True)
-        self._namespace_path = list(state["table_state"]["namespace_path"] or [])
-        self._location = None
-        self._namespace_client = None
-        self._pushdown_operations = set()
-        self._checkout_version = state["checkout_version"]
-        self._table_state = state["table_state"]
-        self._table = None
-        self._pid = None
-        self._ensure_open()
-
    @property
    def name(self) -> str:
        return self._table.name
@@ -2247,7 +2224,6 @@ class LanceTable(Table):
        0  [1.1, 0.9]  vector
        """
        LOOP.run(self._table.checkout(version))
-        self._checkout_version = self.version

    def checkout_latest(self):
        """Checkout the latest version of the table. This is an in-place operation.
@@ -2256,7 +2232,6 @@ class LanceTable(Table):
        version of the table.
        """
        LOOP.run(self._table.checkout_latest())
-        self._checkout_version = None

    def restore(self, version: Optional[Union[int, str]] = None):
        """Restore a version of the table. This is an in-place operation.
@@ -2305,7 +2280,6 @@ class LanceTable(Table):
        if version is not None:
            LOOP.run(self._table.checkout(version))
        LOOP.run(self._table.restore())
-        self._checkout_version = None

    def count_rows(self, filter: Optional[str] = None) -> int:
        return LOOP.run(self._table.count_rows(filter))
@@ -2340,9 +2314,14 @@ class LanceTable(Table):
        -------
        pd.DataFrame
        """
-        if blob_mode == "lazy" and (
-            self._namespace_client is not None
-            or get_uri_scheme(self._dataset_path) == "memory"
+        _validate_blob_mode(blob_mode)
+        if blob_mode == "descriptions" or not _schema_has_blob_field(self.schema):
+            return self.to_arrow().to_pandas(**kwargs)
+
+        if (
+            blob_mode == "lazy"
+            and self._namespace_client is None
+            and get_uri_scheme(self._dataset_path) == "memory"
        ):
            return self.to_arrow().to_pandas(**kwargs)

@@ -3364,7 +3343,6 @@ class LanceTable(Table):
        self._location = location
        self._namespace_client = namespace_client
        self._pushdown_operations = pushdown_operations or set()
-        self._init_reopen_tracking()

        if data_storage_version is not None:
            warnings.warn(
@@ -3654,6 +3632,11 @@ class LanceTable(Table):
    ) -> AlterColumnsResult:
        return LOOP.run(self._table.alter_columns(*alterations))

+    def update_field_metadata(
+        self, *updates: dict[str, Any]
+    ) -> UpdateFieldMetadataResult:
+        return LOOP.run(self._table.update_field_metadata(*updates))
+
    def drop_columns(self, columns: Iterable[str]) -> DropColumnsResult:
        return LOOP.run(self._table.drop_columns(columns))

@@ -3708,10 +3691,18 @@ class LanceTable(Table):
        """
        LOOP.run(self._table.migrate_v2_manifest_paths())

+    @deprecation.deprecated(
+        deprecated_in="0.33.1",
+        current_version=__version__,
+        details="Use update_field_metadata() instead.",
+    )
    def replace_field_metadata(self, field_name: str, new_metadata: Dict[str, str]):
        """
        Replace the metadata of a field in the schema

+        .. deprecated:: 0.33.1
+            Use :func:`update_field_metadata` instead.
+
        Parameters
        ----------
        field_name: str
@@ -4351,7 +4342,7 @@ class AsyncTable:
        can be executed with methods like [to_arrow][lancedb.query.AsyncQuery.to_arrow],
        [to_pandas][lancedb.query.AsyncQuery.to_pandas] and more.
        """
-        return AsyncQuery(self._inner.query())
+        return AsyncQuery(self._inner.query(), self)

    async def _to_lance(self, **kwargs) -> lance.LanceDataset:
        try:
@@ -4383,7 +4374,13 @@ class AsyncTable:
        -------
        pd.DataFrame
        """
-        if blob_mode == "lazy":
+        _validate_blob_mode(blob_mode)
+        if blob_mode == "descriptions" or not _schema_has_blob_field(
+            await self.schema()
+        ):
+            return (await self.to_arrow()).to_pandas(**kwargs)
+
+        if blob_mode == "lazy" and get_uri_scheme(await self.uri()) == "memory":
            return (await self.to_arrow()).to_pandas(**kwargs)
        return (await self._to_lance()).to_pandas(blob_mode=blob_mode, **kwargs)

@@ -4622,10 +4619,6 @@ class AsyncTable:
        """
        return await self._inner.latest_storage_options()

-    async def _table_reopen_state(self) -> dict[str, Any]:
-        """Get the Rust-side table state needed to reopen this table."""
-        return await self._inner._table_reopen_state()
-
    async def add(
        self,
        data: DATA,
@@ -4915,10 +4908,13 @@ class AsyncTable:
            if embedding is not None:
                loop = asyncio.get_running_loop()
                # This function is likely to block, since it either calls an expensive
-                # function or makes an HTTP request to an embeddings REST API.
+                # function or makes an HTTP request to an embeddings REST API. Run it
+                # on a dedicated executor so it can't starve the default executor that
+                # other blocking I/O shares. See
+                # https://github.com/lancedb/lancedb/issues/3310.
                return (
                    await loop.run_in_executor(
-                        None,
+                        embedding_executor(),
                        embedding.function.compute_query_embeddings_with_retry,
                        query,
                    )
@@ -5309,6 +5305,13 @@ class AsyncTable:
        """
        return await self._inner.alter_columns(alterations)

+    async def update_field_metadata(
+        self, *updates: dict[str, Any]
+    ) -> UpdateFieldMetadataResult:
+        """Update per-field metadata. See
+        [`Table.update_field_metadata`][lancedb.table.Table.update_field_metadata]."""
+        return await self._inner.update_field_metadata(updates)
+
    async def drop_columns(self, columns: Iterable[str]):
        """
        Drop columns from the table.
@@ -5424,7 +5427,7 @@ class AsyncTable:
        pa.RecordBatch
            A record batch containing the rows at the given offsets.
        """
-        return AsyncTakeQuery(self._inner.take_offsets(offsets))
+        return AsyncTakeQuery(self._inner.take_offsets(offsets), self)

    def take_row_ids(self, row_ids: list[int]) -> AsyncTakeQuery:
        """
@@ -5453,7 +5456,7 @@ class AsyncTable:
        AsyncTakeQuery
            A query object that can be executed to get the rows.
        """
-        return AsyncTakeQuery(self._inner.take_row_ids(row_ids))
+        return AsyncTakeQuery(self._inner.take_row_ids(row_ids), self)

    @property
    def tags(self) -> AsyncTags:
@@ -5593,12 +5596,20 @@ class AsyncTable:
        """
        await self._inner.migrate_manifest_paths_v2()

+    @deprecation.deprecated(
+        deprecated_in="0.33.1",
+        current_version=__version__,
+        details="Use update_field_metadata() instead.",
+    )
    async def replace_field_metadata(
        self, field_name: str, new_metadata: dict[str, str]
    ):
        """
        Replace the metadata of a field in the schema

+        .. deprecated:: 0.33.1
+            Use :func:`update_field_metadata` instead.
+
        Parameters
        ----------
        field_name: str
--- a/python/python/tests/test_errors.py
+++ b/python/python/tests/test_errors.py
@@ -0,0 +1,56 @@
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright The LanceDB Authors
+
+import pickle
+
+from lancedb.remote.errors import HttpError, LanceDBClientError, RetryError
+
+
+def test_pickle_lancedb_client_error():
+    err = LanceDBClientError("something went wrong", "req-123", 400)
+    restored = pickle.loads(pickle.dumps(err))
+    assert str(restored) == "something went wrong"
+    assert restored.request_id == "req-123"
+    assert restored.status_code == 400
+
+
+def test_pickle_lancedb_client_error_no_status_code():
+    err = LanceDBClientError("fail", "req-456")
+    restored = pickle.loads(pickle.dumps(err))
+    assert str(restored) == "fail"
+    assert restored.request_id == "req-456"
+    assert restored.status_code is None
+
+
+def test_pickle_http_error():
+    err = HttpError("not found", "req-789", 404)
+    restored = pickle.loads(pickle.dumps(err))
+    assert isinstance(restored, HttpError)
+    assert str(restored) == "not found"
+    assert restored.request_id == "req-789"
+    assert restored.status_code == 404
+
+
+def test_pickle_retry_error():
+    err = RetryError(
+        "max retries exceeded",
+        "req-abc",
+        request_failures=3,
+        connect_failures=1,
+        read_failures=2,
+        max_request_failures=5,
+        max_connect_failures=3,
+        max_read_failures=3,
+        status_code=503,
+    )
+    restored = pickle.loads(pickle.dumps(err))
+    assert isinstance(restored, RetryError)
+    assert str(restored) == "max retries exceeded"
+    assert restored.request_id == "req-abc"
+    assert restored.request_failures == 3
+    assert restored.connect_failures == 1
+    assert restored.read_failures == 2
+    assert restored.max_request_failures == 5
+    assert restored.max_connect_failures == 3
+    assert restored.max_read_failures == 3
+    assert restored.status_code == 503
--- a/python/python/tests/test_namespace.py
+++ b/python/python/tests/test_namespace.py
@@ -76,6 +76,35 @@ class TestNamespaceConnection:
        assert len(result) == 0
        assert list(result.columns) == ["id", "vector", "text"]

+    def test_table_to_pandas_blob_lazy_through_namespace(self):
+        """Namespace-backed tables should use Lance blob-aware pandas conversion."""
+        pytest.importorskip("lance")
+        db = lancedb.connect_namespace("dir", {"root": self.temp_dir})
+        db.create_namespace(["test_ns"])
+        data = pa.table(
+            {
+                "id": pa.array([1, 2], pa.int64()),
+                "blob": pa.array([b"hello", b"world"], pa.large_binary()),
+            },
+            schema=pa.schema(
+                [
+                    pa.field("id", pa.int64()),
+                    pa.field(
+                        "blob",
+                        pa.large_binary(),
+                        metadata={"lance-encoding:blob": "true"},
+                    ),
+                ]
+            ),
+        )
+
+        table = db.create_table("blob_table", data, namespace_path=["test_ns"])
+        df = table.to_pandas(blob_mode="lazy").sort_values("id")
+
+        blob = df["blob"].iloc[0]
+        assert hasattr(blob, "readall")
+        assert blob.readall() == b"hello"
+
    def test_open_table_through_namespace(self):
        """Test opening an existing table through namespace."""
        db = lancedb.connect_namespace("dir", {"root": self.temp_dir})
--- a/python/python/tests/test_query.py
+++ b/python/python/tests/test_query.py
@@ -39,6 +39,35 @@ from utils import exception_output
 from importlib.util import find_spec


+def _blob_query_data():
+    return pa.table(
+        {
+            "id": pa.array([1, 2, 3, 4], pa.int64()),
+            "tag": pa.array(["drop", "keep", "keep", "keep"], pa.utf8()),
+            "vector": pa.array(
+                [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]],
+                type=pa.list_(pa.float32(), list_size=2),
+            ),
+            "blob": pa.array([b"one", b"two", b"three", b"four"], pa.large_binary()),
+        },
+        schema=pa.schema(
+            [
+                pa.field("id", pa.int64()),
+                pa.field("tag", pa.utf8()),
+                pa.field("vector", pa.list_(pa.float32(), list_size=2)),
+                pa.field(
+                    "blob", pa.large_binary(), metadata={"lance-encoding:blob": "true"}
+                ),
+            ]
+        ),
+    )
+
+
+def _assert_lazy_blob(value, expected: bytes):
+    assert hasattr(value, "readall")
+    assert value.readall() == expected
+
+
@pytest.fixture(scope="module")
 def table(tmpdir_factory) -> lancedb.table.Table:
    tmp_path = str(tmpdir_factory.mktemp("data"))
@@ -181,6 +210,216 @@ async def test_query_to_pandas_kwargs(table, table_async):
    assert async_df["id"].tolist() == [1, 2]


+@pytest.mark.parametrize("blob_mode", ["lazy", "bytes", "descriptions"])
+def test_plain_scan_query_to_pandas_blob_modes(tmp_db, blob_mode):
+    pytest.importorskip("lance")
+    table = tmp_db.create_table(
+        f"test_query_to_pandas_blob_{blob_mode}", _blob_query_data()
+    )
+
+    df = (
+        table.search()
+        .select(["id", "blob"])
+        .where("id = 1")
+        .to_pandas(blob_mode=blob_mode)
+    )
+
+    assert df["id"].tolist() == [1]
+    if blob_mode == "lazy":
+        _assert_lazy_blob(df["blob"].iloc[0], b"one")
+    elif blob_mode == "bytes":
+        assert df["blob"].tolist() == [b"one"]
+    else:
+        first = df["blob"].iloc[0]
+        assert first != b"one"
+        assert not hasattr(first, "readall")
+
+
+def test_plain_scan_query_to_pandas_blob_projection(tmp_db):
+    pytest.importorskip("lance")
+    table = tmp_db.create_table(
+        "test_query_to_pandas_blob_projection", _blob_query_data()
+    )
+
+    df = (
+        table.search()
+        .where("id >= 2")
+        .select({"id_alias": "id", "payload": "blob", "double_id": "id * 2"})
+        .limit(2)
+        .offset(1)
+        .to_pandas(blob_mode="bytes")
+    )
+
+    assert df["id_alias"].tolist() == [3, 4]
+    assert df["payload"].tolist() == [b"three", b"four"]
+    assert df["double_id"].tolist() == [6, 8]
+
+
+@pytest.mark.parametrize("blob_mode", ["bytes", "descriptions"])
+def test_plain_scan_query_to_pandas_blob_mode_does_not_collect_arrow(
+    tmp_db, monkeypatch, blob_mode
+):
+    pytest.importorskip("lance")
+    table = tmp_db.create_table(
+        "test_query_to_pandas_blob_no_arrow_collect", _blob_query_data()
+    )
+    query = table.search().where("id = 1").select(["id", "blob"])
+
+    def fail_to_arrow(*args, **kwargs):
+        raise AssertionError("to_arrow should not be called before native pandas")
+
+    monkeypatch.setattr(query, "to_arrow", fail_to_arrow)
+
+    df = query.to_pandas(blob_mode=blob_mode)
+
+    assert df["id"].tolist() == [1]
+    if blob_mode == "bytes":
+        assert df["blob"].tolist() == [b"one"]
+    else:
+        first = df["blob"].iloc[0]
+        assert first != b"one"
+        assert not hasattr(first, "readall")
+
+
+def test_plain_scan_query_to_pandas_blob_descriptions_flatten_uses_scanner(
+    tmp_db, monkeypatch
+):
+    pytest.importorskip("lance")
+    table = tmp_db.create_table(
+        "test_query_to_pandas_blob_desc_flatten", _blob_query_data()
+    )
+    query = table.search().where("id = 1").select(["id", "blob"])
+
+    def fail_to_arrow(*args, **kwargs):
+        raise AssertionError("to_arrow should not be called before scanner pandas")
+
+    monkeypatch.setattr(query, "to_arrow", fail_to_arrow)
+
+    df = query.to_pandas(blob_mode="descriptions", flatten=True)
+
+    assert df["id"].tolist() == [1]
+    assert any(column == "blob" or column.startswith("blob.") for column in df.columns)
+
+
+def test_plain_scan_query_to_pandas_scanner_state(tmp_db):
+    pytest.importorskip("lance")
+    data = _blob_query_data()
+    table = tmp_db.create_table("test_query_to_pandas_scanner_state", data.slice(0, 2))
+    table.add(data.slice(2, 2))
+
+    fragments = table.to_lance().get_fragments()
+    assert len(fragments) == 2
+
+    query = (
+        table.search()
+        .select(["id", "blob"])
+        .with_row_address()
+        .fragment_ids([fragments[1].fragment_id])
+    )
+    query_obj = query.to_query_object()
+    assert query_obj.with_row_address is True
+    assert query_obj.fragment_ids == [fragments[1].fragment_id]
+
+    df = query.to_pandas(blob_mode="descriptions")
+
+    assert df["id"].tolist() == [3, 4]
+    assert "_rowaddr" in df.columns
+    assert {rowaddr >> 32 for rowaddr in df["_rowaddr"]} == {fragments[1].fragment_id}
+
+    df_by_fragment = (
+        table.search()
+        .select(["id", "blob"])
+        .with_fragments([fragments[0]])
+        .to_pandas(blob_mode="descriptions")
+    )
+    assert df_by_fragment["id"].tolist() == [1, 2]
+
+
+@pytest.mark.asyncio
+async def test_async_plain_scan_query_to_pandas_blob_projection(tmp_db_async):
+    pytest.importorskip("lance")
+    table = await tmp_db_async.create_table(
+        "test_async_query_to_pandas_blob_projection", _blob_query_data()
+    )
+
+    lazy_df = await (
+        table.query().where("id = 1").select(["id", "blob"]).to_pandas(blob_mode="lazy")
+    )
+    assert lazy_df["id"].tolist() == [1]
+    _assert_lazy_blob(lazy_df["blob"].iloc[0], b"one")
+
+    bytes_df = await (
+        table.query()
+        .where("id >= 2")
+        .select({"id_alias": "id", "payload": "blob", "double_id": "id * 2"})
+        .limit(2)
+        .offset(1)
+        .to_pandas(blob_mode="bytes")
+    )
+    assert bytes_df["id_alias"].tolist() == [3, 4]
+    assert bytes_df["payload"].tolist() == [b"three", b"four"]
+    assert bytes_df["double_id"].tolist() == [6, 8]
+
+    desc_df = await (
+        table.query()
+        .where("id = 1")
+        .select(["blob"])
+        .to_pandas(blob_mode="descriptions")
+    )
+    first = desc_df["blob"].iloc[0]
+    assert first != b"one"
+    assert not hasattr(first, "readall")
+
+
+@pytest.mark.asyncio
+@pytest.mark.parametrize("blob_mode", ["bytes", "descriptions"])
+async def test_async_plain_scan_query_to_pandas_blob_mode_does_not_collect_arrow(
+    tmp_db_async, monkeypatch, blob_mode
+):
+    pytest.importorskip("lance")
+    table = await tmp_db_async.create_table(
+        "test_async_query_to_pandas_blob_no_arrow_collect", _blob_query_data()
+    )
+    query = table.query().where("id = 1").select(["id", "blob"])
+
+    async def fail_to_arrow(*args, **kwargs):
+        raise AssertionError("to_arrow should not be called before native pandas")
+
+    monkeypatch.setattr(query, "to_arrow", fail_to_arrow)
+
+    df = await query.to_pandas(blob_mode=blob_mode)
+
+    assert df["id"].tolist() == [1]
+    if blob_mode == "bytes":
+        assert df["blob"].tolist() == [b"one"]
+    else:
+        first = df["blob"].iloc[0]
+        assert first != b"one"
+        assert not hasattr(first, "readall")
+
+
+def test_vector_query_to_pandas_blob_mode_requires_native_path(tmp_db):
+    pytest.importorskip("lance")
+    table = tmp_db.create_table("test_vector_query_blob_mode", _blob_query_data())
+
+    with pytest.raises(RuntimeError, match="Lance native pandas conversion"):
+        table.search([1.0, 0.0]).select(["blob", "vector"]).limit(1).to_pandas(
+            blob_mode="lazy"
+        )
+
+
+def test_vector_query_to_pandas_blob_descriptions_requires_plain_scan(tmp_db):
+    pytest.importorskip("lance")
+    table = tmp_db.create_table(
+        "test_vector_query_blob_descriptions", _blob_query_data()
+    )
+
+    with pytest.raises(RuntimeError, match="plain scan query"):
+        table.search([1.0, 0.0]).select(["blob", "vector"]).limit(1).to_pandas(
+            blob_mode="descriptions"
+        )
+
+
 def test_order_by_plain_query(mem_db):
    table = mem_db.create_table(
        "test_order_by",
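
Note on `test_plain_scan_query_to_pandas_scanner_state` above: the assertion `rowaddr >> 32 == fragment_id` relies on the fragment id occupying the upper 32 bits of `_rowaddr`. The sketch below illustrates that packing. The helper names are hypothetical, and treating the lower 32 bits as the row offset within the fragment is an assumption, not something this diff shows.

```python
from typing import Tuple

# Assumption: a row address is a 64-bit integer whose upper 32 bits hold the
# fragment id and whose lower 32 bits hold the row offset in that fragment.
# Only the `>> 32` half is confirmed by the test in this diff.
_OFFSET_MASK = 0xFFFFFFFF


def make_row_address(fragment_id: int, row_offset: int) -> int:
    """Pack a fragment id and row offset into one integer (hypothetical helper)."""
    return (fragment_id << 32) | (row_offset & _OFFSET_MASK)


def split_row_address(rowaddr: int) -> Tuple[int, int]:
    """Recover (fragment_id, row_offset) from a packed row address."""
    return rowaddr >> 32, rowaddr & _OFFSET_MASK


addresses = [make_row_address(1, offset) for offset in range(2)]
fragment_ids = {addr >> 32 for addr in addresses}
```

Here `fragment_ids` collapses to a single id, which is the same shape of check the test performs on the `_rowaddr` column.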
--- a/python/python/tests/test_remote_db.py
+++ b/python/python/tests/test_remote_db.py
@@ -215,51 +215,10 @@ def test_remote_table_is_picklable():

    with mock_lancedb_connection(handler) as db:
        table = db.open_table("test")
-        state = table.__getstate__()
-        assert state["table_state"] == {
-            "name": "test",
-            "namespace_path": [],
-            "storage_options": None,
-        }
        restored = pickle.loads(pickle.dumps(table))
        assert restored.count_rows() == 3


-def test_remote_table_reopens_when_pid_changes_without_cached_state():
-    def handler(request):
-        request.close_connection = True
-        if request.path == "/v1/table/test/describe/":
-            request.send_response(200)
-            request.send_header("Content-Type", "application/json")
-            request.end_headers()
-            payload = json.dumps(
-                {
-                    "version": 1,
-                    "schema": {
-                        "fields": [
-                            {"name": "id", "type": {"type": "int64"}, "nullable": False}
-                        ]
-                    },
-                }
-            )
-            request.wfile.write(payload.encode())
-        elif request.path == "/v1/table/test/count_rows/":
-            request.send_response(200)
-            request.send_header("Content-Type", "application/json")
-            request.end_headers()
-            request.wfile.write(b"3")
-        else:
-            request.send_response(404)
-            request.end_headers()
-
-    with mock_lancedb_connection(handler) as db:
-        table = db.open_table("test")
-        table._pid = -1
-        table._table_state = None
-
-        assert table.count_rows() == 3
-
-
 def test_remote_table_open_does_not_require_picklable_client_config():
    from lancedb.remote import HeaderProvider

--- a/python/python/tests/test_rerankers.py
+++ b/python/python/tests/test_rerankers.py
@@ -344,6 +344,12 @@ def test_mrr_reranker(tmp_path):
    assert len(result_deduped) == len(result)


+def test_mrr_reranker_empty_input():
+    reranker = MRRReranker()
+    with pytest.raises(ValueError, match="must not be empty"):
+        reranker.rerank_multivector([])
+
+
 def test_rrf_reranker_distance():
    data = pa.table(
        {
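
Note on `test_mrr_reranker_empty_input` above: the test only pins down that an empty input raises `ValueError` with a message containing "must not be empty". The sketch below shows why such a guard is natural for a mean-reciprocal-rank fusion, since the mean divides by the number of rankings. It is a simplified illustration, not the `MRRReranker` implementation; `mrr_scores` and its error message are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def mrr_scores(rankings: Sequence[List[str]]) -> Dict[str, float]:
    """Average the reciprocal rank of each id across all rankings.

    Hypothetical helper: an id missing from a ranking contributes 0 there.
    """
    if not rankings:
        # Without this guard the division below would fail on empty input.
        raise ValueError("rankings must not be empty")

    totals: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            totals[doc_id] += 1.0 / rank

    count = len(rankings)
    return {doc_id: total / count for doc_id, total in totals.items()}


scores = mrr_scores([["a", "b"], ["a", "c"]])
```

With these two rankings, `a` is first in both and scores 1.0, while `b` and `c` each appear once at rank 2 and score 0.25.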
--- a/python/python/tests/test_table.py
+++ b/python/python/tests/test_table.py
@@ -3,8 +3,8 @@


 import os
-import pickle
 import sys
+import threading
 import warnings
 from datetime import date, datetime, timedelta
 from time import sleep
@@ -27,6 +27,28 @@ from lancedb.table import LanceTable
 from pydantic import BaseModel


+def _blob_test_data():
+    return pa.table(
+        {
+            "id": pa.array([1, 2], pa.int64()),
+            "blob": pa.array([b"hello", b"world"], pa.large_binary()),
+        },
+        schema=pa.schema(
+            [
+                pa.field("id", pa.int64()),
+                pa.field(
+                    "blob", pa.large_binary(), metadata={"lance-encoding:blob": "true"}
+                ),
+            ]
+        ),
+    )
+
+
+def _assert_lazy_blob(value, expected: bytes):
+    assert hasattr(value, "readall")
+    assert value.readall() == expected
+
+
 def test_basic(mem_db: DBConnection):
    data = [
        {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
@@ -49,36 +71,6 @@ def test_basic(mem_db: DBConnection):
    assert table.to_arrow() == expected_data


-def test_lance_table_is_picklable(tmp_db: DBConnection):
-    table = tmp_db.create_table("pickle_table", pa.table({"id": [1, 2, 3]}))
-
-    restored = pickle.loads(pickle.dumps(table))
-
-    assert restored.name == "pickle_table"
-    assert restored.count_rows() == 3
-    assert restored.to_arrow().column("id").to_pylist() == [1, 2, 3]
-
-
-def test_lance_table_pickle_preserves_checkout(tmp_db: DBConnection):
-    table = tmp_db.create_table("pickle_checkout", pa.table({"id": [1]}))
-    table.add(pa.table({"id": [2]}))
-    table.checkout(1)
-
-    restored = pickle.loads(pickle.dumps(table))
-
-    assert restored.count_rows() == 1
-    assert restored.to_arrow().column("id").to_pylist() == [1]
-    restored.checkout_latest()
-    assert restored.count_rows() == 2
-
-
-def test_memory_lance_table_pickle_is_unsupported(mem_db: DBConnection):
-    table = mem_db.create_table("memory_pickle", pa.table({"id": [1]}))
-
-    with pytest.raises(ValueError, match="in-memory LanceTable"):
-        pickle.dumps(table)
-
-
 def test_table_to_pandas_default_matches_arrow(tmp_db: DBConnection):
    pd = pytest.importorskip("pandas")
    data = pa.table({"id": [1, 2], "text": ["one", "two"]})
@@ -88,27 +80,30 @@ def test_table_to_pandas_default_matches_arrow(tmp_db: DBConnection):
    pd.testing.assert_frame_equal(table.to_pandas(), expected)


-def test_table_to_pandas_blob_bytes(tmp_db: DBConnection):
+def test_table_to_pandas_invalid_blob_mode_non_blob_table(tmp_db: DBConnection):
+    data = pa.table({"id": [1, 2], "text": ["one", "two"]})
+    table = tmp_db.create_table("test_to_pandas_invalid_blob_mode", data=data)
+
+    with pytest.raises(ValueError, match="blob_mode must be one of"):
+        table.to_pandas(blob_mode="invalid")
+
+
+@pytest.mark.parametrize("blob_mode", ["lazy", "bytes", "descriptions"])
+def test_table_to_pandas_blob_modes(tmp_db: DBConnection, blob_mode):
    pytest.importorskip("lance")
-    data = pa.table(
-        {
-            "id": pa.array([1, 2], pa.int64()),
-            "blob": pa.array([b"hello", b"world"], pa.large_binary()),
-        },
-        schema=pa.schema(
-            [
-                pa.field("id", pa.int64()),
-                pa.field(
-                    "blob", pa.large_binary(), metadata={"lance-encoding:blob": "true"}
-                ),
-            ]
-        ),
-    )
-    table = tmp_db.create_table("test_to_pandas_blob_bytes", data=data)
+    table = tmp_db.create_table(f"test_to_pandas_blob_{blob_mode}", _blob_test_data())

-    df = table.to_pandas(blob_mode="bytes")
+    df = table.to_pandas(blob_mode=blob_mode)

-    assert df["blob"].tolist() == [b"hello", b"world"]
+    if blob_mode == "lazy":
+        _assert_lazy_blob(df["blob"].iloc[0], b"hello")
+        _assert_lazy_blob(df["blob"].iloc[1], b"world")
+    elif blob_mode == "bytes":
+        assert df["blob"].tolist() == [b"hello", b"world"]
+    else:
+        first = df["blob"].iloc[0]
+        assert first != b"hello"
+        assert not hasattr(first, "readall")


 def test_table_to_pandas_kwargs(tmp_db: DBConnection):
@@ -124,22 +119,8 @@ def test_table_to_pandas_kwargs(tmp_db: DBConnection):
@pytest.mark.asyncio
 async def test_async_table_to_pandas_blob_bytes(tmp_db_async: AsyncConnection):
    pytest.importorskip("lance")
-    data = pa.table(
-        {
-            "id": pa.array([1, 2], pa.int64()),
-            "blob": pa.array([b"hello", b"world"], pa.large_binary()),
-        },
-        schema=pa.schema(
-            [
-                pa.field("id", pa.int64()),
-                pa.field(
-                    "blob", pa.large_binary(), metadata={"lance-encoding:blob": "true"}
-                ),
-            ]
-        ),
-    )
    table = await tmp_db_async.create_table(
-        "test_async_to_pandas_blob_bytes", data=data
+        "test_async_to_pandas_blob_bytes", data=_blob_test_data()
    )

    df = await table.to_pandas(blob_mode="bytes")
@@ -147,6 +128,19 @@ async def test_async_table_to_pandas_blob_bytes(tmp_db_async: AsyncConnection):
    assert df["blob"].tolist() == [b"hello", b"world"]


+@pytest.mark.asyncio
+async def test_async_table_to_pandas_invalid_blob_mode_non_blob_table(
+    tmp_db_async: AsyncConnection,
+):
+    table = await tmp_db_async.create_table(
+        "test_async_to_pandas_invalid_blob_mode",
+        data=pa.table({"id": [1, 2], "text": ["one", "two"]}),
+    )
+
+    with pytest.raises(ValueError, match="blob_mode must be one of"):
+        await table.to_pandas(blob_mode="invalid")
+
+
@pytest.mark.asyncio
 async def test_async_table_to_pandas_kwargs(tmp_db_async: AsyncConnection):
    pd = pytest.importorskip("pandas")
@@ -1295,6 +1289,45 @@ def test_add_with_empty_fixed_size_list_drops_bad_rows(mem_db: DBConnection):
    assert np.allclose(data["embedding"].to_pylist()[0], np.array([0.1] * 16))


+def test_add_nullable_struct_with_none(mem_db: DBConnection):
+    """Regression test for issue #2654: a nullable struct column whose
+    first batch contains only None values must not crash in
+    _align_field_types with AttributeError: 'pyarrow.lib.DataType'
+    object has no attribute 'fields'.
+
+    PyArrow infers an all-None struct column as `null` (not `struct`),
+    so the type-alignment path needs to handle the case where the
+    source field type is null and use the target type directly.
+    """
+    # Use the v2.1 file format so that nullable structs are supported.
+    table = mem_db.create_table(
+        "test_nullable_struct",
+        schema=pa.schema(
+            [
+                pa.field("id", pa.string()),
+                pa.field(
+                    "data",
+                    pa.struct([pa.field("x", pa.float32())]),
+                    nullable=True,
+                ),
+            ]
+        ),
+        storage_options=dict(new_table_data_storage_version="2.1"),
+    )
+
+    # Adding a row with a non-null struct should work.
+    table.add([{"id": "1", "data": {"x": 1.0}}])
+
+    # Adding a row with None for the nullable struct field should also
+    # work — this is what used to crash.
+    table.add([{"id": "2", "data": None}])
+
+    result = table.to_arrow()
+    assert result.num_rows == 2
+    assert result.column("id").to_pylist() == ["1", "2"]
+    assert result.column("data").to_pylist() == [{"x": 1.0}, None]
+
+
 def test_add_with_integer_embeddings_preserves_casting(mem_db: DBConnection):
    class Schema(LanceModel):
        text: str
@@ -2503,6 +2536,30 @@ def test_alter_columns(mem_db: DBConnection):
    assert table.to_arrow().column_names == ["new_id"]


+def test_update_field_metadata(mem_db: DBConnection):
+    data = pa.table({"id": [0, 1], "category": ["a", "b"]})
+    table = mem_db.create_table("my_table", data=data)
+
+    res = table.update_field_metadata(
+        {"path": "category", "metadata": {"unit": "label", "pii": "false"}}
+    )
+    assert res.version == 2
+    # Arrow field metadata is bytes-keyed
+    assert table.schema.field("category").metadata == {
+        b"unit": b"label",
+        b"pii": b"false",
+    }
+
+    # merge: add a key, delete one via None, keep the rest
+    table.update_field_metadata(
+        {"path": "category", "metadata": {"source": "import", "pii": None}}
+    )
+    assert table.schema.field("category").metadata == {
+        b"unit": b"label",
+        b"source": b"import",
+    }
+
+
@pytest.mark.asyncio
 async def test_alter_columns_async(mem_db_async: AsyncConnection):
    data = pa.table({"id": [0, 1]})
@@ -2781,3 +2838,38 @@ def test_sanitize_data_metadata_not_stripped():
    assert result_schema.metadata is not None
    assert result_schema.metadata[b"existing_key"] == b"existing_value"
    assert result_schema.metadata[b"new_key"] == b"new_value"
+
+
+@pytest.mark.asyncio
+async def test_async_search_runs_embedding_on_dedicated_executor(
+    mem_db_async: AsyncConnection,
+):
+    # Regression test for #3310: AsyncTable.search() must run the (potentially
+    # blocking) query-embedding call on the dedicated embedding executor, not
+    # asyncio's default executor -- which is shared with other blocking I/O and
+    # can be starved by a slow embedding call under concurrent load.
+    func = MockTextEmbeddingFunction.create()
+
+    class Schema(LanceModel):
+        text: str = func.SourceField()
+        vector: Vector(func.ndims()) = func.VectorField()
+
+    table = await mem_db_async.create_table("embed_executor", schema=Schema)
+    await table.add([{"text": "hello world"}])
+
+    captured_threads: List[str] = []
+    original = MockTextEmbeddingFunction.generate_embeddings
+
+    def record_thread(self, texts):
+        captured_threads.append(threading.current_thread().name)
+        return original(self, texts)
+
+    # Patch only around the search so we capture the query-embedding call, not
+    # the add-time source-embedding call.
+    with patch.object(MockTextEmbeddingFunction, "generate_embeddings", record_thread):
+        await (await table.search("a query string")).limit(1).to_list()
+
+    assert captured_threads, "search did not invoke the embedding function"
+    assert all(name.startswith("lancedb-embedding") for name in captured_threads), (
+        f"embedding ran off the dedicated executor: {captured_threads}"
+    )
--- a/python/src/lib.rs
+++ b/python/src/lib.rs
@@ -16,7 +16,7 @@ use query::{FTSQuery, HybridQuery, Query, VectorQuery};
 use session::Session;
 use table::{
    AddColumnsResult, AddResult, AlterColumnsResult, DeleteResult, DropColumnsResult, LsmWriteSpec,
-    MergeResult, Table, UpdateResult,
+    MergeResult, Table, UpdateFieldMetadataResult, UpdateResult,
 };

 pub mod arrow;
@@ -50,6 +50,7 @@ pub fn _lancedb(_py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_class::<RecordBatchStream>()?;
    m.add_class::<AddColumnsResult>()?;
    m.add_class::<AlterColumnsResult>()?;
+    m.add_class::<UpdateFieldMetadataResult>()?;
    m.add_class::<AddResult>()?;
    m.add_class::<MergeResult>()?;
    m.add_class::<LsmWriteSpec>()?;
--- a/python/src/table.rs
+++ b/python/src/table.rs
@@ -16,12 +16,12 @@ use arrow::{
    pyarrow::{FromPyArrow, PyArrowType, ToPyArrow},
 };
 use lancedb::table::{
-    AddDataMode, ColumnAlteration, Duration, NewColumnTransform, OptimizeAction, OptimizeOptions,
-    Table as LanceDbTable,
+    AddDataMode, ColumnAlteration, Duration, FieldMetadataUpdate, NewColumnTransform,
+    OptimizeAction, OptimizeOptions, Table as LanceDbTable,
 };
 use pyo3::{
    Bound, FromPyObject, Py, PyAny, PyRef, PyResult, Python,
-    exceptions::{PyKeyError, PyRuntimeError, PyValueError},
+    exceptions::{PyRuntimeError, PyValueError},
    pyclass, pymethods,
    types::{IntoPyDict, PyAnyMethods, PyDict, PyDictMethods},
 };
@@ -357,6 +357,27 @@ impl From<lancedb::table::AlterColumnsResult> for AlterColumnsResult {
    }
 }

+#[pyclass(get_all, from_py_object)]
+#[derive(Clone, Debug)]
+pub struct UpdateFieldMetadataResult {
+    pub version: u64,
+}
+
+#[pymethods]
+impl UpdateFieldMetadataResult {
+    pub fn __repr__(&self) -> String {
+        format!("UpdateFieldMetadataResult(version={})", self.version)
+    }
+}
+
+impl From<lancedb::table::UpdateFieldMetadataResult> for UpdateFieldMetadataResult {
+    fn from(result: lancedb::table::UpdateFieldMetadataResult) -> Self {
+        Self {
+            version: result.version,
+        }
+    }
+}
+
 #[pyclass(get_all, from_py_object)]
 #[derive(Clone, Debug)]
 pub struct DropColumnsResult {
@@ -755,23 +776,6 @@ impl Table {
        })
    }

-    pub fn _table_reopen_state(self_: PyRef<'_, Self>) -> PyResult<Bound<'_, PyAny>> {
-        let inner = self_.inner_ref()?.clone();
-        future_into_py(self_.py(), async move {
-            let name = inner.name().to_string();
-            let namespace_path = inner.namespace().to_vec();
-            let storage_options = inner.initial_storage_options().await;
-
-            Python::attach(|py| {
-                let dict = PyDict::new(py);
-                dict.set_item("name", name)?;
-                dict.set_item("namespace_path", namespace_path)?;
-                dict.set_item("storage_options", storage_options)?;
-                Ok(dict.unbind())
-            })
-        })
-    }
-
    pub fn __repr__(&self) -> String {
        match &self.inner {
            None => format!("ClosedTable({})", self.name),
@@ -1119,31 +1123,57 @@ impl Table {
        field_name: String,
        metadata: &Bound<'_, PyDict>,
    ) -> PyResult<Bound<'a, PyAny>> {
-        let mut new_metadata = HashMap::<String, String>::new();
-        for (column_name, value) in metadata.into_iter() {
-            let key: String = column_name.extract()?;
-            let value: String = value.extract()?;
-            new_metadata.insert(key, value);
+        // Deprecated: forwards to the update_field_metadata path (replace mode).
+        let mut update = FieldMetadataUpdate::new(field_name).replace();
+        for (key, value) in metadata.into_iter() {
+            update = update.set(key.extract::<String>()?, value.extract::<String>()?);
        }

        let inner = self_.inner_ref()?.clone();
        future_into_py(self_.py(), async move {
-            let native_tbl = inner
-                .as_native()
-                .ok_or_else(|| PyValueError::new_err("This cannot be run on a remote table"))?;
-            let schema = native_tbl.manifest().await.infer_error()?.schema;
-            let field = schema
-                .field(&field_name)
-                .ok_or_else(|| PyKeyError::new_err(format!("Field {} not found", field_name)))?;
-
-            native_tbl
-                .replace_field_metadata(vec![(field.id as u32, new_metadata)])
-                .await
-                .infer_error()?;
-
+            inner.update_field_metadata(&[update]).await.infer_error()?;
            Ok(())
        })
    }
+
+    pub fn update_field_metadata<'a>(
+        self_: PyRef<'a, Self>,
+        updates: Vec<Bound<PyDict>>,
+    ) -> PyResult<Bound<'a, PyAny>> {
+        let updates = updates
+            .iter()
+            .map(|update| {
+                let path: String = update
+                    .get_item("path")?
+                    .ok_or_else(|| PyValueError::new_err("Missing path"))?
+                    .extract()?;
+                let mut field_update = FieldMetadataUpdate::new(path);
+                if let Some(metadata) = update.get_item("metadata")? {
+                    let metadata_dict = metadata.cast::<PyDict>()?;
+                    for (key, value) in metadata_dict.iter() {
+                        let key: String = key.extract()?;
+                        if value.is_none() {
+                            field_update = field_update.remove(key);
+                        } else {
+                            field_update = field_update.set(key, value.extract::<String>()?);
+                        }
+                    }
+                }
+                if let Some(replace) = update.get_item("replace")?
+                    && replace.extract::<bool>()?
+                {
+                    field_update = field_update.replace();
+                }
+                Ok(field_update)
+            })
+            .collect::<PyResult<Vec<_>>>()?;
+
+        let inner = self_.inner_ref()?.clone();
+        future_into_py(self_.py(), async move {
+            let result = inner.update_field_metadata(&updates).await.infer_error()?;
+            Ok(UpdateFieldMetadataResult::from(result))
+        })
+    }
 }

 #[derive(FromPyObject)]
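
Note on the `update_field_metadata` binding above: each update dict carries a `path`, an optional `metadata` mapping in which a `None` value removes the key, and an optional `replace` flag. The sketch below models those semantics on plain dictionaries. `apply_field_metadata_update` is a hypothetical helper, and the assumption that `replace` discards the existing metadata before applying the new keys is inferred from the deprecated replace-mode path, not confirmed by the core library code.

```python
from typing import Dict, Optional


def apply_field_metadata_update(
    existing: Dict[str, str],
    metadata: Dict[str, Optional[str]],
    replace: bool = False,
) -> Dict[str, str]:
    """Model of the merge/replace behavior (hypothetical helper)."""
    # Assumption: replace mode starts from empty metadata; merge mode keeps
    # whatever is already set on the field.
    result: Dict[str, str] = {} if replace else dict(existing)
    for key, value in metadata.items():
        if value is None:
            # A None value deletes the key, mirroring `field_update.remove(key)`.
            result.pop(key, None)
        else:
            result[key] = value
    return result


first = apply_field_metadata_update({}, {"unit": "label", "pii": "false"})
merged = apply_field_metadata_update(first, {"source": "import", "pii": None})
replaced = apply_field_metadata_update(first, {"source": "import"}, replace=True)
```

The `first` and `merged` values follow the same sequence as `test_update_field_metadata` in `test_table.py`, apart from Arrow exposing the stored keys and values as bytes.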
--- a/python/uv.lock
+++ b/python/uv.lock
--- a/rust/lancedb/Cargo.toml
+++ b/rust/lancedb/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "lancedb"
-version = "0.30.1-beta.0"
+version = "0.30.1-beta.2"
 edition.workspace = true
 description = "LanceDB: A serverless, low-latency vector database for AI applications"
 license.workspace = true
--- a/rust/lancedb/src/dataloader/permutation/shuffle.rs
+++ b/rust/lancedb/src/dataloader/permutation/shuffle.rs
@@ -203,11 +203,11 @@ impl Shuffler {

        // Finish writing files
        for (file_idx, mut writer) in file_writers.into_iter().enumerate() {
-            let num_written = writer.finish().await?;
+            let write_summary = writer.finish().await?;
            log::debug!(
                "Shuffle job {}: wrote {} rows to file {}",
                self.id,
-                num_written,
+                write_summary.num_rows,
                file_idx
            );
        }
--- a/rust/lancedb/src/remote/table.rs
+++ b/rust/lancedb/src/remote/table.rs
@@ -18,13 +18,14 @@ use crate::index::waiter::wait_for_index;
 use crate::query::{QueryFilter, QueryRequest, Select, VectorQueryRequest};
 use crate::table::AddColumnsResult;
 use crate::table::AddResult;
-use crate::table::AlterColumnsResult;
 use crate::table::DeleteResult;
 use crate::table::DropColumnsResult;
 use crate::table::MergeResult;
 use crate::table::Tags;
 use crate::table::UpdateResult;
+use crate::table::merge::MergeFilter;
 use crate::table::query::create_multi_vector_plan;
+use crate::table::{AlterColumnsResult, FieldMetadataUpdate, UpdateFieldMetadataResult};
 use crate::table::{AnyQuery, Filter, Predicate, PreprocessingOutput, TableStatistics};
 use crate::utils::background_cache::BackgroundCache;
 use crate::utils::{
@@ -1826,16 +1827,57 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
        })
    }

-    async fn set_lsm_write_spec(&self, _spec: crate::table::LsmWriteSpec) -> Result<()> {
-        Err(Error::NotSupported {
-            message: "set_lsm_write_spec is not supported on LanceDB cloud.".into(),
-        })
+    async fn set_lsm_write_spec(&self, spec: crate::table::LsmWriteSpec) -> Result<()> {
+        use crate::table::LsmWriteSpec;
+        self.check_mutable().await?;
+
+        // Map the spec onto the server's request DTO. `sharding` is internally
+        // tagged on `mode` to mirror sophon's `Sharding` enum; `maintained_indexes`
+        // and `writer_config_defaults` are sent verbatim (an empty list means "no
+        // maintained indexes", not "default to all").
+        let sharding = match &spec {
+            LsmWriteSpec::Bucket {
+                column,
+                num_buckets,
+                ..
+            } => serde_json::json!({
+                "mode": "bucket",
+                "column": column,
+                "num_buckets": num_buckets,
+            }),
+            LsmWriteSpec::Identity { column, .. } => serde_json::json!({
+                "mode": "identity",
+                "column": column,
+            }),
+            LsmWriteSpec::Unsharded { .. } => serde_json::json!({ "mode": "unsharded" }),
+        };
+        let body = serde_json::json!({
+            "sharding": sharding,
+            "maintained_indexes": spec.maintained_indexes(),
+            "writer_config_defaults": spec.writer_config_defaults(),
+        });
+
+        let request = self
+            .client
+            .post(&format!(
+                "/v1/table/{}/set_lsm_write_spec/",
+                self.identifier
+            ))
+            .json(&body);
+        let (request_id, response) = self.send(request, true).await?;
+        self.check_table_response(&request_id, response).await?;
+        Ok(())
    }

    async fn unset_lsm_write_spec(&self) -> Result<()> {
-        Err(Error::NotSupported {
-            message: "unset_lsm_write_spec is not supported on LanceDB cloud.".into(),
-        })
+        self.check_mutable().await?;
+        let request = self.client.post(&format!(
+            "/v1/table/{}/unset_lsm_write_spec/",
+            self.identifier
+        ));
+        let (request_id, response) = self.send(request, true).await?;
+        self.check_table_response(&request_id, response).await?;
+        Ok(())
    }

    async fn tags(&self) -> Result<Box<dyn Tags + '_>> {
@@ -1968,6 +2010,35 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
        Ok(result)
    }

+    async fn update_field_metadata(
+        &self,
+        updates: &[FieldMetadataUpdate],
+    ) -> Result<UpdateFieldMetadataResult> {
+        self.check_mutable().await?;
+        let body = serde_json::json!({ "updates": updates });
+        let request = self
+            .client
+            .post(&format!(
+                "/v1/table/{}/update_field_metadata/",
+                self.identifier
+            ))
+            .json(&body);
+        let (request_id, response) = self.send(request, true).await?;
+        let response = self.check_table_response(&request_id, response).await?;
+        let body = response.text().await.err_to_http(request_id.clone())?;
+
+        let result: UpdateFieldMetadataResult =
+            serde_json::from_str(&body).map_err(|e| Error::Http {
+                source: format!("Failed to parse update_field_metadata response: {}", e).into(),
+                request_id,
+                status_code: None,
+            })?;
+
+        self.invalidate_schema_cache();
+        self.track_write_version(result.version);
+        Ok(result)
+    }
+
    async fn drop_columns(&self, columns: &[&str]) -> Result<DropColumnsResult> {
        self.check_mutable().await?;
        let body = serde_json::json!({ "columns": columns });
@@ -2237,13 +2308,34 @@ impl TryFrom<MergeInsertBuilder> for MergeInsertRequest {
        }
        let on = value.on[0].clone();

+        let when_matched_update_all_filt = match value.when_matched_update_all_filt {
+            Some(MergeFilter::Sql(sql)) => Some(sql),
+            Some(MergeFilter::Expr(_)) => {
+                return Err(Error::NotSupported {
+                    message: "DataFusion expressions are not supported on remote tables".into(),
+                });
+            }
+            None => None,
+        };
+
+        let when_not_matched_by_source_delete_filt =
+            match value.when_not_matched_by_source_delete_filt {
+                Some(MergeFilter::Sql(sql)) => Some(sql),
+                Some(MergeFilter::Expr(_)) => {
+                    return Err(Error::NotSupported {
+                        message: "DataFusion expressions are not supported on remote tables".into(),
+                    });
+                }
+                None => None,
+            };
+
        Ok(Self {
            on,
            when_matched_update_all: value.when_matched_update_all,
-            when_matched_update_all_filt: value.when_matched_update_all_filt,
+            when_matched_update_all_filt,
            when_not_matched_insert_all: value.when_not_matched_insert_all,
            when_not_matched_by_source_delete: value.when_not_matched_by_source_delete,
-            when_not_matched_by_source_delete_filt: value.when_not_matched_by_source_delete_filt,
+            when_not_matched_by_source_delete_filt,
            // Only serialize use_index when it's false for backwards compatibility
            use_index: value.use_index,
        })
@@ -2261,6 +2353,7 @@ mod tests {

    use crate::remote::client::{ClientConfig, RetryConfig};
    use crate::table::AddDataMode;
+    use crate::table::FieldMetadataUpdate;

    use arrow::{array::AsArray, compute::concat_batches, datatypes::Int32Type};
    use arrow_array::{Int32Array, RecordBatch, RecordBatchIterator, record_batch};
@@ -4376,6 +4469,91 @@ mod tests {
        assert!(matches!(e, Error::IndexNotFound { .. }));
    }

+    #[tokio::test]
+    async fn test_set_lsm_write_spec_unsharded() {
+        let table = Table::new_with_handler("my_table", |request| {
+            assert_eq!(request.method(), "POST");
+            assert_eq!(
+                request.url().path(),
+                "/v1/table/my_table/set_lsm_write_spec/"
+            );
+            let body = request.body().unwrap().as_bytes().unwrap();
+            let body: serde_json::Value = serde_json::from_slice(body).unwrap();
+            assert_eq!(body["sharding"], serde_json::json!({ "mode": "unsharded" }));
+            assert_eq!(body["maintained_indexes"], serde_json::json!(["id_idx"]));
+            assert_eq!(
+                body["writer_config_defaults"],
+                serde_json::json!({ "max_memtable_rows": "1000" })
+            );
+            http::Response::builder()
+                .status(200)
+                .body(r#"{"maintained_indexes":["id_idx"]}"#)
+                .unwrap()
+        });
+        let spec = crate::table::LsmWriteSpec::unsharded()
+            .with_maintained_indexes(["id_idx"])
+            .with_writer_config_defaults([("max_memtable_rows", "1000")]);
+        table.set_lsm_write_spec(spec).await.unwrap();
+    }
+
+    #[tokio::test]
+    async fn test_set_lsm_write_spec_bucket() {
+        let table = Table::new_with_handler("my_table", |request| {
+            assert_eq!(request.method(), "POST");
+            assert_eq!(
+                request.url().path(),
+                "/v1/table/my_table/set_lsm_write_spec/"
+            );
+            let body = request.body().unwrap().as_bytes().unwrap();
+            let body: serde_json::Value = serde_json::from_slice(body).unwrap();
+            assert_eq!(
+                body["sharding"],
+                serde_json::json!({ "mode": "bucket", "column": "id", "num_buckets": 16 })
+            );
+            assert_eq!(body["maintained_indexes"], serde_json::json!([]));
+            http::Response::builder().status(200).body("{}").unwrap()
+        });
+        table
+            .set_lsm_write_spec(crate::table::LsmWriteSpec::bucket("id", 16))
+            .await
+            .unwrap();
+    }
+
+    #[tokio::test]
+    async fn test_set_lsm_write_spec_identity() {
+        let table = Table::new_with_handler("my_table", |request| {
+            assert_eq!(request.method(), "POST");
+            assert_eq!(
+                request.url().path(),
+                "/v1/table/my_table/set_lsm_write_spec/"
+            );
+            let body = request.body().unwrap().as_bytes().unwrap();
+            let body: serde_json::Value = serde_json::from_slice(body).unwrap();
+            assert_eq!(
+                body["sharding"],
+                serde_json::json!({ "mode": "identity", "column": "tenant" })
+            );
+            http::Response::builder().status(200).body("{}").unwrap()
+        });
+        table
+            .set_lsm_write_spec(crate::table::LsmWriteSpec::identity("tenant"))
+            .await
+            .unwrap();
+    }
+
+    #[tokio::test]
+    async fn test_unset_lsm_write_spec() {
+        let table = Table::new_with_handler("my_table", |request| {
+            assert_eq!(request.method(), "POST");
+            assert_eq!(
+                request.url().path(),
+                "/v1/table/my_table/unset_lsm_write_spec/"
+            );
+            http::Response::builder().status(200).body("{}").unwrap()
+        });
+        table.unset_lsm_write_spec().await.unwrap();
+    }
+
    #[tokio::test]
    async fn test_wait_for_index() {
        let table = _make_table_with_indices(0);
@@ -6460,4 +6638,25 @@ mod tests {
        assert!(!headers.contains_key("x-lancedb-min-version"));
        assert!(!headers.contains_key("x-lancedb-min-timestamp"));
    }
+
+    #[tokio::test]
+    async fn test_update_field_metadata() {
+        let table = Table::new_with_handler("my_table", |request| {
+            assert_eq!(request.method(), "POST");
+            assert_eq!(
+                request.url().path(),
+                "/v1/table/my_table/update_field_metadata/"
+            );
+            http::Response::builder()
+                .status(200)
+                .body(r#"{"version": 7, "fields": {"category": {"unit": "label"}}}"#)
+                .unwrap()
+        });
+
+        let result = table
+            .update_field_metadata(&[FieldMetadataUpdate::new("category").set("unit", "label")])
+            .await
+            .unwrap();
+        assert_eq!(result.version, 7);
+    }
 }
--- a/rust/lancedb/src/table.rs
+++ b/rust/lancedb/src/table.rs
@@ -91,7 +91,10 @@ pub use lance::dataset::scanner::DatasetRecordBatchStream;
 use lance::dataset::statistics::DatasetStatisticsExt;
 pub use lance_index::optimize::OptimizeOptions;
 pub use optimize::{CompactionOptions, OptimizeAction, OptimizeStats};
-pub use schema_evolution::{AddColumnsResult, AlterColumnsResult, DropColumnsResult};
+pub use schema_evolution::{
+    AddColumnsResult, AlterColumnsResult, DropColumnsResult, FieldMetadataUpdate,
+    UpdateFieldMetadataResult,
+};
 use serde_with::skip_serializing_none;
 pub use update::{UpdateBuilder, UpdateResult};

@@ -660,6 +663,19 @@ pub trait BaseTable: std::fmt::Display + std::fmt::Debug + Send + Sync {
            message: "create_insert_exec not implemented".to_string(),
        })
    }
+    /// Update per-field metadata. Merges into existing metadata by default;
+    /// [`FieldMetadataUpdate::remove`] deletes a key and
+    /// [`FieldMetadataUpdate::replace`] swaps the field's whole map.
+    ///
+    /// The default returns `NotSupported`; Lance-backed and remote tables override it.
+    async fn update_field_metadata(
+        &self,
+        _updates: &[FieldMetadataUpdate],
+    ) -> Result<UpdateFieldMetadataResult> {
+        Err(Error::NotSupported {
+            message: "update_field_metadata is not supported on this table type".into(),
+        })
+    }
 }

 /// A Table is a collection of strong typed Rows.
@@ -1340,6 +1356,14 @@ impl Table {
        self.inner.alter_columns(alterations).await
    }

+    /// Update per-field metadata (merges by default).
+    pub async fn update_field_metadata(
+        &self,
+        updates: &[FieldMetadataUpdate],
+    ) -> Result<UpdateFieldMetadataResult> {
+        self.inner.update_field_metadata(updates).await
+    }
+
    /// Remove columns from the table.
    pub async fn drop_columns(&self, columns: &[&str]) -> Result<DropColumnsResult> {
        self.inner.drop_columns(columns).await
@@ -2580,6 +2604,7 @@ impl NativeTable {
    ///   field id and the second element is a hashmap of metadata key-value
    ///   pairs.
    ///
+    #[deprecated(since = "0.33.1", note = "Use `update_field_metadata` instead")]
    pub async fn replace_field_metadata(
        &self,
        new_values: impl IntoIterator<Item = (u32, HashMap<String, String>)>,
@@ -2886,6 +2911,13 @@ impl BaseTable for NativeTable {
        schema_evolution::execute_alter_columns(self, alterations).await
    }

+    async fn update_field_metadata(
+        &self,
+        updates: &[FieldMetadataUpdate],
+    ) -> Result<UpdateFieldMetadataResult> {
+        schema_evolution::execute_update_field_metadata(self, updates).await
+    }
+
    async fn drop_columns(&self, columns: &[&str]) -> Result<DropColumnsResult> {
        schema_evolution::execute_drop_columns(self, columns).await
    }
@@ -3136,7 +3168,6 @@ pub struct FragmentSummaryStats {
 #[cfg(test)]
 #[allow(deprecated)]
 mod tests {
-    use std::collections::HashMap;
    use std::sync::Arc;
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::time::Duration;
@@ -4449,10 +4480,10 @@ mod tests {
            Some(&"test_val2_update".to_string())
        );

-        let mut new_field_metadata = HashMap::<String, String>::new();
-        new_field_metadata.insert("test_field_key1".into(), "test_field_val1".into());
        native_tbl
-            .replace_field_metadata(vec![(field.id as u32, new_field_metadata)])
+            .update_field_metadata(&[
+                FieldMetadataUpdate::new("i").set("test_field_key1", "test_field_val1")
+            ])
            .await
            .unwrap();

--- a/rust/lancedb/src/table/merge.rs
+++ b/rust/lancedb/src/table/merge.rs
@@ -53,6 +53,12 @@ pub struct MergeResult {
    pub num_rows: u64,
 }

+#[derive(Debug, Clone)]
+pub enum MergeFilter {
+    Sql(String),
+    Expr(datafusion_expr::Expr),
+}
+
 /// A builder used to create and run a merge insert operation
 ///
 /// See [`super::Table::merge_insert`] for more context
@@ -61,10 +67,10 @@ pub struct MergeInsertBuilder {
    table: Arc<dyn BaseTable>,
    pub(crate) on: Vec<String>,
    pub(crate) when_matched_update_all: bool,
-    pub(crate) when_matched_update_all_filt: Option<String>,
+    pub(crate) when_matched_update_all_filt: Option<MergeFilter>,
    pub(crate) when_not_matched_insert_all: bool,
    pub(crate) when_not_matched_by_source_delete: bool,
-    pub(crate) when_not_matched_by_source_delete_filt: Option<String>,
+    pub(crate) when_not_matched_by_source_delete_filt: Option<MergeFilter>,
    pub(crate) timeout: Option<Duration>,
    pub(crate) use_index: bool,
    pub(crate) use_lsm_write: Option<bool>,
@@ -110,7 +116,14 @@ impl MergeInsertBuilder {
    /// For example, "target.last_update < source.last_update"
    pub fn when_matched_update_all(&mut self, condition: Option<String>) -> &mut Self {
        self.when_matched_update_all = true;
-        self.when_matched_update_all_filt = condition;
+        self.when_matched_update_all_filt = condition.map(MergeFilter::Sql);
+        self
+    }
+
+    /// Similar to [`Self::when_matched_update_all`] but accepts a DataFusion logical expression directly.
+    pub fn when_matched_update_all_expr(&mut self, condition: datafusion_expr::Expr) -> &mut Self {
+        self.when_matched_update_all = true;
+        self.when_matched_update_all_filt = Some(MergeFilter::Expr(condition));
        self
    }

@@ -132,7 +145,17 @@ impl MergeInsertBuilder {
    ///   limit what rows are deleted.
    pub fn when_not_matched_by_source_delete(&mut self, filter: Option<String>) -> &mut Self {
        self.when_not_matched_by_source_delete = true;
-        self.when_not_matched_by_source_delete_filt = filter;
+        self.when_not_matched_by_source_delete_filt = filter.map(MergeFilter::Sql);
+        self
+    }
+
+    /// Similar to [`Self::when_not_matched_by_source_delete`] but accepts a DataFusion logical expression directly.
+    pub fn when_not_matched_by_source_delete_expr(
+        &mut self,
+        filter: datafusion_expr::Expr,
+    ) -> &mut Self {
+        self.when_not_matched_by_source_delete = true;
+        self.when_not_matched_by_source_delete_filt = Some(MergeFilter::Expr(filter));
        self
    }

@@ -234,7 +257,12 @@ pub(crate) async fn execute_merge_insert(
    ) {
        (false, _) => builder.when_matched(WhenMatched::DoNothing),
        (true, None) => builder.when_matched(WhenMatched::UpdateAll),
-        (true, Some(filt)) => builder.when_matched(WhenMatched::update_if(&dataset, &filt)?),
+        (true, Some(MergeFilter::Sql(filt))) => {
+            builder.when_matched(WhenMatched::update_if(&dataset, &filt)?)
+        }
+        (true, Some(MergeFilter::Expr(expr))) => {
+            builder.when_matched(WhenMatched::update_if_expr(expr))
+        }
    };
    if params.when_not_matched_insert_all {
        builder.when_not_matched(lance::dataset::WhenNotMatched::InsertAll);
@@ -242,10 +270,12 @@ pub(crate) async fn execute_merge_insert(
        builder.when_not_matched(lance::dataset::WhenNotMatched::DoNothing);
    }
    if params.when_not_matched_by_source_delete {
-        let behavior = if let Some(filter) = params.when_not_matched_by_source_delete_filt {
-            WhenNotMatchedBySource::delete_if(dataset.as_ref(), &filter)?
-        } else {
-            WhenNotMatchedBySource::Delete
+        let behavior = match params.when_not_matched_by_source_delete_filt {
+            Some(MergeFilter::Sql(filter)) => {
+                WhenNotMatchedBySource::delete_if(dataset.as_ref(), &filter)?
+            }
+            Some(MergeFilter::Expr(expr)) => WhenNotMatchedBySource::DeleteIf(expr),
+            None => WhenNotMatchedBySource::Delete,
        };
        builder.when_not_matched_by_source(behavior);
    } else {
@@ -386,6 +416,45 @@ mod tests {
        merge_insert_builder.execute(new_batches).await.unwrap();
        assert_eq!(table.count_rows(None).await.unwrap(), 25);
    }
+
+    #[tokio::test]
+    async fn test_merge_insert_expr() {
+        use datafusion_expr::{col, lit};
+
+        let conn = connect("memory://").execute().await.unwrap();
+
+        // Create a dataset with i=0..10
+        let batches = merge_insert_test_batches(0, 0);
+        let table = conn
+            .create_table("my_table_expr", batches)
+            .execute()
+            .await
+            .unwrap();
+        assert_eq!(table.count_rows(None).await.unwrap(), 10);
+
+        // Conditional update that only replaces the age=0 data
+        let new_batches = merge_insert_test_batches(5, 3);
+        let mut merge_insert_builder = table.merge_insert(&["i"]);
+        // Use expression: target.age = 0
+        let expr = col("target.age").eq(lit(0));
+        merge_insert_builder.when_matched_update_all_expr(expr);
+        merge_insert_builder.execute(new_batches).await.unwrap();
+        assert_eq!(
+            table.count_rows(Some("age = 3".to_string())).await.unwrap(),
+            5
+        );
+
+        // Delete with expression. The new batches cover i=10..20, so target rows i=0..10 have
+        // no source match; nothing is inserted or updated since only the delete action is enabled.
+        let new_batches = merge_insert_test_batches(10, 0);
+        let mut merge_insert_builder = table.merge_insert(&["i"]);
+        // Delete if target.age = 3
+        let delete_expr = col("target.age").eq(lit(3));
+        merge_insert_builder.when_not_matched_by_source_delete_expr(delete_expr);
+        let result = merge_insert_builder.execute(new_batches).await.unwrap();
+        assert_eq!(result.num_deleted_rows, 5);
+        assert_eq!(table.count_rows(None).await.unwrap(), 5);
+    }
 }

 #[cfg(test)]
--- a/rust/lancedb/src/table/schema_evolution.rs
+++ b/rust/lancedb/src/table/schema_evolution.rs
@@ -10,6 +10,7 @@

 use lance::dataset::{ColumnAlteration, NewColumnTransform};
 use serde::{Deserialize, Serialize};
+use std::collections::HashMap;

 use super::NativeTable;
 use crate::Result;
@@ -44,6 +45,52 @@ pub struct DropColumnsResult {
    pub version: u64,
 }

+/// A single field's metadata update, addressed by dot-path.
+///
+/// Merges into the field's existing metadata by default. Use [`Self::remove`] to
+/// delete a key, or [`Self::replace`] to swap the field's entire metadata map.
+#[derive(Debug, Clone, PartialEq, Eq, Default, Serialize)]
+pub struct FieldMetadataUpdate {
+    /// Dot-separated path to the field (e.g. `"embedding"` or `"address.zip"`).
+    pub path: String,
+    /// Keys to set (`Some`) or delete (`None`).
+    pub metadata: HashMap<String, Option<String>>,
+    /// If `true`, replace the field's entire metadata map instead of merging.
+    pub replace: bool,
+}
+
+impl FieldMetadataUpdate {
+    pub fn new(path: impl Into<String>) -> Self {
+        Self {
+            path: path.into(),
+            metadata: HashMap::new(),
+            replace: false,
+        }
+    }
+
+    pub fn set(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
+        self.metadata.insert(key.into(), Some(value.into()));
+        self
+    }
+
+    pub fn remove(mut self, key: impl Into<String>) -> Self {
+        self.metadata.insert(key.into(), None);
+        self
+    }
+
+    pub fn replace(mut self) -> Self {
+        self.replace = true;
+        self
+    }
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
+pub struct UpdateFieldMetadataResult {
+    /// The commit version associated with the operation.
+    #[serde(default)]
+    pub version: u64,
+}
+
 /// Internal implementation of the add columns logic.
 ///
 /// Adds new columns to the table using the provided transforms.
@@ -90,6 +137,32 @@ pub(crate) async fn execute_drop_columns(
    Ok(DropColumnsResult { version })
 }

+/// Internal implementation of the update field metadata logic.
+///
+/// Merges or replaces per-field metadata, addressing fields by dot-path.
+pub(crate) async fn execute_update_field_metadata(
+    table: &NativeTable,
+    updates: &[FieldMetadataUpdate],
+) -> Result<UpdateFieldMetadataResult> {
+    table.dataset.ensure_mutable()?;
+    let mut dataset = (*table.dataset.get().await?).clone();
+
+    let mut builder = dataset.update_field_metadata();
+    for update in updates {
+        let entries = update.metadata.iter().map(|(k, v)| (k.clone(), v.clone()));
+        builder = if update.replace {
+            builder.replace(&update.path, entries)?
+        } else {
+            builder.update(&update.path, entries)?
+        };
+    }
+    builder.await?;
+
+    let version = dataset.version().version;
+    table.dataset.update(dataset);
+    Ok(UpdateFieldMetadataResult { version })
+}
+
 #[cfg(test)]
 mod tests {
    use arrow_array::{Int32Array, StringArray, record_batch};
@@ -97,6 +170,7 @@ mod tests {
    use futures::TryStreamExt;
    use lance::dataset::ColumnAlteration;

+    use super::FieldMetadataUpdate;
    use crate::connect;
    use crate::query::{ExecutableQuery, QueryBase, Select};
    use crate::table::NewColumnTransform;
@@ -610,4 +684,46 @@ mod tests {
        let v4 = table.version().await.unwrap();
        assert_eq!(drop_result.version, v4);
    }
+
+    #[tokio::test]
+    async fn test_update_field_metadata() {
+        let conn = connect("memory://").execute().await.unwrap();
+        let batch = record_batch!(
+            ("id", Int32, [1, 2, 3]),
+            ("category", Utf8, ["A", "B", "C"])
+        )
+        .unwrap();
+        let table = conn
+            .create_table("test_update_field_metadata", batch)
+            .execute()
+            .await
+            .unwrap();
+
+        // Set metadata on a field.
+        table
+            .update_field_metadata(&[FieldMetadataUpdate::new("category")
+                .set("unit", "label")
+                .set("pii", "false")])
+            .await
+            .unwrap();
+        let schema = table.schema().await.unwrap();
+        let field = schema.field_with_name("category").unwrap();
+        assert_eq!(
+            field.metadata().get("unit").map(String::as_str),
+            Some("label")
+        );
+
+        // Merge: add a key, delete one, keep the rest.
+        table
+            .update_field_metadata(&[FieldMetadataUpdate::new("category")
+                .set("source", "import")
+                .remove("pii")])
+            .await
+            .unwrap();
+        let schema = table.schema().await.unwrap();
+        let md = schema.field_with_name("category").unwrap().metadata();
+        assert_eq!(md.get("unit").map(String::as_str), Some("label")); // preserved
+        assert_eq!(md.get("source").map(String::as_str), Some("import")); // added
+        assert!(!md.contains_key("pii")); // deleted
+    }
 }