ci: move lance dependency bump flow into skill

chore(deps): bump the rust-minor-patch group with 5 updates (#3465)
Bumps the rust-minor-patch group with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [log](https://github.com/rust-lang/log) | `0.4.29` | `0.4.30` |
| [serde_json](https://github.com/serde-rs/json) | `1.0.149` | `1.0.150` |
| [http](https://github.com/hyperium/http) | `1.4.0` | `1.4.1` |
| [uuid](https://github.com/uuid-rs/uuid) | `1.23.1` | `1.23.2` |
| [aws-smithy-runtime](https://github.com/smithy-lang/smithy-rs) | `1.11.1` | `1.11.3` |

Updates `log` from 0.4.29 to 0.4.30

Release notes and changelog for 0.4.30 (2026-05-21), sourced from https://github.com/rust-lang/log/releases and https://github.com/rust-lang/log/blob/master/CHANGELOG.md:
- Support capturing of `std::net` types by @KodrAus in rust-lang/log#724
- New contributors: @V0ldek made their first contribution in rust-lang/log#720; @woodruffw made their first contribution in rust-lang/log#723
- Notable change: MSRV is bumped to 1.71.0 in rust-lang/log#723
- Full changelog: https://github.com/rust-lang/log/compare/0.4.29...0.4.30

Commits:
- 9c55760 Merge pull request #725 from rust-lang/cargo/0.4.30
- d1acb05 update docs on current MSRV and note latest bump in changelog
- 5068293 prepare for 0.4.30 release
- 7ccd873 Merge pull request #724 from rust-lang/feat/net-to-value
- 923dfaa fix up test cfgs
- ecb7de8 gate net value impls on std
- 67bb4f6 run fmt
- 25f49fe rework net type capturing
- 7087dcb feat: impl ToValue for core::net types
- 67bc7e3 Merge pull request #723 from woodruffw-forks/ww/ci
- Additional commits viewable in the compare view: https://github.com/rust-lang/log/compare/0.4.29...0.4.30

Updates `serde_json` from 1.0.149 to 1.0.150

Release notes for v1.0.150, sourced from https://github.com/serde-rs/json/releases:
- Reject non-string enum object keys (#1324, thanks @puneetdixit200)

Commits:
- a1ae73a Release 1.0.150
- 1a360b0 Merge pull request #1324 from puneetdixit200/reject-non-string-enum-keys
- 2037b63 Reject non-string enum object keys
- 5d30df6 Resolve manual_assert_eq pedantic clippy lint
- dc8003a Raise required compiler for preserve_order feature to 1.85
- a42fa98 Unpin CI miri toolchain
- 684a60e Pin CI miri to nightly-2026-02-11
- 7c7da33 Raise required compiler to Rust 1.71
- acf4850 Simplify Number::is_f64
- 6b8ceab Resolve unnecessary_map_or clippy lint
- Additional commits viewable in the compare view: https://github.com/serde-rs/json/compare/v1.0.149...v1.0.150

Updates `http` from 1.4.0 to 1.4.1

Release notes and changelog for v1.4.1 (May 25, 2026), sourced from https://github.com/hyperium/http/releases and https://github.com/hyperium/http/blob/master/CHANGELOG.md:
- Fix `PathAndQuery::from_static()` and `from_shared()` to reject inputs that do not start with `/`.
- Fix `Extend` for `HeaderMap` to clamp max size hint and not overflow.
- Fix `header::IntoIter` that could use-after-free if the generic value type could panic on drop.
- Fix `header::{IterMut, ValuesIterMut}` to not violate stacked borrows.

What's changed:
- chore(header): fix clippy::assign_op_pattern by @rxc-amzn in hyperium/http#806
- ci: pin itoa in msrv job by @seanmonstar in hyperium/http#813
- Remove unnecessary explicit lifetimes by @jplatte in hyperium/http#815
- chore(ci): update to actions/checkout@v6 by @tottoto in hyperium/http#819
- tests: update to rand 0.10 by @tottoto in hyperium/http#818
- refactor: Remove usage of float instruction by @AurelienFT in hyperium/http#823
- refactor(uri): consolidate PathAndQuery::from_shared and from_static by @seanmonstar in hyperium/http#825
- fix(uri): reject Path::from_shared/from_static if doesn't start with slash by @seanmonstar in hyperium/http#826
- Rephrase comment by @daalfox in hyperium/http#827
- Fix typo in request builder docs by @vleksis in hyperium/http#831
- fix: clamp Extend size hint so HeaderMap reserve cannot overflow by @SAY-5 in hyperium/http#833
- fix(headers): fix stacked borrows for IterMut/ValuesIterMut by @seanmonstar in hyperium/http#837
- fix(header): use a set_len guard in IntoIter drop by @seanmonstar in hyperium/http#838

New contributors: @rxc-amzn (hyperium/http#806), @AurelienFT (hyperium/http#823), @daalfox (hyperium/http#827), @vleksis (hyperium/http#831) and @SAY-5 (hyperium/http#833) made their first contributions.

Full changelog: https://github.com/hyperium/http/compare/v1.4.0...v1.4.1

Commits:
- a24c968 v1.4.1
- bc3b044 fix(header): use a set_len guard in IntoIter drop (#838)
- 1b968dc fix(header): fix stacked borrows for IterMut/ValuesIterMut (#837)
- 6e2dd42 fix: clamp Extend size hint so HeaderMap reserve cannot overflow (#833)
- 68e0abb docs: fix typo in request builder docs (#831)
- 29dd307 docs(extensions): rephrase internal comment (#827)
- ae48fb5 fix(uri): reject Path::from_shared/from_static if doesn't start with slash (#...
- 1ad200e refactor(uri): consolidate PathAndQuery::from_shared and from_static (#825)
- d59d939 refactor: Remove usage of float instruction (#823)
- ed680c4 tests: update to rand 0.10 (#818)
- Additional commits viewable in the compare view: https://github.com/hyperium/http/compare/v1.4.0...v1.4.1

Updates `uuid` from 1.23.1 to 1.23.2

Release notes for v1.23.2, sourced from https://github.com/uuid-rs/uuid/releases:
- Improve error messages for ambiguous formats by @KodrAus in uuid-rs/uuid#882
- Prepare for 1.23.2 release by @KodrAus in uuid-rs/uuid#883
- Full changelog: https://github.com/uuid-rs/uuid/compare/v1.23.1...v1.23.2

Commits:
- d119657 Merge pull request #883 from uuid-rs/cargo/v1.23.2
- 0651cfc prepare for 1.23.2 release
- e8dea0c Merge pull request #882 from uuid-rs/fix/error-msgs
- bdc429a fix up serde messages
- d4342e4 make indexes 0 based and fix up more error messages
- 4ad479f work on more accurate parser errors
- See full diff in the compare view: https://github.com/uuid-rs/uuid/compare/v1.23.1...v1.23.2

Updates `aws-smithy-runtime` from 1.11.1 to 1.11.3

Commits:
- See full diff in the compare view: https://github.com/smithy-lang/smithy-rs/commits

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-01 19:30:45 +00:00 · 2026-06-01 18:29:58 +08:00 · 2026-05-30 19:22:50 -07:00 · 2026-05-30 00:18:22 +08:00 · 2026-05-29 08:48:11 -07:00 · 2026-05-29 08:21:45 -07:00
39 changed files with 3340 additions and 507 deletions
--- a/.github/workflows/codex-update-lance-dependency.yml
+++ b/.github/workflows/codex-update-lance-dependency.yml
@@ -4,14 +4,16 @@ on:
  workflow_call:
    inputs:
      tag:
-        description: "Tag name from Lance"
-        required: true
+        description: "Tag name from Lance. If omitted, the skill will use the latest Lance release that needs an update."
+        required: false
+        default: ""
        type: string
  workflow_dispatch:
    inputs:
      tag:
-        description: "Tag name from Lance"
-        required: true
+        description: "Tag name from Lance. Leave empty to use the latest Lance release that needs an update."
+        required: false
+        default: ""
        type: string

 permissions:
@@ -25,7 +27,7 @@ jobs:
    steps:
      - name: Show inputs
        run: |
-          echo "tag = ${{ inputs.tag }}"
+          echo "tag = ${{ inputs.tag || 'latest' }}"

      - name: Checkout Repo LanceDB
        uses: actions/checkout@v4
@@ -71,65 +73,22 @@ jobs:
          OPENAI_API_KEY: ${{ secrets.CODEX_TOKEN }}
        run: |
          set -euo pipefail
-          VERSION="${TAG#refs/tags/}"
-          VERSION="${VERSION#v}"
-          BRANCH_NAME="codex/update-lance-${VERSION//[^a-zA-Z0-9]/-}"
-
-          # Use "chore" for beta/rc versions, "feat" for stable releases
-          if [[ "${VERSION}" == *beta* ]] || [[ "${VERSION}" == *rc* ]]; then
-            COMMIT_TYPE="chore"
-          else
-            COMMIT_TYPE="feat"
-          fi
+          TARGET_TAG="${TAG:-latest}"

          cat <<EOF >/tmp/codex-prompt.txt
-          You are running inside the lancedb repository on a GitHub Actions runner. Update the Lance dependency to version ${VERSION} and prepare a pull request for maintainers to review.
+          You are running inside the lancedb repository on a GitHub Actions runner.

-          Follow these steps exactly:
-          1. Use script "ci/set_lance_version.py" to update Lance Rust dependencies. The script already refreshes Cargo metadata, so allow it to finish even if it takes time.
-          2. Update the Java lance-core dependency version in "java/pom.xml": change the "<lance-core.version>...</lance-core.version>" property to "${VERSION}".
-          3. Run "cargo clippy --workspace --tests --all-features -- -D warnings". If diagnostics appear, fix them yourself and rerun clippy until it exits cleanly. Do not skip any warnings.
-          4. After clippy succeeds, run "cargo fmt --all" to format the workspace.
-          5. Ensure the repository is clean except for intentional changes. Inspect "git status --short" and "git diff" to confirm the dependency update and any required fixes.
-          6. Create and switch to a new branch named "${BRANCH_NAME}" (replace any duplicated hyphens if necessary).
-          7. Stage all relevant files with "git add -A". Commit using the message "${COMMIT_TYPE}: update lance dependency to v${VERSION}".
-          8. Push the branch to origin. If the remote branch already exists, delete it first with "gh api -X DELETE repos/lancedb/lancedb/git/refs/heads/${BRANCH_NAME}" then push with "git push origin ${BRANCH_NAME}". Do NOT use "git push --force" or "git push -f".
-          9. env "GH_TOKEN" is available, use "gh" tools for github related operations like creating pull request.
-          10. Create a pull request targeting "main" with title "${COMMIT_TYPE}: update lance dependency to v${VERSION}". First, write the PR body to /tmp/pr-body.md using a heredoc (cat <<'EOF' > /tmp/pr-body.md). The body should summarize the dependency bump, clippy/fmt verification, and link the triggering tag (${TAG}). Then run "gh pr create --body-file /tmp/pr-body.md".
-          11. After creating the PR, display the PR URL, "git status --short", and a concise summary of the commands run and their results.
+          Read and use the in-repository skill at "skills/lancedb-update-lance-dependency/SKILL.md".
+          Update LanceDB's Lance dependency for target "${TARGET_TAG}" and create a pull request targeting "main" if an update is needed.

          Constraints:
-          - Use bash commands; avoid modifying GitHub workflow files other than through the scripted task above.
-          - Do not merge the PR.
-          - If any command fails, diagnose and fix the issue instead of aborting.
+          - Use env "GH_TOKEN" for GitHub operations.
+          - Do not merge the pull request.
+          - Do not force-push.
+          - Do not create a duplicate pull request if an open PR already exists for the target Lance version.
+          - If any command fails, diagnose and fix the root cause instead of aborting.
+          - After creating the PR, display the PR URL, "git status --short", and a concise summary of the commands run and their results.
          EOF

          printenv OPENAI_API_KEY | codex login --with-api-key
          codex --config shell_environment_policy.ignore_default_excludes=true exec --dangerously-bypass-approvals-and-sandbox "$(cat /tmp/codex-prompt.txt)"
-
-      - name: Trigger sophon dependency update
-        env:
-          TAG: ${{ inputs.tag }}
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          set -euo pipefail
-          VERSION="${TAG#refs/tags/}"
-          VERSION="${VERSION#v}"
-          LANCEDB_BRANCH="codex/update-lance-${VERSION//[^a-zA-Z0-9]/-}"
-
-          echo "Triggering sophon workflow with:"
-          echo "  lance_ref: ${TAG#refs/tags/}"
-          echo "  lancedb_ref: ${LANCEDB_BRANCH}"
-
-          gh workflow run codex-bump-lancedb-lance.yml \
-            --repo lancedb/sophon \
-            -f lance_ref="${TAG#refs/tags/}" \
-            -f lancedb_ref="${LANCEDB_BRANCH}"
-
-      - name: Show latest sophon workflow run
-        env:
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          set -euo pipefail
-          echo "Latest sophon workflow run:"
-          gh run list --repo lancedb/sophon --workflow codex-bump-lancedb-lance.yml --limit 1 --json databaseId,url,displayTitle
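The removed shell lines in `codex-update-lance-dependency.yml` derived a version, a branch name, and a commit type from the tag. The new prompt only passes `TARGET_TAG`, which falls back to `latest` when the `tag` input is empty, and leaves those details to the skill. The Python sketch below mirrors the deleted bash so a reviewer can check what the skill is expected to produce for a given tag. It is a reconstruction of the removed logic, not the contents of `skills/lancedb-update-lance-dependency/SKILL.md`, and every function name in it is hypothetical.

```python
import re
from typing import Optional

# Hypothetical helpers that mirror the bash removed from the workflow.
# They are not taken from the skill, whose contents are not part of this diff.


def resolve_target_tag(tag: Optional[str]) -> str:
    """Mirror TARGET_TAG="${TAG:-latest}": use "latest" when TAG is unset or empty."""
    return tag or "latest"


def normalize_version(tag: str) -> str:
    """Mirror VERSION="${TAG#refs/tags/}" followed by VERSION="${VERSION#v}"."""
    version = tag.removeprefix("refs/tags/")  # str.removeprefix needs Python 3.9+
    return version.removeprefix("v")


def branch_name(version: str) -> str:
    """Mirror codex/update-lance-${VERSION//[^a-zA-Z0-9]/-}: each non-alphanumeric char becomes '-'."""
    return "codex/update-lance-" + re.sub(r"[^a-zA-Z0-9]", "-", version)


def commit_type(version: str) -> str:
    """Mirror the removed rule: "chore" for beta/rc versions, "feat" for stable releases."""
    return "chore" if ("beta" in version or "rc" in version) else "feat"


def pr_title(tag: str) -> str:
    """Build the commit and PR title the removed prompt asked Codex to use."""
    version = normalize_version(tag)
    return f"{commit_type(version)}: update lance dependency to v{version}"


if __name__ == "__main__":
    tag = resolve_target_tag("refs/tags/v7.2.0-beta.1")
    version = normalize_version(tag)
    print(version, branch_name(version), pr_title(tag))
```

Note that the `latest` placeholder is not a real tag: under this sketch it would pass through `normalize_version` unchanged, which is one reason the resolution of `latest` to a concrete release has to happen inside the skill rather than in the workflow step.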
--- a/.github/workflows/lance-release-timer.yml
+++ b/.github/workflows/lance-release-timer.yml
@@ -1,62 +0,0 @@
-name: Lance Release Timer
-
-on:
-  schedule:
-    - cron: "*/10 * * * *"
-  workflow_dispatch:
-
-permissions:
-  contents: read
-  actions: write
-
-concurrency:
-  group: lance-release-timer
-  cancel-in-progress: false
-
-jobs:
-  trigger-update:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v4
-
-      - name: Check for new Lance tag
-        id: check
-        env:
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          python3 ci/check_lance_release.py --github-output "$GITHUB_OUTPUT"
-
-      - name: Look for existing PR
-        if: steps.check.outputs.needs_update == 'true'
-        id: pr
-        env:
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          set -euo pipefail
-          TITLE="chore: update lance dependency to v${{ steps.check.outputs.latest_version }}"
-          COUNT=$(gh pr list --search "\"$TITLE\" in:title" --state open --limit 1 --json number --jq 'length')
-          if [ "$COUNT" -gt 0 ]; then
-            echo "Open PR already exists for $TITLE"
-            echo "pr_exists=true" >> "$GITHUB_OUTPUT"
-          else
-            echo "No existing PR for $TITLE"
-            echo "pr_exists=false" >> "$GITHUB_OUTPUT"
-          fi
-
-      - name: Trigger codex update workflow
-        if: steps.check.outputs.needs_update == 'true' && steps.pr.outputs.pr_exists != 'true'
-        env:
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          set -euo pipefail
-          TAG=${{ steps.check.outputs.latest_tag }}
-          gh workflow run codex-update-lance-dependency.yml -f tag=refs/tags/$TAG
-
-      - name: Show latest codex workflow run
-        if: steps.check.outputs.needs_update == 'true' && steps.pr.outputs.pr_exists != 'true'
-        env:
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          set -euo pipefail
-          gh run list --workflow codex-update-lance-dependency.yml --limit 1 --json databaseId,url,displayTitle
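The deleted timer ran every 10 minutes, called `ci/check_lance_release.py` to decide whether an update was needed, and skipped the dispatch when an open PR titled `chore: update lance dependency to v<version>` already existed. With that job gone, the skill is asked to avoid duplicate PRs itself. The following is a minimal sketch of both checks under stated assumptions: `semver_key`, `needs_update`, and `has_open_pr` are hypothetical helpers (the real logic of `check_lance_release.py` and of the skill is not shown in this diff), Lance versions are assumed to follow SemVer pre-release ordering, and the open PR titles are assumed to come from something like `gh pr list --state open --json title`.

```python
from typing import Iterable, Tuple

# Hypothetical helpers sketching the "needs update" and duplicate-PR checks.
# Assumes Lance tags follow SemVer, e.g. "7.1.0-beta.4", "7.2.0-rc.1", "7.2.0".


def semver_key(version: str) -> Tuple:
    """Sort key following SemVer precedence: a release outranks its pre-releases;
    numeric pre-release identifiers compare numerically and rank below alphanumeric ones."""
    version = version.removeprefix("v").split("+", 1)[0]  # drop any build metadata
    core, _, pre = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not pre:
        return (major, minor, patch, 1, ())
    identifiers = tuple(
        (0, int(part), "") if part.isdigit() else (1, 0, part) for part in pre.split(".")
    )
    return (major, minor, patch, 0, identifiers)


def needs_update(current: str, latest: str) -> bool:
    """True when the latest Lance release is strictly newer than the pinned version."""
    return semver_key(latest) > semver_key(current)


def has_open_pr(open_titles: Iterable[str], version: str) -> bool:
    """Match either commit type the update flow may use for this version."""
    wanted = {f"{kind}: update lance dependency to v{version}" for kind in ("chore", "feat")}
    return any(title.strip() in wanted for title in open_titles)


if __name__ == "__main__":
    current, latest = "7.1.0-beta.4", "7.2.0-beta.1"
    titles = ["chore: update lance dependency to v7.1.0-beta.4"]
    if needs_update(current, latest) and not has_open_pr(titles, latest):
        print(f"would start an update for v{latest}")
```

Matching both prefixes is a deliberate choice: the deleted timer searched only for a `chore:` title, while the removed codex step committed stable releases as `feat:`, so a `chore:`-only search would miss an existing PR for a stable version.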
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -568,7 +568,7 @@ dependencies = [
 "bytes",
 "fastrand",
 "hex",
- "http 1.4.0",
+ "http 1.4.1",
 "sha1 0.10.6",
 "time",
 "tokio",
@@ -631,7 +631,7 @@ dependencies = [
 "bytes-utils",
 "fastrand",
 "http 0.2.12",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 0.4.6",
 "http-body 1.0.1",
 "percent-encoding",
@@ -661,7 +661,7 @@ dependencies = [
 "bytes",
 "fastrand",
 "http 0.2.12",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body-util",
 "regex-lite",
 "tracing",
@@ -686,7 +686,7 @@ dependencies = [
 "bytes",
 "fastrand",
 "http 0.2.12",
- "http 1.4.0",
+ "http 1.4.1",
 "regex-lite",
 "tracing",
 ]
@@ -710,7 +710,7 @@ dependencies = [
 "bytes",
 "fastrand",
 "http 0.2.12",
- "http 1.4.0",
+ "http 1.4.1",
 "regex-lite",
 "tracing",
 ]
@@ -740,7 +740,7 @@ dependencies = [
 "hex",
 "hmac 0.13.0",
 "http 0.2.12",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 1.0.1",
 "lru",
 "percent-encoding",
@@ -769,7 +769,7 @@ dependencies = [
 "bytes",
 "fastrand",
 "http 0.2.12",
- "http 1.4.0",
+ "http 1.4.1",
 "regex-lite",
 "tracing",
 ]
@@ -793,7 +793,7 @@ dependencies = [
 "bytes",
 "fastrand",
 "http 0.2.12",
- "http 1.4.0",
+ "http 1.4.1",
 "regex-lite",
 "tracing",
 ]
@@ -818,7 +818,7 @@ dependencies = [
 "aws-types",
 "fastrand",
 "http 0.2.12",
- "http 1.4.0",
+ "http 1.4.1",
 "regex-lite",
 "tracing",
 ]
@@ -840,7 +840,7 @@ dependencies = [
 "hex",
 "hmac 0.13.0",
 "http 0.2.12",
- "http 1.4.0",
+ "http 1.4.1",
 "p256",
 "percent-encoding",
 "ring",
@@ -873,7 +873,7 @@ dependencies = [
 "bytes",
 "crc-fast",
 "hex",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 1.0.1",
 "http-body-util",
 "md-5 0.11.0",
@@ -907,7 +907,7 @@ dependencies = [
 "bytes-utils",
 "futures-core",
 "futures-util",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 1.0.1",
 "http-body-util",
 "percent-encoding",
@@ -928,7 +928,7 @@ dependencies = [
 "h2 0.3.27",
 "h2 0.4.14",
 "http 0.2.12",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 0.4.6",
 "hyper 0.14.32",
 "hyper 1.9.0",
@@ -976,20 +976,21 @@ dependencies = [

 [[package]]
 name = "aws-smithy-runtime"
-version = "1.11.1"
+version = "1.11.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "0504b1ab12debb5959e5165ee5fe97dd387e7aa7ea6a477bfd7635dfe769a4f5"
+checksum = "b8e6f5caf6fea86f8c2206541ab5857cfcda9013426cdbe8fa0098b9e2d32182"
 dependencies = [
 "aws-smithy-async",
 "aws-smithy-http",
 "aws-smithy-http-client",
 "aws-smithy-observability",
 "aws-smithy-runtime-api",
+ "aws-smithy-schema",
 "aws-smithy-types",
 "bytes",
 "fastrand",
 "http 0.2.12",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 0.4.6",
 "http-body 1.0.1",
 "http-body-util",
@@ -1001,16 +1002,16 @@ dependencies = [

 [[package]]
 name = "aws-smithy-runtime-api"
-version = "1.12.0"
+version = "1.12.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "b71a13df6ada0aafbf21a73bdfcdf9324cfa9df77d96b8446045be3cde61b42e"
+checksum = "dc117c179ecf39a62a0a3f49f600e9ac26a7ad7dd172177999f83933af776c32"
 dependencies = [
 "aws-smithy-async",
 "aws-smithy-runtime-api-macros",
 "aws-smithy-types",
 "bytes",
 "http 0.2.12",
- "http 1.4.0",
+ "http 1.4.1",
 "pin-project-lite",
 "tokio",
 "tracing",
@@ -1029,17 +1030,28 @@ dependencies = [
 ]

 [[package]]
-name = "aws-smithy-types"
-version = "1.4.7"
+name = "aws-smithy-schema"
+version = "0.1.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "9d73dbfbaa8e4bc57b9045137680b958d274823509a360abfd8e1d514d40c95c"
+checksum = "7442cb268338f0eb8278140a107c046756aa01093d8ef5e99628d34ae09c94f5"
+dependencies = [
+ "aws-smithy-runtime-api",
+ "aws-smithy-types",
+ "http 1.4.1",
+]
+
+[[package]]
+name = "aws-smithy-types"
+version = "1.4.8"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "056b66dbce2f81cc0c1e2b05bb402eb58f8a3530479d650efadd5bbae9a4050b"
 dependencies = [
 "base64-simd",
 "bytes",
 "bytes-utils",
 "futures-core",
 "http 0.2.12",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 0.4.6",
 "http-body 1.0.1",
 "http-body-util",
@@ -1087,7 +1099,7 @@ dependencies = [
 "axum-core",
 "bytes",
 "futures-util",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 1.0.1",
 "http-body-util",
 "hyper 1.9.0",
@@ -1120,7 +1132,7 @@ dependencies = [
 "async-trait",
 "bytes",
 "futures-util",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 1.0.1",
 "http-body-util",
 "mime",
@@ -1399,6 +1411,12 @@ dependencies = [
 "syn 2.0.117",
 ]

+[[package]]
+name = "bytecount"
+version = "0.6.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "175812e0be2bccb6abe50bb8d566126198344f707e304f45c648fd8f2cc0365e"
+
 [[package]]
 name = "bytemuck"
 version = "1.25.0"
@@ -1522,9 +1540,9 @@ dependencies = [

 [[package]]
 name = "cedarwood"
-version = "0.4.6"
+version = "0.5.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "6d910bedd62c24733263d0bed247460853c9d22e8956bd4cd964302095e04e90"
+checksum = "c0524a528a6a0288df1863c3c20fe92c301875b4941e7b6c4b394ab08c5a4c55"
 dependencies = [
 "smallvec",
 ]
@@ -3284,8 +3302,8 @@ checksum = "42703706b716c37f96a77aea830392ad231f44c9e9a67872fa5548707e11b11c"

 [[package]]
 name = "fsst"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arrow-array",
 "rand 0.9.4",
@@ -3675,7 +3693,7 @@ dependencies = [
 "fnv",
 "futures-core",
 "futures-sink",
- "http 1.4.0",
+ "http 1.4.1",
 "indexmap 2.14.0",
 "slab",
 "tokio",
@@ -3781,7 +3799,7 @@ checksum = "629d8f3bbeda9d148036d6b0de0a3ab947abd08ce90626327fc3547a49d59d97"
 dependencies = [
 "dirs",
 "futures",
- "http 1.4.0",
+ "http 1.4.1",
 "indicatif",
 "libc",
 "log",
@@ -3804,7 +3822,7 @@ checksum = "430b33fa84f92796d4d263070b6c0d3ca219df7b9a0e1853ee431029b1612bcd"
 dependencies = [
 "async-trait",
 "bytes",
- "http 1.4.0",
+ "http 1.4.1",
 "more-asserts",
 "serde",
 "thiserror 2.0.18",
@@ -3858,9 +3876,9 @@ dependencies = [

 [[package]]
 name = "http"
-version = "1.4.0"
+version = "1.4.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "e3ba2a386d7f85a81f119ad7498ebe444d2e22c2af0b86b069416ace48b3311a"
+checksum = "8be7462df143984c4598a256ef469b251d7d7f9e271135073e78fc535414f3d0"
 dependencies = [
 "bytes",
 "itoa",
@@ -3884,7 +3902,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "1efedce1fb8e6913f23e0c92de8e62cd5b772a67e7b3946df930a62566c93184"
 dependencies = [
 "bytes",
- "http 1.4.0",
+ "http 1.4.1",
 ]

 [[package]]
@@ -3895,7 +3913,7 @@ checksum = "b021d93e26becf5dc7e1b75b1bed1fd93124b374ceb73f43d4d4eafec896a64a"
 dependencies = [
 "bytes",
 "futures-core",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 1.0.1",
 "pin-project-lite",
 ]
@@ -3962,7 +3980,7 @@ dependencies = [
 "futures-channel",
 "futures-core",
 "h2 0.4.14",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 1.0.1",
 "httparse",
 "httpdate",
@@ -3994,7 +4012,7 @@ version = "0.27.9"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "33ca68d021ef39cf6463ab54c1d0f5daf03377b70561305bb89a8f83aab66e0f"
 dependencies = [
- "http 1.4.0",
+ "http 1.4.1",
 "hyper 1.9.0",
 "hyper-util",
 "rustls 0.23.40",
@@ -4015,7 +4033,7 @@ dependencies = [
 "bytes",
 "futures-channel",
 "futures-util",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 1.0.1",
 "hyper 1.9.0",
 "ipnet",
@@ -4077,6 +4095,21 @@ dependencies = [
 "zerovec",
 ]

+[[package]]
+name = "icu_locale"
+version = "2.2.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d5a396343c7208121dc86e35623d3dfe19814a7613cfd14964994cdc9c9a2e26"
+dependencies = [
+ "icu_collections",
+ "icu_locale_core",
+ "icu_locale_data",
+ "icu_provider",
+ "potential_utf",
+ "tinystr",
+ "zerovec",
+]
+
 [[package]]
 name = "icu_locale_core"
 version = "2.2.0"
@@ -4085,11 +4118,18 @@ checksum = "92219b62b3e2b4d88ac5119f8904c10f8f61bf7e95b640d25ba3075e6cac2c29"
 dependencies = [
 "displaydoc",
 "litemap",
+ "serde",
 "tinystr",
 "writeable",
 "zerovec",
 ]

+[[package]]
+name = "icu_locale_data"
+version = "2.2.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d5fdcc9ac77c6d74ff5cf6e65ef3181d6af32003b16fce3a77fb451d2f695993"
+
 [[package]]
 name = "icu_normalizer"
 version = "2.2.0"
@@ -4138,6 +4178,8 @@ checksum = "139c4cf31c8b5f33d7e199446eff9c1e02decfc2f0eec2c8d71f65befa45b421"
 dependencies = [
 "displaydoc",
 "icu_locale_core",
+ "serde",
+ "stable_deref_trait",
 "writeable",
 "yoke",
 "zerofrom",
@@ -4145,6 +4187,27 @@ dependencies = [
 "zerovec",
 ]

+[[package]]
+name = "icu_segmenter"
+version = "2.2.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5c0794db0b1a86193ac9c48768d0e6c52c54448e0870ad87907d456ee0dac964"
+dependencies = [
+ "icu_collections",
+ "icu_locale",
+ "icu_provider",
+ "icu_segmenter_data",
+ "potential_utf",
+ "utf8_iter",
+ "zerovec",
+]
+
+[[package]]
+name = "icu_segmenter_data"
+version = "2.2.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e4a2c462a4d927d512f5f882a033ddd62f33a05bb9f230d98f736ac3dc85938f"
+
 [[package]]
 name = "id-arena"
 version = "2.3.0"
@@ -4306,19 +4369,20 @@ checksum = "9028f49264629065d057f340a86acb84867925865f73bbf8d47b4d149a7e88b8"

 [[package]]
 name = "jieba-macros"
-version = "0.9.0"
+version = "0.10.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "a29cfc5dcd898604c6f80363411fa6b6b08e27d1d253d6225b9cb6702ea02fc0"
+checksum = "46adade69b634535a8f495cf87710ed893cff53e1dbc9dd750c2ab81c5defb82"
 dependencies = [
 "phf_codegen 0.13.1",
 ]

 [[package]]
 name = "jieba-rs"
-version = "0.9.0"
+version = "0.10.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "3245d6e9d1d5facbd6a23848d6b67e3439738ccbb4fa5a3d65da315ba1a910a2"
+checksum = "11b53580aaa8ec8b713da271da434f8947409242c537a9ab3f7b76bdbb19e8a9"
 dependencies = [
+ "bytecount",
 "cedarwood",
 "jieba-macros",
 "phf 0.13.1",
@@ -4506,8 +4570,8 @@ checksum = "e037a2e1d8d5fdbd49b16a4ea09d5d6401c1f29eca5ff29d03d3824dba16256a"

 [[package]]
 name = "lance"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arc-swap",
 "arrow",
@@ -4580,8 +4644,8 @@ dependencies = [

 [[package]]
 name = "lance-arrow"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arrow-array",
 "arrow-buffer",
@@ -4599,10 +4663,34 @@ dependencies = [
 "rand 0.9.4",
 ]

+[[package]]
+name = "lance-arrow-scalar"
+version = "58.0.0"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
+dependencies = [
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-cast",
+ "arrow-data",
+ "arrow-row",
+ "arrow-schema",
+ "half",
+]
+
+[[package]]
+name = "lance-arrow-stats"
+version = "58.0.0"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
+dependencies = [
+ "arrow-array",
+ "arrow-schema",
+ "lance-arrow-scalar",
+]
+
 [[package]]
 name = "lance-bitpacking"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arrayref",
 "paste",
@@ -4611,8 +4699,8 @@ dependencies = [

 [[package]]
 name = "lance-core"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arrow-array",
 "arrow-buffer",
@@ -4647,8 +4735,8 @@ dependencies = [

 [[package]]
 name = "lance-datafusion"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arrow",
 "arrow-array",
@@ -4678,8 +4766,8 @@ dependencies = [

 [[package]]
 name = "lance-datagen"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arrow",
 "arrow-array",
@@ -4697,8 +4785,8 @@ dependencies = [

 [[package]]
 name = "lance-encoding"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arrow-arith",
 "arrow-array",
@@ -4733,8 +4821,8 @@ dependencies = [

 [[package]]
 name = "lance-file"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arrow-arith",
 "arrow-array",
@@ -4765,8 +4853,8 @@ dependencies = [

 [[package]]
 name = "lance-index"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arc-swap",
 "arrow",
@@ -4796,6 +4884,7 @@ dependencies = [
 "jieba-rs",
 "jsonb",
 "lance-arrow",
+ "lance-arrow-stats",
 "lance-core",
 "lance-datafusion",
 "lance-datagen",
@@ -4831,8 +4920,8 @@ dependencies = [

 [[package]]
 name = "lance-io"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arrow",
 "arrow-arith",
@@ -4851,7 +4940,7 @@ dependencies = [
 "chrono",
 "deepsize",
 "futures",
- "http 1.4.0",
+ "http 1.4.1",
 "io-uring",
 "lance-arrow",
 "lance-core",
@@ -4874,8 +4963,8 @@ dependencies = [

 [[package]]
 name = "lance-linalg"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arrow-array",
 "arrow-buffer",
@@ -4891,8 +4980,8 @@ dependencies = [

 [[package]]
 name = "lance-namespace"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arrow",
 "async-trait",
@@ -4904,8 +4993,8 @@ dependencies = [

 [[package]]
 name = "lance-namespace-impls"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arrow",
 "arrow-ipc",
@@ -4954,8 +5043,8 @@ dependencies = [

 [[package]]
 name = "lance-select"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arrow-array",
 "arrow-buffer",
@@ -4969,8 +5058,8 @@ dependencies = [

 [[package]]
 name = "lance-table"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arrow",
 "arrow-array",
@@ -5010,8 +5099,8 @@ dependencies = [

 [[package]]
 name = "lance-testing"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
 "arrow-array",
 "arrow-schema",
@@ -5022,9 +5111,10 @@ dependencies = [

 [[package]]
 name = "lance-tokenizer"
-version = "7.1.0-beta.4"
-source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
+version = "7.2.0-beta.1"
+source = "git+https://github.com/lance-format/lance.git?tag=v7.2.0-beta.1#b9995aba6115e8e4bc43179a45cbd0f9a170f305"
 dependencies = [
+ "icu_segmenter",
 "jieba-rs",
 "lindera",
 "rust-stemmers",
@@ -5071,7 +5161,7 @@ dependencies = [
 "futures",
 "half",
 "hf-hub",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 1.0.1",
 "lance",
 "lance-arrow",
@@ -5351,9 +5441,9 @@ dependencies = [

 [[package]]
 name = "log"
-version = "0.4.29"
+version = "0.4.30"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "5e5032e24019045c762d3c0f28f5b6b8bbf38563a65908389bf7978758920897"
+checksum = "616ec5685824bcc94416c6d4a7a446eea774a31efd7062c8480ba6fd06d7a6e5"

 [[package]]
 name = "loom"
@@ -5923,7 +6013,7 @@ dependencies = [
 "futures-channel",
 "futures-core",
 "futures-util",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body-util",
 "httparse",
 "humantime",
@@ -6036,7 +6126,7 @@ dependencies = [
 "base64 0.22.1",
 "bytes",
 "futures",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 1.0.1",
 "jiff",
 "log",
@@ -6061,7 +6151,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "048b1b29c503263bdd80a9afe46a68cd02ea9bd361185b1feab4b151078998e9"
 dependencies = [
 "futures",
- "http 1.4.0",
+ "http 1.4.1",
 "mea",
 "opendal-core",
 ]
@@ -6105,7 +6195,7 @@ checksum = "7452bf3ec61cfd81ac9ad9ada17825931e9e371d44a045c6bfab9596c0a2ac3b"
 dependencies = [
 "base64 0.22.1",
 "bytes",
- "http 1.4.0",
+ "http 1.4.1",
 "log",
 "opendal-core",
 "opendal-service-azure-common",
@@ -6125,7 +6215,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "8f9884c2d8cf8ba2bb077d79c877dac5863ba3bab9e2c9c1e41a2e0491404772"
 dependencies = [
 "bytes",
- "http 1.4.0",
+ "http 1.4.1",
 "log",
 "opendal-core",
 "opendal-service-azure-common",
@@ -6143,7 +6233,7 @@ version = "0.56.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "ffb0e45d6c8dcf66ce2da20e241bcb80e6e540e109a4ff20f318f6c9b4c54e0c"
 dependencies = [
- "http 1.4.0",
+ "http 1.4.1",
 "opendal-core",
 ]

@@ -6155,7 +6245,7 @@ checksum = "70a49477a10163431896d106136117f5670717f9c9e49cf6f710528800c6633a"
 dependencies = [
 "async-trait",
 "bytes",
- "http 1.4.0",
+ "http 1.4.1",
 "log",
 "opendal-core",
 "percent-encoding",
@@ -6176,7 +6266,7 @@ checksum = "7b2ab7a2a8a11dfe257ef4db5c0de798acbcd0d6429c37382dad2154bc06a388"
 dependencies = [
 "bytes",
 "hf-xet",
- "http 1.4.0",
+ "http 1.4.1",
 "log",
 "opendal-core",
 "percent-encoding",
@@ -6192,7 +6282,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "29c8a917829ad06d21b639558532cb0101fe49b040d946d673a73018683fac05"
 dependencies = [
 "bytes",
- "http 1.4.0",
+ "http 1.4.1",
 "log",
 "opendal-core",
 "quick-xml 0.38.4",
@@ -6211,7 +6301,7 @@ dependencies = [
 "base64 0.22.1",
 "bytes",
 "crc32c",
- "http 1.4.0",
+ "http 1.4.1",
 "log",
 "md-5 0.10.6",
 "opendal-core",
@@ -6955,6 +7045,8 @@ version = "0.1.5"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "0103b1cef7ec0cf76490e969665504990193874ea05c85ff9bab8b911d0a0564"
 dependencies = [
+ "serde_core",
+ "writeable",
 "zerovec",
 ]

@@ -7618,7 +7710,7 @@ checksum = "57ac2757f3140aa2e213b554148ae0b52733e624fc6723f0cc6bb3d440176c95"
 dependencies = [
 "anyhow",
 "form_urlencoded",
- "http 1.4.0",
+ "http 1.4.1",
 "log",
 "percent-encoding",
 "reqsign-core",
@@ -7636,7 +7728,7 @@ dependencies = [
 "anyhow",
 "bytes",
 "form_urlencoded",
- "http 1.4.0",
+ "http 1.4.1",
 "log",
 "percent-encoding",
 "quick-xml 0.39.4",
@@ -7658,7 +7750,7 @@ dependencies = [
 "base64 0.22.1",
 "bytes",
 "form_urlencoded",
- "http 1.4.0",
+ "http 1.4.1",
 "jsonwebtoken",
 "log",
 "pem",
@@ -7683,7 +7775,7 @@ dependencies = [
 "futures",
 "hex",
 "hmac 0.12.1",
- "http 1.4.0",
+ "http 1.4.1",
 "jiff",
 "log",
 "percent-encoding",
@@ -7710,7 +7802,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "35cc609b49c69e76ecaceb775a03f792d1ed3e7755ab3548d4534fd801e3242e"
 dependencies = [
 "form_urlencoded",
- "http 1.4.0",
+ "http 1.4.1",
 "jsonwebtoken",
 "log",
 "percent-encoding",
@@ -7735,7 +7827,7 @@ dependencies = [
 "futures-core",
 "futures-util",
 "h2 0.4.14",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 1.0.1",
 "http-body-util",
 "hyper 1.9.0",
@@ -7779,7 +7871,7 @@ dependencies = [
 "bytes",
 "futures-core",
 "futures-util",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 1.0.1",
 "http-body-util",
 "hyper 1.9.0",
@@ -7833,7 +7925,7 @@ checksum = "199dda04a536b532d0cc04d7979e39b1c763ea749bf91507017069c00b96056f"
 dependencies = [
 "anyhow",
 "async-trait",
- "http 1.4.0",
+ "http 1.4.1",
 "reqwest 0.13.3",
 "thiserror 2.0.18",
 "tower-service",
@@ -8321,9 +8413,9 @@ dependencies = [

 [[package]]
 name = "serde_json"
-version = "1.0.149"
+version = "1.0.150"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "83fc039473c5595ace860d8c4fafa220ff474b3fc6bfdb4293327f1a37e94d86"
+checksum = "e8014e44b4736ed0538adeecded0fce2a272f22dc9578a7eb6b2d9993c74cfb9"
 dependencies = [
 "itoa",
 "memchr",
@@ -8468,6 +8560,12 @@ dependencies = [
 "digest 0.11.3",
 ]

+[[package]]
+name = "sha1_smol"
+version = "1.0.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "bbfa15b3dddfee50a0fff136974b3e1bde555604ba463834a7eb7deb6417705d"
+
 [[package]]
 name = "sha2"
 version = "0.10.9"
@@ -9187,6 +9285,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "c8323304221c2a851516f22236c5722a72eaa19749016521d6dff0824447d96d"
 dependencies = [
 "displaydoc",
+ "serde_core",
 "zerovec",
 ]

@@ -9375,7 +9474,7 @@ checksum = "1e9cd434a998747dd2c4276bc96ee2e0c7a2eadf3cae88e52be55a05fa9053f5"
 dependencies = [
 "bitflags 2.11.1",
 "bytes",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 1.0.1",
 "http-body-util",
 "pin-project-lite",
@@ -9395,7 +9494,7 @@ dependencies = [
 "bytes",
 "futures-core",
 "futures-util",
- "http 1.4.0",
+ "http 1.4.1",
 "http-body 1.0.1",
 "http-body-util",
 "pin-project-lite",
@@ -9684,13 +9783,14 @@ checksum = "06abde3611657adf66d383f00b093d7faecc7fa57071cce2578660c9f1010821"

 [[package]]
 name = "uuid"
-version = "1.23.1"
+version = "1.23.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "ddd74a9687298c6858e9b88ec8935ec45d22e8fd5e6394fa1bd4e99a87789c76"
+checksum = "d258b83ceec21034727ecee8c382cfa6c3e133699b0742c64571814fb420c9f7"
 dependencies = [
 "getrandom 0.4.2",
 "js-sys",
 "serde_core",
+ "sha1_smol",
 "wasm-bindgen",
 ]

@@ -10415,7 +10515,7 @@ dependencies = [
 "clap",
 "crc32fast",
 "futures",
- "http 1.4.0",
+ "http 1.4.1",
 "hyper 1.9.0",
 "lazy_static",
 "more-asserts",
@@ -10489,7 +10589,7 @@ dependencies = [
 "chrono",
 "clap",
 "gearhash",
- "http 1.4.0",
+ "http 1.4.1",
 "itertools 0.14.0",
 "lazy_static",
 "more-asserts",
@@ -10654,6 +10754,7 @@ dependencies = [
 "displaydoc",
 "yoke",
 "zerofrom",
+ "zerovec",
 ]

 [[package]]
@@ -10662,6 +10763,7 @@ version = "0.11.6"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "90f911cbc359ab6af17377d242225f4d75119aec87ea711a880987b18cd7b239"
 dependencies = [
+ "serde",
 "yoke",
 "zerofrom",
 "zerovec-derive",
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -13,20 +13,20 @@ categories = ["database-implementations"]
 rust-version = "1.91.0"

 [workspace.dependencies]
-lance = { "version" = "=7.1.0-beta.4", default-features = false, "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
-lance-core = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
-lance-datagen = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
-lance-file = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
-lance-io = { "version" = "=7.1.0-beta.4", default-features = false, "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
-lance-index = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
-lance-linalg = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
-lance-namespace = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
-lance-namespace-impls = { "version" = "=7.1.0-beta.4", default-features = false, "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
-lance-table = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
-lance-testing = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
-lance-datafusion = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
-lance-encoding = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
-lance-arrow = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
+lance = { "version" = "=7.2.0-beta.1", default-features = false, "tag" = "v7.2.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
+lance-core = { "version" = "=7.2.0-beta.1", "tag" = "v7.2.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
+lance-datagen = { "version" = "=7.2.0-beta.1", "tag" = "v7.2.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
+lance-file = { "version" = "=7.2.0-beta.1", "tag" = "v7.2.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
+lance-io = { "version" = "=7.2.0-beta.1", default-features = false, "tag" = "v7.2.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
+lance-index = { "version" = "=7.2.0-beta.1", "tag" = "v7.2.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
+lance-linalg = { "version" = "=7.2.0-beta.1", "tag" = "v7.2.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
+lance-namespace = { "version" = "=7.2.0-beta.1", "tag" = "v7.2.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
+lance-namespace-impls = { "version" = "=7.2.0-beta.1", default-features = false, "tag" = "v7.2.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
+lance-table = { "version" = "=7.2.0-beta.1", "tag" = "v7.2.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
+lance-testing = { "version" = "=7.2.0-beta.1", "tag" = "v7.2.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
+lance-datafusion = { "version" = "=7.2.0-beta.1", "tag" = "v7.2.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
+lance-encoding = { "version" = "=7.2.0-beta.1", "tag" = "v7.2.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
+lance-arrow = { "version" = "=7.2.0-beta.1", "tag" = "v7.2.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
 ahash = "0.8"
 # Note that this one does not include pyarrow
 arrow = { version = "58.0.0", optional = false }
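Each Lance workspace dependency in `Cargo.toml` pins the same release in two places: an exact `"version" = "=X"` requirement and a `"tag" = "vX"` git ref. These must agree for Cargo to resolve the crate. This diff does not show `ci/set_lance_version.py`, so the code below is only a hypothetical sketch of a one-pass rewrite, not how that script works. It assumes the single-line inline-table layout used in this hunk, and `bump_lance_pins` is an illustrative name.

```python
import re

# Matches a whole workspace dependency line whose key starts with "lance".
_LANCE_DEP = re.compile(r"^lance[\w-]*\s*=\s*\{.*\}$", re.MULTILINE)


def bump_lance_pins(cargo_toml: str, version: str) -> str:
    """Rewrite the exact-version pin and git tag on every lance* dependency line."""

    def rewrite(match: re.Match[str]) -> str:
        line = match.group(0)
        # Lambdas keep re.sub from interpreting backslashes in the replacement.
        line = re.sub(r'"version" = "=[^"]+"', lambda _: f'"version" = "={version}"', line)
        line = re.sub(r'"tag" = "v[^"]+"', lambda _: f'"tag" = "v{version}"', line)
        return line

    return _LANCE_DEP.sub(rewrite, cargo_toml)
```

Non-Lance entries such as `ahash = "0.8"` or the `arrow` line do not match the pattern and are left untouched.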
--- a/ci/update_lance_dependency.py
+++ b/ci/update_lance_dependency.py
@@ -0,0 +1,126 @@
+#!/usr/bin/env python3
+"""Prepare a Lance dependency update for LanceDB."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import re
+import subprocess
+import sys
+from pathlib import Path
+from typing import Sequence
+
+try:
+    from check_lance_release import parse_semver
+except ModuleNotFoundError:
+    # Supports importing as ci.update_lance_dependency from tests or ad hoc checks.
+    from ci.check_lance_release import parse_semver  # type: ignore
+
+
+def normalize_version(raw: str) -> str:
+    value = raw.strip()
+    value = value.removeprefix("refs/tags/")
+    value = value.removeprefix("v")
+    try:
+        parse_semver(value)
+    except ValueError as exc:
+        raise ValueError(f"Unsupported Lance version or tag: {raw}") from exc
+    return value
+
+
+def normalized_tag(version: str) -> str:
+    return f"v{version}"
+
+
+def branch_name(version: str) -> str:
+    suffix = re.sub(r"[^a-zA-Z0-9]+", "-", version).strip("-")
+    suffix = re.sub(r"-+", "-", suffix)
+    return f"codex/update-lance-{suffix}"
+
+
+def commit_type(version: str) -> str:
+    prerelease = version.split("-", maxsplit=1)[1] if "-" in version else ""
+    return "chore" if "beta" in prerelease or "rc" in prerelease else "feat"
+
+
+def metadata_for(version: str) -> dict[str, str]:
+    kind = commit_type(version)
+    message = f"{kind}: update lance dependency to v{version}"
+    return {
+        "version": version,
+        "tag": normalized_tag(version),
+        "branch_name": branch_name(version),
+        "commit_type": kind,
+        "commit_message": message,
+        "pr_title": message,
+    }
+
+
+def run_command(cmd: Sequence[str], *, cwd: Path) -> None:
+    subprocess.run(cmd, cwd=cwd, check=True)
+
+
+def update_java_lance_core_version(repo_root: Path, version: str) -> None:
+    pom_path = repo_root / "java" / "pom.xml"
+    contents = pom_path.read_text(encoding="utf-8")
+    updated, count = re.subn(
+        r"(<lance-core\.version>)[^<]+(</lance-core\.version>)",
+        rf"\g<1>{version}\g<2>",
+        contents,
+        count=1,
+    )
+    if count != 1:
+        raise RuntimeError(
+            "Expected exactly one <lance-core.version> entry in java/pom.xml"
+        )
+    pom_path.write_text(updated, encoding="utf-8")
+
+
+def write_github_outputs(path: str | None, payload: dict[str, str]) -> None:
+    if not path:
+        return
+    with open(path, "a", encoding="utf-8") as output:
+        for key, value in payload.items():
+            output.write(f"{key}={value}\n")
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "tag_or_version",
+        help="Lance tag or version, for example refs/tags/v7.2.0-beta.1 or 7.2.0",
+    )
+    parser.add_argument(
+        "--repo-root",
+        type=Path,
+        default=Path(__file__).resolve().parents[1],
+        help="Path to the lancedb repository root",
+    )
+    parser.add_argument(
+        "--github-output",
+        default=None,
+        help="Optional GitHub Actions output file to receive metadata fields",
+    )
+    parser.add_argument(
+        "--metadata-only",
+        action="store_true",
+        help="Only print derived metadata; do not modify dependency files",
+    )
+    args = parser.parse_args(argv)
+
+    repo_root = args.repo_root.resolve()
+    version = normalize_version(args.tag_or_version)
+    payload = metadata_for(version)
+
+    if not args.metadata_only:
+        run_command([sys.executable, "ci/set_lance_version.py", version], cwd=repo_root)
+        update_java_lance_core_version(repo_root, version)
+
+    write_github_outputs(args.github_output, payload)
+    print(json.dumps(payload, sort_keys=True))
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
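To see the metadata the script derives from a tag, the naming helpers can be run in isolation. The sketch below mirrors `normalize_version`, `branch_name`, and `commit_type` from `ci/update_lance_dependency.py`. `parse_semver` is a hypothetical stand-in for the function imported from `ci/check_lance_release.py`, which this diff does not show.

```python
from __future__ import annotations

import re

# Hypothetical stand-in for ci/check_lance_release.parse_semver (not shown in this diff):
# accepts MAJOR.MINOR.PATCH with an optional "-prerelease" suffix.
_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def parse_semver(value: str) -> tuple[str | None, ...]:
    match = _SEMVER.match(value)
    if match is None:
        raise ValueError(f"not a semantic version: {value}")
    return match.groups()


def normalize_version(raw: str) -> str:
    # Accept "refs/tags/vX", "vX", or a bare "X".
    value = raw.strip().removeprefix("refs/tags/").removeprefix("v")
    try:
        parse_semver(value)
    except ValueError as exc:
        raise ValueError(f"Unsupported Lance version or tag: {raw}") from exc
    return value


def branch_name(version: str) -> str:
    # Each run of non-alphanumerics becomes a single "-".
    suffix = re.sub(r"[^a-zA-Z0-9]+", "-", version).strip("-")
    return f"codex/update-lance-{suffix}"


def commit_type(version: str) -> str:
    # beta/rc prereleases are chores; stable releases are feats.
    prerelease = version.split("-", maxsplit=1)[1] if "-" in version else ""
    return "chore" if "beta" in prerelease or "rc" in prerelease else "feat"


version = normalize_version("refs/tags/v7.2.0-beta.1")
print(version, branch_name(version), commit_type(version))
# 7.2.0-beta.1 codex/update-lance-7-2-0-beta-1 chore
```

For the tag in this PR, the derived commit message is therefore `chore: update lance dependency to v7.2.0-beta.1`. A stable tag such as `v7.2.0` would produce a `feat:` commit instead.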
--- a/docs/src/js/classes/MergeInsertBuilder.md
+++ b/docs/src/js/classes/MergeInsertBuilder.md
@@ -76,6 +76,57 @@ the query optimizer chooses a suboptimal path.

 ***

+### useLsmWrite()
+
+```ts
+useLsmWrite(useLsmWrite): MergeInsertBuilder
+```
+
+Controls whether the merge uses the MemWAL LSM write path.
+
+By default (unset), a `mergeInsert` on a table with an LSM write spec is
+routed through Lance's MemWAL shard writer, and a table without one uses
+the standard path. Pass `false` to force the standard path even when a
+spec is set. Pass `true` to require a spec — `mergeInsert` rejects if none
+is installed.
+
+#### Parameters
+
+* **useLsmWrite**: `boolean`
+    Whether to use the LSM write path.
+
+#### Returns
+
+[`MergeInsertBuilder`](MergeInsertBuilder.md)
+
+***
+
+### validateSingleShard()
+
+```ts
+validateSingleShard(validateSingleShard): MergeInsertBuilder
+```
+
+Controls how an LSM merge checks that its input targets a single shard.
+
+When a table has an LSM write spec, every row in a `mergeInsert` call must
+route to the same shard. When `true` (the default), every row is inspected
+to verify this. When `false`, only the first row is inspected and the
+shard it routes to is used for the whole input — a faster path for callers
+that have already pre-sharded their input. Has no effect on tables without
+an LSM write spec.
+
+#### Parameters
+
+* **validateSingleShard**: `boolean`
+    Whether to check every row routes to one shard. Defaults to `true`.
+
+#### Returns
+
+[`MergeInsertBuilder`](MergeInsertBuilder.md)
+
+***
+
 ### whenMatchedUpdateAll()

 ```ts
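The two `validateSingleShard` modes differ only in how many rows are routed before the target shard is chosen. The toy model below illustrates that contract. It is not Lance's implementation, and the CRC32-modulo bucket function is an assumption, since Lance's actual bucket hash may differ. With `numBuckets = 1`, as in the tests in this PR, every row lands in bucket 0, so both modes agree.

```python
from __future__ import annotations

import zlib
from collections.abc import Callable, Mapping, Sequence

Row = Mapping[str, object]


def bucket_route(column: str, num_buckets: int) -> Callable[[Row], int]:
    """Hypothetical bucket router: CRC32 of the column's string form, modulo num_buckets."""

    def route(row: Row) -> int:
        return zlib.crc32(str(row[column]).encode("utf-8")) % num_buckets

    return route


def target_shard(
    rows: Sequence[Row],
    route: Callable[[Row], int],
    validate_single_shard: bool = True,
) -> int:
    if not rows:
        raise ValueError("merge input is empty")
    shard = route(rows[0])
    if validate_single_shard:
        # Strict mode: every row must route to the first row's shard.
        for row in rows[1:]:
            if route(row) != shard:
                raise ValueError("input rows target more than one shard")
    # Fast mode trusts the caller: the first row's shard is used for the whole input.
    return shard
```

In fast mode, a caller that has not pre-sharded its input gets no error. Its rows are simply written to the first row's shard, so use that mode only when sharding is guaranteed upstream.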
--- a/docs/src/js/classes/Table.md
+++ b/docs/src/js/classes/Table.md
@@ -187,6 +187,25 @@ Any attempt to use the table after it is closed will result in an error.

 ***

+### closeLsmWriters()
+
+```ts
+abstract closeLsmWriters(): Promise<void>
+```
+
+Drain and close any cached MemWAL shard writers held for this table.
+
+When an [LsmWriteSpec](../interfaces/LsmWriteSpec.md) is installed, `mergeInsert` opens MemWAL
+shard writers and caches them for reuse across calls. This closes them,
+flushing pending data; writers reopen lazily on the next `mergeInsert`.
+It is a no-op when no writers are cached.
+
+#### Returns
+
+`Promise`&lt;`void`&gt;
+
+***
+
 ### countRows()

 ```ts
--- a/docs/src/js/interfaces/LsmWriteSpec.md
+++ b/docs/src/js/interfaces/LsmWriteSpec.md
@@ -11,7 +11,10 @@ Specification selecting Lance's MemWAL LSM-style write path for

 `specType` is `"bucket"`, `"identity"`, or `"unsharded"`. For `"bucket"`,
 `column` and `numBuckets` are required; for `"identity"`, `column` is
-required.
+required and must be a deterministic function of the unenforced primary
+key (every row with a given primary key must always produce the same
+`column` value, or upserts of that key can land in different shards and a
+stale version can win).

 ## Properties

--- a/docs/src/js/interfaces/MergeResult.md
+++ b/docs/src/js/interfaces/MergeResult.md
@@ -32,6 +32,14 @@ numInsertedRows: number;

 ***

+### numRows
+
+```ts
+numRows: number;
+```
+
+***
+
 ### numUpdatedRows

 ```ts
--- a/java/pom.xml
+++ b/java/pom.xml
@@ -28,7 +28,7 @@
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <arrow.version>15.0.0</arrow.version>
-        <lance-core.version>7.1.0-beta.4</lance-core.version>
+        <lance-core.version>7.2.0-beta.1</lance-core.version>
        <spotless.skip>false</spotless.skip>
        <spotless.version>2.30.0</spotless.version>
        <spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version>
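The Rust pins and the Java `lance-core.version` property are updated by different code paths: `ci/set_lance_version.py` for the Rust pins and `update_java_lance_core_version` for the property. A cross-check that the two agree can catch a partial update. Below is a minimal sketch. The helper names are hypothetical, and the Cargo regex assumes the single-line `lance = { ... }` layout shown in the `Cargo.toml` hunk.

```python
import re


def pom_lance_core_version(pom_xml: str) -> str:
    match = re.search(r"<lance-core\.version>([^<]+)</lance-core\.version>", pom_xml)
    if match is None:
        raise ValueError("no <lance-core.version> property found")
    return match.group(1).strip()


def cargo_lance_version(cargo_toml: str) -> str:
    # Reads the exact pin ("=X") from the top-level `lance = { ... }` dependency only;
    # keys like lance-core fail the `lance\s*=` prefix.
    match = re.search(
        r'^lance\s*=\s*\{[^}]*"version"\s*=\s*"=([^"]+)"', cargo_toml, re.MULTILINE
    )
    if match is None:
        raise ValueError("no pinned lance workspace dependency found")
    return match.group(1)


def pins_agree(pom_xml: str, cargo_toml: str) -> bool:
    return pom_lance_core_version(pom_xml) == cargo_lance_version(cargo_toml)
```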
--- a/nodejs/test/table.test.ts
+++ b/nodejs/test/table.test.ts
@@ -2625,3 +2625,97 @@ describe("setLsmWriteSpec / unsetLsmWriteSpec", () => {
    ).rejects.toThrow();
  });
 });
+
+describe("LSM merge insert", () => {
+  let tmpDir: tmp.DirResult;
+
+  beforeEach(() => {
+    tmpDir = tmp.dirSync({ unsafeCleanup: true });
+  });
+  afterEach(() => tmpDir.removeCallback());
+
+  async function bucketTable(conn: Connection): Promise<Table> {
+    // The primary key column must be non-nullable.
+    const table = await conn.createEmptyTable(
+      "t",
+      new arrow.Schema([
+        new arrow.Field("id", new arrow.Utf8(), false),
+        new arrow.Field("value", new arrow.Float64(), true),
+      ]),
+    );
+    await table.add([
+      { id: "a", value: 1 },
+      { id: "b", value: 2 },
+    ]);
+    await table.setUnenforcedPrimaryKey("id");
+    // numBuckets = 1: every row routes to the single bucket.
+    await table.setLsmWriteSpec({
+      specType: "bucket",
+      column: "id",
+      numBuckets: 1,
+    });
+    return table;
+  }
+
+  it("routes merge_insert through the shard writer", async () => {
+    const conn = await connect(tmpDir.name);
+    const table = await bucketTable(conn);
+
+    const res = await table
+      .mergeInsert("id")
+      .whenMatchedUpdateAll()
+      .whenNotMatchedInsertAll()
+      .execute([
+        { id: "c", value: 3 },
+        { id: "d", value: 4 },
+      ]);
+    // LSM path: rows go to the MemWAL, so only numRows is populated.
+    expect(res.numRows).toBe(2);
+    expect(res.version).toBe(0);
+    expect(res.numInsertedRows).toBe(0);
+
+    await table.closeLsmWriters();
+  });
+
+  it("falls back to the standard path with useLsmWrite(false)", async () => {
+    const conn = await connect(tmpDir.name);
+    const table = await bucketTable(conn);
+
+    const res = await table
+      .mergeInsert("id")
+      .whenNotMatchedInsertAll()
+      .useLsmWrite(false)
+      .execute([
+        { id: "b", value: 9 },
+        { id: "e", value: 5 },
+      ]);
+    // Standard path commits: id="e" inserted ("b" already exists).
+    expect(res.numInsertedRows).toBe(1);
+    expect(await table.countRows()).toBe(3);
+  });
+
+  it("supports validateSingleShard(false)", async () => {
+    const conn = await connect(tmpDir.name);
+    const table = await bucketTable(conn);
+
+    const res = await table
+      .mergeInsert("id")
+      .whenMatchedUpdateAll()
+      .whenNotMatchedInsertAll()
+      .validateSingleShard(false)
+      .execute([{ id: "f", value: 6 }]);
+    expect(res.numRows).toBe(1);
+  });
+
+  it("rejects a non-upsert merge under an LSM spec", async () => {
+    const conn = await connect(tmpDir.name);
+    const table = await bucketTable(conn);
+
+    await expect(
+      table
+        .mergeInsert("id")
+        .whenNotMatchedInsertAll()
+        .execute([{ id: "g", value: 7 }]),
+    ).rejects.toThrow();
+  });
+});
--- a/nodejs/lancedb/merge.ts
+++ b/nodejs/lancedb/merge.ts
@@ -87,6 +87,41 @@ export class MergeInsertBuilder {
      this.#schema,
    );
  }
+  /**
+   * Controls whether the merge uses the MemWAL LSM write path.
+   *
+   * By default (unset), a `mergeInsert` on a table with an LSM write spec is
+   * routed through Lance's MemWAL shard writer, and a table without one uses
+   * the standard path. Pass `false` to force the standard path even when a
+   * spec is set. Pass `true` to require a spec — `mergeInsert` rejects if none
+   * is installed.
+   *
+   * @param useLsmWrite - Whether to use the LSM write path.
+   */
+  useLsmWrite(useLsmWrite: boolean): MergeInsertBuilder {
+    return new MergeInsertBuilder(
+      this.#native.useLsmWrite(useLsmWrite),
+      this.#schema,
+    );
+  }
+  /**
+   * Controls how an LSM merge checks that its input targets a single shard.
+   *
+   * When a table has an LSM write spec, every row in a `mergeInsert` call must
+   * route to the same shard. When `true` (the default), every row is inspected
+   * to verify this. When `false`, only the first row is inspected and its
+   * shard is used for the whole input, so mis-routed rows go undetected; use
+   * this faster path only for pre-sharded input. Has no effect on tables without
+   * an LSM write spec.
+   *
+   * @param validateSingleShard - Whether to verify that every row routes to one shard. Defaults to `true`.
+   */
+  validateSingleShard(validateSingleShard: boolean): MergeInsertBuilder {
+    return new MergeInsertBuilder(
+      this.#native.validateSingleShard(validateSingleShard),
+      this.#schema,
+    );
+  }
  /**
   * Executes the merge insert operation
   *
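The three-way behaviour of `useLsmWrite` (`use_lsm_write` in Python) documented above reduces to a small decision rule. The following is a minimal model written only from those doc comments; `choose_write_path` and its arguments are hypothetical names, not LanceDB API, and the real routing happens in the Rust/Lance layer.

```python
from typing import Optional


def choose_write_path(use_lsm_write: Optional[bool], has_lsm_spec: bool) -> str:
    """Model of the documented merge-insert path selection (hypothetical helper).

    use_lsm_write: None when the builder option is left unset, else True/False.
    has_lsm_spec:  whether an LSM write spec is installed on the table.
    """
    if use_lsm_write is False:
        # Explicit opt-out: always take the standard commit path.
        return "standard"
    if use_lsm_write is True and not has_lsm_spec:
        # Explicit opt-in without a spec is rejected.
        raise ValueError("use_lsm_write=True requires an LSM write spec")
    # Unset (or True with a spec installed): follow the table's configuration.
    return "lsm" if has_lsm_spec else "standard"
```

According to the tests in this diff, a merge that takes the `"lsm"` branch reports only `numRows`; `version` and `numInsertedRows` come back as 0, presumably because the rows land in the MemWAL rather than in a newly committed table version.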
--- a/nodejs/lancedb/table.ts
+++ b/nodejs/lancedb/table.ts
@@ -161,7 +161,10 @@ export interface Version {
 *
 * `specType` is `"bucket"`, `"identity"`, or `"unsharded"`. For `"bucket"`,
 * `column` and `numBuckets` are required; for `"identity"`, `column` is
- * required.
+ * required and must be a deterministic function of the unenforced primary
+ * key (every row with a given primary key must always produce the same
+ * `column` value, or upserts of that key can land in different shards and a
+ * stale version can win).
 */
 export interface LsmWriteSpec {
  /** One of `"bucket"`, `"identity"`, or `"unsharded"`. */
@@ -567,6 +570,16 @@ export abstract class Table {
   * @returns {Promise<void>}
   */
  abstract unsetLsmWriteSpec(): Promise<void>;
+  /**
+   * Drain and close any cached MemWAL shard writers held for this table.
+   *
+   * When an {@link LsmWriteSpec} is installed, `mergeInsert` opens MemWAL
+   * shard writers and caches them for reuse across calls. This closes them,
+   * flushing pending data; writers reopen lazily on the next `mergeInsert`.
+   * It is a no-op when no writers are cached.
+   * @returns {Promise<void>}
+   */
+  abstract closeLsmWriters(): Promise<void>;
  /** Retrieve the version of the table */

  abstract version(): Promise<number>;
@@ -1041,6 +1054,10 @@ export class LocalTable extends Table {
    return await this.inner.unsetLsmWriteSpec();
  }

+  async closeLsmWriters(): Promise<void> {
+    return await this.inner.closeLsmWriters();
+  }
+
  async version(): Promise<number> {
    return await this.inner.version();
  }
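The determinism rule added to the `LsmWriteSpec` doc comment for `"identity"` specs can be spot-checked on data you already hold before installing the spec. The helper is hypothetical (not a LanceDB API) and only sees the sample it is given, so an empty result is evidence, not proof, that the column is a function of the unenforced primary key.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def nondeterministic_keys(
    rows: Iterable[Tuple[Hashable, Hashable]],
) -> Dict[Hashable, Set[Hashable]]:
    """Return primary keys seen with more than one identity-column value.

    Each row is a (primary_key, identity_column_value) pair. Any key reported
    here could be routed to different shards on different upserts.
    """
    seen: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for pk, shard_value in rows:
        seen[pk].add(shard_value)
    # Keep only keys whose identity value is not unique.
    return {pk: values for pk, values in seen.items() if len(values) > 1}
```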
--- a/nodejs/src/merge.rs
+++ b/nodejs/src/merge.rs
@@ -50,6 +50,20 @@ impl NativeMergeInsertBuilder {
        this
    }

+    #[napi]
+    pub fn use_lsm_write(&self, use_lsm_write: bool) -> Self {
+        let mut this = self.clone();
+        this.inner.use_lsm_write(use_lsm_write);
+        this
+    }
+
+    #[napi]
+    pub fn validate_single_shard(&self, validate_single_shard: bool) -> Self {
+        let mut this = self.clone();
+        this.inner.validate_single_shard(validate_single_shard);
+        this
+    }
+
    #[napi(catch_unwind)]
    pub async fn execute(&self, buf: Buffer) -> napi::Result<MergeResult> {
        let data = ipc_file_to_batches(buf.to_vec())
--- a/nodejs/src/table.rs
+++ b/nodejs/src/table.rs
@@ -391,6 +391,11 @@ impl Table {
            .default_error()
    }

+    #[napi(catch_unwind)]
+    pub async fn close_lsm_writers(&self) -> napi::Result<()> {
+        self.inner_ref()?.close_lsm_writers().await.default_error()
+    }
+
    #[napi(catch_unwind)]
    pub async fn version(&self) -> napi::Result<i64> {
        self.inner_ref()?
@@ -940,6 +945,7 @@ pub struct MergeResult {
    pub num_updated_rows: i64,
    pub num_deleted_rows: i64,
    pub num_attempts: i64,
+    pub num_rows: i64,
 }

 impl From<lancedb::table::MergeResult> for MergeResult {
@@ -950,6 +956,7 @@ impl From<lancedb::table::MergeResult> for MergeResult {
            num_updated_rows: value.num_updated_rows as i64,
            num_deleted_rows: value.num_deleted_rows as i64,
            num_attempts: value.num_attempts as i64,
+            num_rows: value.num_rows as i64,
        }
    }
 }
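The writer-cache contract exposed by `close_lsm_writers` (lazy open, reuse across calls, drain on close, reopen on next use, no-op when empty) is easiest to see in a toy model. `FakeShardWriter` and `WriterCache` are illustrative stand-ins; the real writers are Lance MemWAL shard writers held on the Rust side, and their flush behaviour is only what the `closeLsmWriters` doc comment describes.

```python
from typing import Dict, List


class FakeShardWriter:
    """Stand-in for a MemWAL shard writer (hypothetical)."""

    def __init__(self, shard: int) -> None:
        self.shard = shard
        self.pending: List[dict] = []
        self.flushed: List[dict] = []
        self.closed = False

    def write(self, rows: List[dict]) -> None:
        self.pending.extend(rows)

    def close(self) -> None:
        # Closing drains whatever is still buffered.
        self.flushed.extend(self.pending)
        self.pending.clear()
        self.closed = True


class WriterCache:
    """Per-table cache mirroring the documented close_lsm_writers behaviour."""

    def __init__(self) -> None:
        self._writers: Dict[int, FakeShardWriter] = {}
        self.opened = 0  # counts lazy opens, for illustration only

    def writer_for(self, shard: int) -> FakeShardWriter:
        writer = self._writers.get(shard)
        if writer is None:
            # Opened lazily on first use, then reused across merge_insert calls.
            writer = FakeShardWriter(shard)
            self._writers[shard] = writer
            self.opened += 1
        return writer

    def close_all(self) -> None:
        # No-op when nothing is cached; otherwise flush and drop every writer.
        for writer in self._writers.values():
            writer.close()
        self._writers.clear()
```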
--- a/python/python/lancedb/_lancedb.pyi
+++ b/python/python/lancedb/_lancedb.pyi
@@ -220,6 +220,7 @@ class Table:
    async def set_unenforced_primary_key(self, columns: List[str]) -> None: ...
    async def set_lsm_write_spec(self, spec: LsmWriteSpec) -> None: ...
    async def unset_lsm_write_spec(self) -> None: ...
+    async def close_lsm_writers(self) -> None: ...
    @property
    def tags(self) -> Tags: ...
    def query(self) -> Query: ...
@@ -420,6 +421,7 @@ class MergeResult:
    num_inserted_rows: int
    num_deleted_rows: int
    num_attempts: int
+    num_rows: int

 class LsmWriteSpec:
    """Specification selecting Lance's MemWAL LSM-style write path for
--- a/python/python/lancedb/index.py
+++ b/python/python/lancedb/index.py
@@ -281,6 +281,9 @@ class HnswPq:
    m: int = 20
    ef_construction: int = 300
    target_partition_size: Optional[int] = None
+    # Name of the accelerator (e.g. "cuda") to use for IVF training. When set,
+    # create_index() dispatches to pylance to build the index on the accelerator.
+    accelerator: Optional[str] = None


@dataclass
@@ -386,6 +389,9 @@ class HnswSq:
    m: int = 20
    ef_construction: int = 300
    target_partition_size: Optional[int] = None
+    # Name of the accelerator (e.g. "cuda") to use for IVF training. When set,
+    # create_index() dispatches to pylance to build the index on the accelerator.
+    accelerator: Optional[str] = None


@dataclass
@@ -579,6 +585,9 @@ class IvfFlat:
    max_iterations: int = 50
    sample_rate: int = 256
    target_partition_size: Optional[int] = None
+    # Name of the accelerator (e.g. "cuda") to use for IVF training. When set,
+    # create_index() dispatches to pylance to build the index on the accelerator.
+    accelerator: Optional[str] = None


@dataclass
@@ -609,6 +618,9 @@ class IvfSq:
    max_iterations: int = 50
    sample_rate: int = 256
    target_partition_size: Optional[int] = None
+    # Name of the accelerator (e.g. "cuda") to use for IVF training. When set,
+    # create_index() dispatches to pylance to build the index on the accelerator.
+    accelerator: Optional[str] = None


@dataclass
@@ -739,6 +751,9 @@ class IvfPq:
    max_iterations: int = 50
    sample_rate: int = 256
    target_partition_size: Optional[int] = None
+    # Name of the accelerator (e.g. "cuda") to use for IVF training. When set,
+    # create_index() dispatches to pylance to build the index on the accelerator.
+    accelerator: Optional[str] = None


@dataclass
@@ -792,6 +807,9 @@ class IvfRq:
    max_iterations: int = 50
    sample_rate: int = 256
    target_partition_size: Optional[int] = None
+    # Name of the accelerator (e.g. "cuda") to use for IVF training. When set,
+    # create_index() dispatches to pylance to build the index on the accelerator.
+    accelerator: Optional[str] = None


 __all__ = [
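Per its field comment, the new `accelerator` field only changes which builder handles the index. The sketch models that dispatch with a hypothetical `IvfPqSketch` dataclass and `index_backend` function; it does not call pylance or check that a CUDA device is present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IvfPqSketch:
    # Hypothetical stand-in for lancedb.index.IvfPq; only the new field matters.
    distance_type: str = "l2"
    accelerator: Optional[str] = None


def index_backend(config: IvfPqSketch) -> str:
    """Which builder handles the config, per the field comment (model only)."""
    return "pylance" if config.accelerator is not None else "lancedb"
```

On LanceDB Cloud, the legacy `create_index(accelerator=...)` path in this diff only logs a warning that GPU acceleration is not yet supported.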
--- a/python/python/lancedb/merge.py
+++ b/python/python/lancedb/merge.py
@@ -34,6 +34,8 @@ class LanceMergeInsertBuilder(object):
        self._when_not_matched_by_source_condition = None
        self._timeout = None
        self._use_index = True
+        self._use_lsm_write = None
+        self._validate_single_shard = None

    def when_matched_update_all(
        self, *, where: Optional[str] = None
@@ -96,6 +98,46 @@ class LanceMergeInsertBuilder(object):
        self._use_index = use_index
        return self

+    def use_lsm_write(self, use_lsm_write: bool) -> LanceMergeInsertBuilder:
+        """
+        Controls whether the merge uses the MemWAL LSM write path.
+
+        By default (unset), a `merge_insert` on a table with an LSM write spec
+        is routed through Lance's MemWAL shard writer, and a table without one
+        uses the standard path. Pass `False` to force the standard path even
+        when a spec is set. Pass `True` to require a spec — `merge_insert`
+        raises an error if none is installed.
+
+        Parameters
+        ----------
+        use_lsm_write: bool
+            Whether to use the LSM write path.
+        """
+        self._use_lsm_write = use_lsm_write
+        return self
+
+    def validate_single_shard(
+        self, validate_single_shard: bool
+    ) -> LanceMergeInsertBuilder:
+        """
+        Controls how an LSM merge checks that its input targets a single shard.
+
+        When a table has an LSM write spec, every row in a `merge_insert` call
+        must route to the same shard. When `True` (the default), every row is
+        inspected to verify this. When `False`, only the first row is inspected
+        and its shard is used for the whole input, so mis-routed rows go
+        undetected. Use this faster path only for input that is already sharded.
+
+        Has no effect on tables without an LSM write spec.
+
+        Parameters
+        ----------
+        validate_single_shard: bool
+            Whether to verify that every row routes to one shard. Defaults to `True`.
+        """
+        self._validate_single_shard = validate_single_shard
+        return self
+
    def execute(
        self,
        new_data: DATA,
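The `validate_single_shard` trade-off can be modelled in a few lines. This sketch assumes a caller-supplied routing function; `bucket_of` uses CRC-32 purely as a stand-in and is not Lance's actual bucket hash, and `target_shard` is a hypothetical helper rather than part of the builder.

```python
import zlib
from typing import Callable, Sequence


def bucket_of(key: str, num_buckets: int) -> int:
    # Stand-in hash; Lance's real bucketing function may differ.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def target_shard(
    keys: Sequence[str],
    route: Callable[[str], int],
    validate_single_shard: bool = True,
) -> int:
    """Pick the shard a merge_insert batch writes to (hypothetical model)."""
    if not keys:
        raise ValueError("empty input")
    shard = route(keys[0])
    if validate_single_shard:
        # Inspect every row and reject input that spans more than one shard.
        for key in keys[1:]:
            if route(key) != shard:
                raise ValueError(f"row {key!r} routes to a different shard")
    # With validation off, the first row's shard is trusted for the whole batch.
    return shard
```

With `num_buckets=1`, as in the tests in this diff, every key maps to bucket 0, so the full check can never fail; it only matters once a table has more than one bucket.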
--- a/python/python/lancedb/remote/table.py
+++ b/python/python/lancedb/remote/table.py
@@ -2,11 +2,24 @@
 # SPDX-FileCopyrightText: Copyright The LanceDB Authors

 from datetime import timedelta
+import deprecation
 import logging
 from functools import cached_property
-from typing import Any, Callable, Dict, Iterable, List, Optional, Union, Literal
+from typing import (
+    Any,
+    Callable,
+    Dict,
+    Iterable,
+    List,
+    Optional,
+    Union,
+    Literal,
+    overload,
+)
 import warnings

+from lancedb import __version__
+
 from lancedb._lancedb import (
    AddColumnsResult,
    AddResult,
@@ -32,6 +45,7 @@ from lancedb.index import (
    LabelList,
 )
 from lancedb.remote.db import LOOP
+from lancedb.table import IndexConfigType, KNOWN_METRICS
 import pyarrow as pa

 from lancedb.common import DATA, VEC, VECTOR_COLUMN_NAME
@@ -122,6 +136,11 @@ class RemoteTable(Table):
        """List all the stats of a specified index"""
        return LOOP.run(self._table.index_stats(index_uuid))

+    @deprecation.deprecated(
+        deprecated_in="0.25.0",
+        current_version=__version__,
+        details="Use create_index() with config=BTree()/Bitmap()/LabelList() instead.",
+    )
    def create_scalar_index(
        self,
        column: str,
@@ -131,7 +150,12 @@ class RemoteTable(Table):
        wait_timeout: Optional[timedelta] = None,
        name: Optional[str] = None,
    ):
-        """Creates a scalar index
+        """Creates a scalar index.
+
+        .. deprecated:: 0.25.0
+            Use :meth:`create_index` with a BTree, Bitmap, or LabelList config instead.
+            Example: ``table.create_index("column", config=BTree())``
+
        Parameters
        ----------
        column : str
@@ -162,6 +186,11 @@ class RemoteTable(Table):
            )
        )

+    @deprecation.deprecated(
+        deprecated_in="0.25.0",
+        current_version=__version__,
+        details="Use create_index() with config=FTS() instead.",
+    )
    def create_fts_index(
        self,
        column: str,
@@ -182,6 +211,12 @@ class RemoteTable(Table):
        prefix_only: bool = False,
        name: Optional[str] = None,
    ):
+        """Create a full-text search index on a column.
+
+        .. deprecated:: 0.25.0
+            Use :meth:`create_index` with an FTS config instead.
+            Example: ``table.create_index("text_column", config=FTS())``
+        """
        config = FTS(
            with_position=with_position,
            base_tokenizer=base_tokenizer,
@@ -205,9 +240,43 @@ class RemoteTable(Table):
            )
        )

+    # New unified API overload
+    @overload
    def create_index(
        self,
-        metric="l2",
+        column: str,
+        /,
+        *,
+        config: IndexConfigType,
+        wait_timeout: Optional[timedelta] = ...,
+        name: Optional[str] = ...,
+        train: bool = ...,
+    ) -> None: ...
+
+    # Legacy API overload (deprecated)
+    @overload
+    def create_index(
+        self,
+        metric: Literal["l2", "cosine", "dot", "hamming"] = ...,
+        vector_column_name: str = ...,
+        index_cache_size: Optional[int] = ...,
+        num_partitions: Optional[int] = ...,
+        num_sub_vectors: Optional[int] = ...,
+        replace: Optional[bool] = ...,
+        accelerator: Optional[str] = ...,
+        index_type: Literal[
+            "VECTOR", "IVF_FLAT", "IVF_SQ", "IVF_PQ", "IVF_HNSW_SQ", "IVF_HNSW_PQ"
+        ] = ...,
+        wait_timeout: Optional[timedelta] = ...,
+        *,
+        num_bits: int = ...,
+        name: Optional[str] = ...,
+        train: bool = ...,
+    ) -> None: ...
+
+    def create_index(
+        self,
+        metric: str = "l2",
        vector_column_name: str = VECTOR_COLUMN_NAME,
        index_cache_size: Optional[int] = None,
        num_partitions: Optional[int] = None,
@@ -218,89 +287,113 @@ class RemoteTable(Table):
        wait_timeout: Optional[timedelta] = None,
        *,
        num_bits: int = 8,
+        config: Optional[IndexConfigType] = None,
        name: Optional[str] = None,
        train: bool = True,
    ):
-        """Create an index on the table.
+        """Create an index on a column.

-        Parameters
-        ----------
-        metric : str
-            The metric to use for the index. Default is "l2".
-        vector_column_name : str
-            The name of the vector column. Default is "vector".
+        This method supports both the new unified API and the legacy API
+        for backwards compatibility. The new API takes the column name as the
+        first positional argument and an index configuration object via
+        ``config``; the legacy API takes the distance metric as the first
+        argument plus separate ``vector_column_name`` / ``num_partitions`` /
+        etc. parameters, and emits a ``DeprecationWarning``.

        Examples
        --------
-        >>> import lancedb
-        >>> import uuid
-        >>> from lancedb.schema import vector
-        >>> db = lancedb.connect("db://...", api_key="...", # doctest: +SKIP
-        ...                      region="...") # doctest: +SKIP
-        >>> table_name = uuid.uuid4().hex
-        >>> schema = pa.schema(
-        ...     [
-        ...             pa.field("id", pa.uint32(), False),
-        ...            pa.field("vector", vector(128), False),
-        ...             pa.field("s", pa.string(), False),
-        ...     ]
+        New API (recommended):
+
+        >>> table.create_index(  # doctest: +SKIP
+        ...     "vector", config=IvfPq(distance_type="l2")
        ... )
-        >>> table = db.create_table( # doctest: +SKIP
-        ...     table_name, # doctest: +SKIP
-        ...     schema=schema, # doctest: +SKIP
+        >>> table.create_index("category", config=BTree())  # doctest: +SKIP
+        >>> table.create_index("content", config=FTS())  # doctest: +SKIP
+
+        Legacy API (deprecated):
+
+        >>> table.create_index(  # doctest: +SKIP
+        ...     "l2", vector_column_name="vector"
        ... )
-        >>> table.create_index("l2", "vector") # doctest: +SKIP
        """
+        # Detect whether this is a legacy API call
+        is_legacy = self._is_legacy_create_index_call(
+            metric,
+            config,
+            num_partitions,
+            num_sub_vectors,
+            vector_column_name,
+            accelerator,
+            index_cache_size,
+            replace,
+        )

-        if accelerator is not None:
-            logging.warning(
-                "GPU accelerator is not yet supported on LanceDB cloud."
-                "If you have 100M+ vectors to index,"
-                "please contact us at contact@lancedb.com"
-            )
-        if replace is not None:
-            logging.warning(
-                "replace is not supported on LanceDB cloud."
-                "Existing indexes will always be replaced."
+        if is_legacy:
+            warnings.warn(
+                "The create_index() API with metric/num_partitions parameters is "
+                "deprecated and will be removed in a future version. "
+                "Please migrate to the new unified API:\n"
+                "  # Old (deprecated):\n"
+                "  table.create_index('l2', vector_column_name='my_vector')\n"
+                "  # New (recommended):\n"
+                "  table.create_index('my_vector', config=IvfPq(distance_type='l2'))",
+                DeprecationWarning,
+                stacklevel=2,
            )

-        index_type = index_type.upper()
-        if index_type == "VECTOR" or index_type == "IVF_PQ":
-            config = IvfPq(
-                distance_type=metric,
-                num_partitions=num_partitions,
-                num_sub_vectors=num_sub_vectors,
-                num_bits=num_bits,
-            )
-        elif index_type == "IVF_RQ":
-            config = IvfRq(
-                distance_type=metric,
-                num_partitions=num_partitions,
-                num_bits=num_bits,
-            )
-        elif index_type == "IVF_SQ":
-            config = IvfSq(distance_type=metric, num_partitions=num_partitions)
-        elif index_type == "IVF_HNSW_PQ":
-            raise ValueError(
-                "IVF_HNSW_PQ is not supported on LanceDB cloud."
-                "Please use IVF_HNSW_SQ instead."
-            )
-        elif index_type == "IVF_HNSW_SQ":
-            config = HnswSq(distance_type=metric, num_partitions=num_partitions)
-        elif index_type == "IVF_HNSW_FLAT":
-            config = HnswFlat(distance_type=metric, num_partitions=num_partitions)
-        elif index_type == "IVF_FLAT":
-            config = IvfFlat(distance_type=metric, num_partitions=num_partitions)
+            column = vector_column_name
+
+            if accelerator is not None:
+                logging.warning(
+                    "GPU accelerator is not yet supported on LanceDB cloud. "
+                    "If you have 100M+ vectors to index, "
+                    "please contact us at contact@lancedb.com"
+                )
+            if replace is not None:
+                logging.warning(
+                    "replace is not supported on LanceDB cloud. "
+                    "Existing indexes will always be replaced."
+                )
+
+            idx_type = index_type.upper()
+            if idx_type == "VECTOR" or idx_type == "IVF_PQ":
+                config = IvfPq(
+                    distance_type=metric,
+                    num_partitions=num_partitions,
+                    num_sub_vectors=num_sub_vectors,
+                    num_bits=num_bits,
+                )
+            elif idx_type == "IVF_RQ":
+                config = IvfRq(
+                    distance_type=metric,
+                    num_partitions=num_partitions,
+                    num_bits=num_bits,
+                )
+            elif idx_type == "IVF_SQ":
+                config = IvfSq(distance_type=metric, num_partitions=num_partitions)
+            elif idx_type == "IVF_HNSW_PQ":
+                raise ValueError(
+                    "IVF_HNSW_PQ is not supported on LanceDB cloud. "
+                    "Please use IVF_HNSW_SQ instead."
+                )
+            elif idx_type == "IVF_HNSW_SQ":
+                config = HnswSq(distance_type=metric, num_partitions=num_partitions)
+            elif idx_type == "IVF_HNSW_FLAT":
+                config = HnswFlat(distance_type=metric, num_partitions=num_partitions)
+            elif idx_type == "IVF_FLAT":
+                config = IvfFlat(distance_type=metric, num_partitions=num_partitions)
+            else:
+                raise ValueError(
+                    f"Unknown vector index type: {idx_type}. Valid options are"
+                    " 'IVF_FLAT', 'IVF_PQ', 'IVF_RQ', 'IVF_SQ',"
+                    " 'IVF_HNSW_PQ', 'IVF_HNSW_SQ', 'IVF_HNSW_FLAT'"
+                )
        else:
-            raise ValueError(
-                f"Unknown vector index type: {index_type}. Valid options are"
-                " 'IVF_FLAT', 'IVF_PQ', 'IVF_RQ', 'IVF_SQ',"
-                " 'IVF_HNSW_PQ', 'IVF_HNSW_SQ', 'IVF_HNSW_FLAT'"
-            )
+            column = metric

        LOOP.run(
            self._table.create_index(
-                vector_column_name,
+                column,
                config=config,
                wait_timeout=wait_timeout,
                name=name,
@@ -308,6 +401,37 @@ class RemoteTable(Table):
            )
        )

+    def _is_legacy_create_index_call(
+        self,
+        first_arg: str,
+        config: Optional[IndexConfigType],
+        num_partitions: Optional[int],
+        num_sub_vectors: Optional[int],
+        vector_column_name: str,
+        accelerator: Optional[str],
+        index_cache_size: Optional[int],
+        replace: Optional[bool],
+    ) -> bool:
+        """Heuristically detect a legacy (metric-first) create_index call."""
+        if config is not None:
+            return False
+        if any(
+            x is not None
+            for x in (
+                num_partitions,
+                num_sub_vectors,
+                accelerator,
+                index_cache_size,
+                replace,
+            )
+        ):
+            return True
+        if vector_column_name != VECTOR_COLUMN_NAME:
+            return True
+        if first_arg.lower() in KNOWN_METRICS:
+            return True
+        return False
+
    def add(
        self,
        data: DATA,
@@ -668,6 +792,10 @@ class RemoteTable(Table):
        """Not supported on LanceDB Cloud."""
        return LOOP.run(self._table.unset_lsm_write_spec())

+    def close_lsm_writers(self) -> None:
+        """No-op on LanceDB Cloud (no local shard writers)."""
+        return LOOP.run(self._table.close_lsm_writers())
+
    def drop_index(self, index_name: str):
        return LOOP.run(self._table.drop_index(index_name))

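Because the legacy and unified `create_index` signatures share a first positional parameter, the implementation has to guess which one the caller meant. The function below restates the `_is_legacy_create_index_call` heuristic from this diff as a standalone sketch, folding the individual legacy parameters into `**legacy_kwargs` for brevity; `classify_create_index_call` is a hypothetical name, and the two constants are local copies of the values the diff uses.

```python
from typing import Any, Optional, Tuple

VECTOR_COLUMN_NAME = "vector"  # lancedb's default vector column name
KNOWN_METRICS = {"l2", "cosine", "dot", "hamming"}  # mirrors lancedb.table


def classify_create_index_call(
    first_arg: str,
    config: Optional[Any] = None,
    vector_column_name: str = VECTOR_COLUMN_NAME,
    **legacy_kwargs: Optional[Any],
) -> Tuple[bool, str]:
    """Return (is_legacy, column_to_index) using the diff's heuristic."""
    if config is not None:
        is_legacy = False  # an explicit config always means the new API
    elif any(v is not None for v in legacy_kwargs.values()):
        is_legacy = True  # num_partitions, accelerator, replace, ...
    elif vector_column_name != VECTOR_COLUMN_NAME:
        is_legacy = True
    else:
        is_legacy = first_arg.lower() in KNOWN_METRICS
    # Legacy calls index vector_column_name; new calls index the first argument.
    column = vector_column_name if is_legacy else first_arg
    return is_legacy, column
```

One consequence of this heuristic: a column whose name matches a metric (for example a column called `"dot"`) is treated as a legacy call when no `config` is passed, so pass `config` explicitly for such columns.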
--- a/python/python/lancedb/table.py
+++ b/python/python/lancedb/table.py
@@ -174,6 +174,24 @@ if TYPE_CHECKING:
        DistanceType,
    )

+# Type alias for index configuration objects
+IndexConfigType = Union[
+    IvfFlat,
+    IvfPq,
+    IvfSq,
+    IvfRq,
+    HnswFlat,
+    HnswPq,
+    HnswSq,
+    BTree,
+    Bitmap,
+    LabelList,
+    FTS,
+]
+
+# Known distance metrics for legacy API detection
+KNOWN_METRICS = {"l2", "cosine", "dot", "hamming"}
+

 def _into_pyarrow_reader(
    data, schema: Optional[pa.Schema] = None
@@ -807,11 +825,49 @@ class Table(ABC):
        """
        raise NotImplementedError

+    # New unified API overload
+    @overload
    def create_index(
        self,
-        metric="l2",
-        num_partitions=256,
-        num_sub_vectors=96,
+        column: str,
+        /,
+        *,
+        config: IndexConfigType,
+        replace: bool = ...,
+        wait_timeout: Optional[timedelta] = ...,
+        name: Optional[str] = ...,
+        train: bool = ...,
+    ) -> None: ...
+
+    # Legacy API overload (deprecated)
+    @overload
+    def create_index(
+        self,
+        metric: Literal["l2", "cosine", "dot", "hamming"] = ...,
+        num_partitions: Optional[int] = ...,
+        num_sub_vectors: Optional[int] = ...,
+        vector_column_name: str = ...,
+        replace: bool = ...,
+        accelerator: Optional[str] = ...,
+        index_cache_size: Optional[int] = ...,
+        *,
+        index_type: VectorIndexType = ...,
+        wait_timeout: Optional[timedelta] = ...,
+        num_bits: int = ...,
+        max_iterations: int = ...,
+        sample_rate: int = ...,
+        m: int = ...,
+        ef_construction: int = ...,
+        name: Optional[str] = ...,
+        train: bool = ...,
+        target_partition_size: Optional[int] = ...,
+    ) -> None: ...
+
+    def create_index(
+        self,
+        metric: DistanceType = "l2",
+        num_partitions: Optional[int] = None,
+        num_sub_vectors: Optional[int] = None,
        vector_column_name: str = VECTOR_COLUMN_NAME,
        replace: bool = True,
        accelerator: Optional[str] = None,
@@ -824,46 +880,53 @@ class Table(ABC):
        sample_rate: int = 256,
        m: int = 20,
        ef_construction: int = 300,
+        config: Optional[IndexConfigType] = None,
        name: Optional[str] = None,
        train: bool = True,
        target_partition_size: Optional[int] = None,
    ):
-        """Create an index on the table.
+        """Create an index on a column.
+
+        This method supports both the new unified API and the legacy API
+        for backwards compatibility. The new API takes the column name as the
+        first positional argument and an index configuration object via
+        ``config``; the legacy API takes the distance metric as the first
+        argument plus separate ``vector_column_name`` / ``num_partitions`` /
+        etc. parameters, and emits a ``DeprecationWarning``.

        Parameters
        ----------
-        metric: str, default "l2"
-            The distance metric to use when creating the index.
-            Valid values are "l2", "cosine", "dot", or "hamming".
-            l2 is euclidean distance.
-            Hamming is available only for binary vectors.
-        num_partitions: int, default 256
-            The number of IVF partitions to use when creating the index.
-            Default is 256.
-        num_sub_vectors: int, default 96
-            The number of PQ sub-vectors to use when creating the index.
-            Default is 96.
-        vector_column_name: str, default "vector"
-            The vector column name to create the index.
-        replace: bool, default True
-            - If True, replace the existing index if it exists.
+        metric : str
+            For new API: the column name to index.
+            For legacy API: the distance metric ("l2", "cosine", "dot", "hamming").
+        config : IndexConfigType, optional
+            The index configuration object. If provided, uses the new unified API.
+            Can be one of: IvfFlat, IvfPq, IvfSq, IvfRq, HnswFlat, HnswPq, HnswSq,
+            BTree, Bitmap, LabelList, FTS.
+        replace : bool, default True
+            Whether to replace an existing index on this column.
+        wait_timeout : timedelta, optional
+            Timeout to wait for async indexing to complete.
+        name : str, optional
+            Custom name for the index.
+        train : bool, default True
+            Whether to train the index with existing data.

-            - If False, raise an error if duplicate index exists.
-        accelerator: str, default None
-            If set, use the given accelerator to create the index.
-            Only support "cuda" for now.
-        index_cache_size : int, optional
-            The size of the index cache in number of entries. Default value is 256.
-        num_bits: int
-            The number of bits to encode sub-vectors. Only used with the IVF_PQ index.
-            Only 4 and 8 are supported.
-        wait_timeout: timedelta, optional
-            The timeout to wait if indexing is asynchronous.
-        name: str, optional
-            The name of the index. If not provided, a default name will be generated.
-        train: bool, default True
-            Whether to train the index with existing data. Vector indices always train
-            with existing data.
+        Examples
+        --------
+        New API (recommended):
+
+        >>> table.create_index(  # doctest: +SKIP
+        ...     "vector", config=IvfPq(distance_type="l2")
+        ... )
+        >>> table.create_index("category", config=BTree())  # doctest: +SKIP
+        >>> table.create_index("content", config=FTS())  # doctest: +SKIP
+
+        Legacy API (deprecated):
+
+        >>> table.create_index(  # doctest: +SKIP
+        ...     "l2", vector_column_name="vector"
+        ... )
        """
        raise NotImplementedError

@@ -1188,7 +1251,7 @@ class Table(ABC):
        ...      .when_not_matched_insert_all() \\
        ...      .execute(new_data)
        >>> res
-        MergeResult(version=2, num_updated_rows=2, num_inserted_rows=1, num_deleted_rows=0, num_attempts=1)
+        MergeResult(version=2, num_updated_rows=2, num_inserted_rows=1, num_deleted_rows=0, num_attempts=1, num_rows=3)
        >>> # The order of new rows is non-deterministic since we use
        >>> # a hash-join as part of this operation and so we sort here
        >>> table.to_arrow().sort_by("a").to_pandas()
@@ -2250,11 +2313,51 @@ class LanceTable(Table):
            dataset, allow_pyarrow_filter=False, batch_size=batch_size
        )

+    # New unified API overload
+    @overload
    def create_index(
        self,
-        metric: DistanceType = "l2",
-        num_partitions=None,
-        num_sub_vectors=None,
+        column: str,
+        /,
+        *,
+        config: IndexConfigType,
+        replace: bool = ...,
+        wait_timeout: Optional[timedelta] = ...,
+        name: Optional[str] = ...,
+        train: bool = ...,
+    ) -> None: ...
+
+    # Legacy API overload (deprecated)
+    @overload
+    def create_index(
+        self,
+        metric: Literal["l2", "cosine", "dot", "hamming"] = ...,
+        num_partitions: Optional[int] = ...,
+        num_sub_vectors: Optional[int] = ...,
+        vector_column_name: str = ...,
+        replace: bool = ...,
+        accelerator: Optional[str] = ...,
+        index_cache_size: Optional[int] = ...,
+        num_bits: int = ...,
+        index_type: Literal[
+            "IVF_FLAT", "IVF_SQ", "IVF_PQ", "IVF_RQ", "IVF_HNSW_SQ", "IVF_HNSW_PQ"
+        ] = ...,
+        max_iterations: int = ...,
+        sample_rate: int = ...,
+        m: int = ...,
+        ef_construction: int = ...,
+        *,
+        wait_timeout: Optional[timedelta] = ...,
+        name: Optional[str] = ...,
+        train: bool = ...,
+        target_partition_size: Optional[int] = ...,
+    ) -> None: ...
+
+    def create_index(
+        self,
+        metric: str = "l2",
+        num_partitions: Optional[int] = None,
+        num_sub_vectors: Optional[int] = None,
        vector_column_name: str = VECTOR_COLUMN_NAME,
        replace: bool = True,
        accelerator: Optional[str] = None,
@@ -2274,47 +2377,232 @@ class LanceTable(Table):
        m: int = 20,
        ef_construction: int = 300,
        *,
+        config: Optional[IndexConfigType] = None,
+        wait_timeout: Optional[timedelta] = None,
        name: Optional[str] = None,
        train: bool = True,
        target_partition_size: Optional[int] = None,
    ):
-        """Create an index on the table."""
-        if accelerator is not None:
-            # accelerator is only supported through pylance.
-            self.to_lance().create_index(
-                column=vector_column_name,
-                index_type=index_type,
+        """Create an index on a column.
+
+        This method supports both the new unified API and the legacy API
+        for backwards compatibility. The new API takes the column name as the
+        first positional argument and an index configuration object via
+        ``config``; the legacy API takes the distance metric as the first
+        argument plus separate tuning parameters (``vector_column_name``,
+        ``num_partitions``, and so on) and emits a ``DeprecationWarning``.
+
+        Parameters
+        ----------
+        metric : str
+            For new API: the column name to index.
+            For legacy API: the distance metric ("l2", "cosine", "dot", "hamming").
+        config : IndexConfigType, optional
+            The index configuration object. If provided, uses the new unified API.
+            Can be one of: IvfFlat, IvfPq, IvfSq, IvfRq, HnswPq, HnswSq,
+            HnswFlat, BTree, Bitmap, LabelList, FTS.
+        replace : bool, default True
+            Whether to replace an existing index on this column.
+        wait_timeout : timedelta, optional
+            Timeout to wait for async indexing to complete.
+        name : str, optional
+            Custom name for the index.
+        train : bool, default True
+            Whether to train the index with existing data.
+
+        Examples
+        --------
+        New API (recommended):
+
+        >>> table.create_index(  # doctest: +SKIP
+        ...     "vector", config=IvfPq(distance_type="l2")
+        ... )
+        >>> table.create_index("category", config=BTree())  # doctest: +SKIP
+        >>> table.create_index("content", config=FTS())  # doctest: +SKIP
+
+        Legacy API (deprecated):
+
+        >>> table.create_index(  # doctest: +SKIP
+        ...     "l2", vector_column_name="vector"
+        ... )
+        """
+        # Detect whether this is a legacy API call
+        is_legacy = self._is_legacy_create_index_call(
+            metric,
+            config,
+            num_partitions,
+            num_sub_vectors,
+            vector_column_name,
+            accelerator,
+            index_cache_size,
+        )
+
+        if is_legacy:
+            warnings.warn(
+                "The create_index() API with metric/num_partitions parameters is "
+                "deprecated and will be removed in a future version. "
+                "Please migrate to the new unified API:\n"
+                "  # Old (deprecated):\n"
+                "  table.create_index('l2', vector_column_name='my_vector')\n"
+                "  # New (recommended):\n"
+                "  table.create_index('my_vector', config=IvfPq(distance_type='l2'))",
+                DeprecationWarning,
+                stacklevel=2,
+            )
+
+            # Legacy API: first arg is the distance metric
+            column = vector_column_name
+
+            # Build config from legacy parameters
+            config = self._build_vector_config_from_legacy_params(
                metric=metric,
+                index_type=index_type,
                num_partitions=num_partitions,
                num_sub_vectors=num_sub_vectors,
-                replace=replace,
-                accelerator=accelerator,
-                index_cache_size=index_cache_size,
                num_bits=num_bits,
+                max_iterations=max_iterations,
+                sample_rate=sample_rate,
                m=m,
                ef_construction=ef_construction,
                target_partition_size=target_partition_size,
+                accelerator=accelerator,
            )
-            self.checkout_latest()
-            return
-        elif index_type == "IVF_FLAT":
-            config = IvfFlat(
+
+            # Handle accelerator through pylance
+            if accelerator is not None:
+                self.to_lance().create_index(
+                    column=column,
+                    index_type=index_type,
+                    metric=metric,
+                    num_partitions=num_partitions,
+                    num_sub_vectors=num_sub_vectors,
+                    replace=replace,
+                    accelerator=accelerator,
+                    index_cache_size=index_cache_size,
+                    num_bits=num_bits,
+                    m=m,
+                    ef_construction=ef_construction,
+                    target_partition_size=target_partition_size,
+                )
+                self.checkout_latest()
+                return
+        else:
+            # New API: metric is the column name
+            column = metric
+
+            # Check if config has accelerator set and dispatch to pylance
+            if config is not None and hasattr(config, "accelerator"):
+                acc = getattr(config, "accelerator", None)
+                if acc is not None:
+                    # Dispatch to pylance for GPU acceleration
+                    index_type_map = {
+                        "IvfFlat": "IVF_FLAT",
+                        "IvfSq": "IVF_SQ",
+                        "IvfPq": "IVF_PQ",
+                        "IvfRq": "IVF_RQ",
+                        "HnswPq": "IVF_HNSW_PQ",
+                        "HnswSq": "IVF_HNSW_SQ",
+                    }
+                    cfg_type = type(config).__name__
+                    lance_index_type = index_type_map.get(cfg_type, "IVF_PQ")
+
+                    self.to_lance().create_index(
+                        column=column,
+                        index_type=lance_index_type,
+                        metric=getattr(config, "distance_type", "l2"),
+                        num_partitions=getattr(config, "num_partitions", None),
+                        num_sub_vectors=getattr(config, "num_sub_vectors", None),
+                        replace=replace,
+                        accelerator=acc,
+                        num_bits=getattr(config, "num_bits", 8),
+                        m=getattr(config, "m", 20),
+                        ef_construction=getattr(config, "ef_construction", 300),
+                        target_partition_size=getattr(
+                            config, "target_partition_size", None
+                        ),
+                    )
+                    self.checkout_latest()
+                    return
+
+        return LOOP.run(
+            self._table.create_index(
+                column,
+                replace=replace,
+                config=config,
+                wait_timeout=wait_timeout,
+                name=name,
+                train=train,
+            )
+        )
+
+    def _is_legacy_create_index_call(
+        self,
+        first_arg: str,
+        config: Optional[IndexConfigType],
+        num_partitions: Optional[int],
+        num_sub_vectors: Optional[int],
+        vector_column_name: str,
+        accelerator: Optional[str],
+        index_cache_size: Optional[int],
+    ) -> bool:
+        """Detect if this is a legacy create_index call."""
+        # If config is provided, it's definitely the new API
+        if config is not None:
+            return False
+
+        # If old-style parameters were explicitly set, it's legacy
+        if any(
+            x is not None
+            for x in (num_partitions, num_sub_vectors, accelerator, index_cache_size)
+        ):
+            return True
+
+        # If vector_column_name differs from default, it's legacy
+        if vector_column_name != VECTOR_COLUMN_NAME:
+            return True
+
+        # If first arg is a known metric, assume legacy
+        if first_arg.lower() in KNOWN_METRICS:
+            return True
+
+        # Otherwise assume new API
+        return False
+
+    def _build_vector_config_from_legacy_params(
+        self,
+        metric: str,
+        index_type: str,
+        num_partitions: Optional[int],
+        num_sub_vectors: Optional[int],
+        num_bits: int,
+        max_iterations: int,
+        sample_rate: int,
+        m: int,
+        ef_construction: int,
+        target_partition_size: Optional[int],
+        accelerator: Optional[str],
+    ) -> IndexConfigType:
+        """Build an index config object from legacy parameters."""
+        if index_type == "IVF_FLAT":
+            return IvfFlat(
                distance_type=metric,
                num_partitions=num_partitions,
                max_iterations=max_iterations,
                sample_rate=sample_rate,
                target_partition_size=target_partition_size,
+                accelerator=accelerator,
            )
        elif index_type == "IVF_SQ":
-            config = IvfSq(
+            return IvfSq(
                distance_type=metric,
                num_partitions=num_partitions,
                max_iterations=max_iterations,
                sample_rate=sample_rate,
                target_partition_size=target_partition_size,
+                accelerator=accelerator,
            )
        elif index_type == "IVF_PQ":
-            config = IvfPq(
+            return IvfPq(
                distance_type=metric,
                num_partitions=num_partitions,
                num_sub_vectors=num_sub_vectors,
@@ -2322,18 +2610,20 @@ class LanceTable(Table):
                max_iterations=max_iterations,
                sample_rate=sample_rate,
                target_partition_size=target_partition_size,
+                accelerator=accelerator,
            )
        elif index_type == "IVF_RQ":
-            config = IvfRq(
+            return IvfRq(
                distance_type=metric,
                num_partitions=num_partitions,
                num_bits=num_bits,
                max_iterations=max_iterations,
                sample_rate=sample_rate,
                target_partition_size=target_partition_size,
+                accelerator=accelerator,
            )
        elif index_type == "IVF_HNSW_PQ":
-            config = HnswPq(
+            return HnswPq(
                distance_type=metric,
                num_partitions=num_partitions,
                num_sub_vectors=num_sub_vectors,
@@ -2343,9 +2633,10 @@ class LanceTable(Table):
                m=m,
                ef_construction=ef_construction,
                target_partition_size=target_partition_size,
+                accelerator=accelerator,
            )
        elif index_type == "IVF_HNSW_SQ":
-            config = HnswSq(
+            return HnswSq(
                distance_type=metric,
                num_partitions=num_partitions,
                max_iterations=max_iterations,
@@ -2353,9 +2644,10 @@ class LanceTable(Table):
                m=m,
                ef_construction=ef_construction,
                target_partition_size=target_partition_size,
+                accelerator=accelerator,
            )
        elif index_type == "IVF_HNSW_FLAT":
-            config = HnswFlat(
+            return HnswFlat(
                distance_type=metric,
                num_partitions=num_partitions,
                max_iterations=max_iterations,
@@ -2367,16 +2659,6 @@ class LanceTable(Table):
        else:
            raise ValueError(f"Unknown index type {index_type}")

-        return LOOP.run(
-            self._table.create_index(
-                vector_column_name,
-                replace=replace,
-                config=config,
-                name=name,
-                train=train,
-            )
-        )
-
    def drop_index(self, name: str) -> None:
        """
        Drops an index from the table
@@ -2476,6 +2758,11 @@ class LanceTable(Table):
        """
        return LOOP.run(self._table.latest_storage_options())

+    @deprecation.deprecated(
+        deprecated_in="0.25.0",
+        current_version=__version__,
+        details="Use create_index() with config=BTree()/Bitmap()/LabelList() instead.",
+    )
    def create_scalar_index(
        self,
        column: str,
@@ -2484,6 +2771,12 @@ class LanceTable(Table):
        index_type: ScalarIndexType = "BTREE",
        name: Optional[str] = None,
    ):
+        """Create a scalar index on a column.
+
+        .. deprecated:: 0.25.0
+            Use :meth:`create_index` with a BTree, Bitmap, or LabelList config instead.
+            Example: ``table.create_index("column", config=BTree())``
+        """
        if index_type == "BTREE":
            config = BTree()
        elif index_type == "BITMAP":
@@ -2496,6 +2789,11 @@ class LanceTable(Table):
            self._table.create_index(column, replace=replace, config=config, name=name)
        )

+    @deprecation.deprecated(
+        deprecated_in="0.25.0",
+        current_version=__version__,
+        details="Use create_index() with config=FTS() instead.",
+    )
    def create_fts_index(
        self,
        field_names: Union[str, List[str]],
@@ -2519,6 +2817,12 @@ class LanceTable(Table):
        prefix_only: bool = False,
        name: Optional[str] = None,
    ):
+        """Create a full-text search index on a column.
+
+        .. deprecated:: 0.25.0
+            Use :meth:`create_index` with an FTS config instead.
+            Example: ``table.create_index("text_column", config=FTS())``
+        """
        self._ensure_no_legacy_fts_index()

        if use_tantivy:
@@ -3297,6 +3601,11 @@ class LanceTable(Table):
        [`AsyncTable.unset_lsm_write_spec`][lancedb.AsyncTable.unset_lsm_write_spec]."""
        return LOOP.run(self._table.unset_lsm_write_spec())

+    def close_lsm_writers(self) -> None:
+        """Close cached MemWAL shard writers. See
+        [`AsyncTable.close_lsm_writers`][lancedb.AsyncTable.close_lsm_writers]."""
+        return LOOP.run(self._table.close_lsm_writers())
+
    def uses_v2_manifest_paths(self) -> bool:
        """
        Check if the table is using the new v2 manifest paths.
@@ -3905,6 +4214,16 @@ class AsyncTable:
        """
        await self._inner.unset_lsm_write_spec()

+    async def close_lsm_writers(self) -> None:
+        """Drain and close any cached MemWAL shard writers for this table.
+
+        When an LSM write spec is installed, `merge_insert` opens MemWAL shard
+        writers and caches them for reuse across calls. This closes them,
+        flushing pending data; writers reopen lazily on the next
+        `merge_insert`. It is a no-op when no writers are cached.
+        """
+        await self._inner.close_lsm_writers()
+
    @property
    def name(self) -> str:
        """The name of the table."""
@@ -4355,7 +4674,7 @@ class AsyncTable:
        ...      .when_not_matched_insert_all() \\
        ...      .execute(new_data)
        >>> res
-        MergeResult(version=2, num_updated_rows=2, num_inserted_rows=1, num_deleted_rows=0, num_attempts=1)
+        MergeResult(version=2, num_updated_rows=2, num_inserted_rows=1, num_deleted_rows=0, num_attempts=1, num_rows=3)
        >>> # The order of new rows is non-deterministic since we use
        >>> # a hash-join as part of this operation and so we sort here
        >>> table.to_arrow().sort_by("a").to_pandas()
@@ -4735,6 +5054,8 @@ class AsyncTable:
                when_not_matched_by_source_condition=merge._when_not_matched_by_source_condition,
                timeout=merge._timeout,
                use_index=merge._use_index,
+                use_lsm_write=merge._use_lsm_write,
+                validate_single_shard=merge._validate_single_shard,
            ),
        )

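`_is_legacy_create_index_call` works out which API a `create_index` call is using from its arguments alone. The standalone sketch below follows the same decision order, so you can exercise it without a table. It assumes `KNOWN_METRICS` holds the four metric names listed in the legacy overload's `Literal` and that `VECTOR_COLUMN_NAME` is `"vector"`. The diff shows neither definition, and `Any` stands in for `IndexConfigType`.

```python
from typing import Any, Optional

# Assumed values: the diff references these constants but does not define them.
KNOWN_METRICS = {"l2", "cosine", "dot", "hamming"}
VECTOR_COLUMN_NAME = "vector"


def is_legacy_create_index_call(
    first_arg: str,
    config: Optional[Any] = None,
    num_partitions: Optional[int] = None,
    num_sub_vectors: Optional[int] = None,
    vector_column_name: str = VECTOR_COLUMN_NAME,
    accelerator: Optional[str] = None,
    index_cache_size: Optional[int] = None,
) -> bool:
    """Standalone mirror of LanceTable._is_legacy_create_index_call."""
    # An explicit config object always means the new API.
    if config is not None:
        return False
    # Any legacy-only tuning parameter marks the call as legacy.
    if any(
        x is not None
        for x in (num_partitions, num_sub_vectors, accelerator, index_cache_size)
    ):
        return True
    # A non-default vector column name only exists in the legacy form.
    if vector_column_name != VECTOR_COLUMN_NAME:
        return True
    # Otherwise the first positional argument decides: metric vs column name.
    return first_arg.lower() in KNOWN_METRICS
```

Because the first-argument check runs last, `create_index("cosine")` on a column that is literally named `cosine` (matched case-insensitively) counts as a legacy call. Passing `config=` is the only unambiguous way to select the new API.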
--- a/python/python/tests/docs/test_merge_insert.py
+++ b/python/python/tests/docs/test_merge_insert.py
@@ -57,7 +57,7 @@ async def test_upsert_async(mem_db_async):
    await table.count_rows()  # 3
    res
    # MergeResult(version=2, num_updated_rows=1,
-    # num_inserted_rows=1, num_deleted_rows=0)
+    # num_inserted_rows=1, num_deleted_rows=0, num_rows=2)
    # --8<-- [end:upsert_basic_async]
    assert await table.count_rows() == 3
    assert res.version == 2
@@ -86,7 +86,7 @@ def test_insert_if_not_exists(mem_db):
    table.count_rows()  # 3
    res
    # MergeResult(version=2, num_updated_rows=0,
-    # num_inserted_rows=1, num_deleted_rows=0)
+    # num_inserted_rows=1, num_deleted_rows=0, num_rows=1)
    # --8<-- [end:insert_if_not_exists]
    assert table.count_rows() == 3
    assert res.version == 2
@@ -116,7 +116,7 @@ async def test_insert_if_not_exists_async(mem_db_async):
    await table.count_rows()  # 3
    res
    # MergeResult(version=2, num_updated_rows=0,
-    # num_inserted_rows=1, num_deleted_rows=0)
+    # num_inserted_rows=1, num_deleted_rows=0, num_rows=1)
    # --8<-- [end:insert_if_not_exists]
    assert await table.count_rows() == 3
    assert res.version == 2
@@ -150,7 +150,7 @@ def test_replace_range(mem_db):
    table.count_rows("doc_id = 1")  # 1
    res
    # MergeResult(version=2, num_updated_rows=1,
-    # num_inserted_rows=0, num_deleted_rows=1)
+    # num_inserted_rows=0, num_deleted_rows=1, num_rows=1)
    # --8<-- [end:insert_if_not_exists]
    assert table.count_rows("doc_id = 1") == 1
    assert res.version == 2
@@ -185,7 +185,7 @@ async def test_replace_range_async(mem_db_async):
    await table.count_rows("doc_id = 1")  # 1
    res
    # MergeResult(version=2, num_updated_rows=1,
-    # num_inserted_rows=0, num_deleted_rows=1)
+    # num_inserted_rows=0, num_deleted_rows=1, num_rows=1)
    # --8<-- [end:insert_if_not_exists]
    assert await table.count_rows("doc_id = 1") == 1
    assert res.version == 2
--- a/python/python/tests/test_fts.py
+++ b/python/python/tests/test_fts.py
@@ -215,11 +215,12 @@ def test_reject_legacy_tantivy_index(table):

@pytest.mark.parametrize("with_position", [True, False])
 def test_create_inverted_index(table, with_position):
-    table.create_fts_index(
-        "text",
-        with_position=with_position,
-        name="custom_fts_index",
-    )
+    with pytest.warns(DeprecationWarning, match="create_fts_index"):
+        table.create_fts_index(
+            "text",
+            with_position=with_position,
+            name="custom_fts_index",
+        )
    indices = table.list_indices()
    fts_indices = [i for i in indices if i.index_type == "FTS"]
    assert any(i.name == "custom_fts_index" for i in fts_indices)
--- a/python/python/tests/test_index.py
+++ b/python/python/tests/test_index.py
@@ -162,12 +162,13 @@ async def test_create_bitmap_index(some_table: AsyncTable):
    await some_table.create_index("data", config=Bitmap())
    indices = await some_table.list_indices()
    assert len(indices) == 3
+    # list_indices returns indices in alphabetical order by name
    assert indices[0].index_type == "Bitmap"
-    assert indices[0].columns == ["id"]
+    assert indices[0].columns == ["data"]
    assert indices[1].index_type == "Bitmap"
-    assert indices[1].columns == ["is_active"]
+    assert indices[1].columns == ["id"]
    assert indices[2].index_type == "Bitmap"
-    assert indices[2].columns == ["data"]
+    assert indices[2].columns == ["is_active"]

    index_name = indices[0].name
    stats = await some_table.index_stats(index_name)
--- a/python/python/tests/test_merge_insert_lsm.py
+++ b/python/python/tests/test_merge_insert_lsm.py
@@ -0,0 +1,196 @@
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright The LanceDB Authors
+
+"""Tests for the MemWAL LSM ``merge_insert`` dispatch."""
+
+from datetime import timedelta
+
+import lancedb
+import pyarrow as pa
+import pytest
+from lancedb._lancedb import LsmWriteSpec
+
+SCHEMA = pa.schema(
+    [
+        pa.field("id", pa.int64(), nullable=False),
+        pa.field("value", pa.int64(), nullable=False),
+    ]
+)
+
+REGION_SCHEMA = pa.schema(
+    [
+        pa.field("id", pa.int64(), nullable=False),
+        pa.field("region", pa.utf8(), nullable=False),
+    ]
+)
+
+
+def _reader(ids):
+    batch = pa.RecordBatch.from_arrays(
+        [
+            pa.array(ids, type=pa.int64()),
+            pa.array(list(range(len(ids))), type=pa.int64()),
+        ],
+        schema=SCHEMA,
+    )
+    return pa.RecordBatchReader.from_batches(SCHEMA, [batch])
+
+
+def _region_reader(rows):
+    batch = pa.RecordBatch.from_arrays(
+        [
+            pa.array([row[0] for row in rows], type=pa.int64()),
+            pa.array([row[1] for row in rows], type=pa.utf8()),
+        ],
+        schema=REGION_SCHEMA,
+    )
+    return pa.RecordBatchReader.from_batches(REGION_SCHEMA, [batch])
+
+
+def _bucket_table(tmp_path):
+    """A table with ``id`` as the primary key and a single-bucket LSM spec."""
+    db = lancedb.connect(tmp_path, read_consistency_interval=timedelta(seconds=0))
+    table = db.create_table("t", _reader([1, 2, 3]))
+    table.set_unenforced_primary_key("id")
+    # num_buckets = 1: every row routes to the single bucket.
+    table.set_lsm_write_spec(LsmWriteSpec.bucket("id", 1))
+    return table
+
+
+def test_lsm_merge_insert_bucket(tmp_path):
+    table = _bucket_table(tmp_path)
+    # Empty `on` defaults to the primary key.
+    result = (
+        table.merge_insert([])
+        .when_matched_update_all()
+        .when_not_matched_insert_all()
+        .execute(_reader([3, 4, 5]))
+    )
+    # LSM path: rows go to the MemWAL, so only num_rows is populated.
+    assert result.num_rows == 3
+    assert result.version == 0
+    assert result.num_inserted_rows == 0
+    assert result.num_updated_rows == 0
+
+
+def test_lsm_merge_insert_unsharded(tmp_path):
+    db = lancedb.connect(tmp_path, read_consistency_interval=timedelta(seconds=0))
+    table = db.create_table("t", _reader([1, 2, 3]))
+    table.set_unenforced_primary_key("id")
+    table.set_lsm_write_spec(LsmWriteSpec.unsharded())
+    result = (
+        table.merge_insert("id")
+        .when_matched_update_all()
+        .when_not_matched_insert_all()
+        .execute(_reader([10, 11, 12, 13]))
+    )
+    assert result.num_rows == 4
+
+
+def test_lsm_merge_insert_identity(tmp_path):
+    db = lancedb.connect(tmp_path, read_consistency_interval=timedelta(seconds=0))
+    table = db.create_table("t", _region_reader([(1, "us"), (2, "us")]))
+    table.set_unenforced_primary_key("id")
+    table.set_lsm_write_spec(LsmWriteSpec.identity("region"))
+    # All rows share one identity value, so they route to one shard.
+    result = (
+        table.merge_insert([])
+        .when_matched_update_all()
+        .when_not_matched_insert_all()
+        .execute(_region_reader([(3, "us"), (4, "us")]))
+    )
+    assert result.num_rows == 2
+
+
+def test_lsm_merge_insert_use_lsm_write_false(tmp_path):
+    table = _bucket_table(tmp_path)  # rows id = 1, 2, 3
+    # use_lsm_write(False) opts out: the standard path runs and commits.
+    result = (
+        table.merge_insert("id")
+        .when_not_matched_insert_all()
+        .use_lsm_write(False)
+        .execute(_reader([3, 4, 5]))
+    )
+    assert result.num_inserted_rows == 2
+    assert table.count_rows() == 5
+
+
+def test_lsm_merge_insert_validate_single_shard_off(tmp_path):
+    table = _bucket_table(tmp_path)
+    result = (
+        table.merge_insert([])
+        .when_matched_update_all()
+        .when_not_matched_insert_all()
+        .validate_single_shard(False)
+        .execute(_reader([6, 7, 8]))
+    )
+    assert result.num_rows == 3
+
+
+def test_lsm_merge_insert_use_lsm_write_true_requires_spec(tmp_path):
+    # A table with a primary key but no LSM write spec installed.
+    db = lancedb.connect(tmp_path, read_consistency_interval=timedelta(seconds=0))
+    table = db.create_table("t", _reader([1, 2, 3]))
+    table.set_unenforced_primary_key("id")
+    with pytest.raises(Exception, match="use_lsm_write"):
+        (
+            table.merge_insert("id")
+            .when_matched_update_all()
+            .when_not_matched_insert_all()
+            .use_lsm_write(True)
+            .execute(_reader([4]))
+        )
+
+
+def test_lsm_merge_insert_rejects_on_not_primary_key(tmp_path):
+    table = _bucket_table(tmp_path)
+    with pytest.raises(Exception, match="primary key"):
+        (
+            table.merge_insert("value")
+            .when_matched_update_all()
+            .when_not_matched_insert_all()
+            .execute(_reader([1]))
+        )
+
+
+def test_lsm_merge_insert_rejects_non_upsert(tmp_path):
+    table = _bucket_table(tmp_path)
+    # Insert-only (no when_matched_update_all) is not the upsert shape.
+    with pytest.raises(Exception, match="upsert"):
+        table.merge_insert([]).when_not_matched_insert_all().execute(_reader([4]))
+
+
+def test_lsm_close_writers(tmp_path):
+    table = _bucket_table(tmp_path)
+    (
+        table.merge_insert([])
+        .when_matched_update_all()
+        .when_not_matched_insert_all()
+        .execute(_reader([7, 8]))
+    )
+    table.close_lsm_writers()
+    # The writer reopens lazily on the next merge_insert.
+    result = (
+        table.merge_insert([])
+        .when_matched_update_all()
+        .when_not_matched_insert_all()
+        .execute(_reader([9]))
+    )
+    assert result.num_rows == 1
+
+
+@pytest.mark.asyncio
+async def test_async_lsm_merge_insert(tmp_path):
+    db = await lancedb.connect_async(
+        tmp_path, read_consistency_interval=timedelta(seconds=0)
+    )
+    table = await db.create_table("t", _reader([1, 2, 3]))
+    await table.set_unenforced_primary_key("id")
+    await table.set_lsm_write_spec(LsmWriteSpec.bucket("id", 1))
+
+    builder = (
+        table.merge_insert([]).when_matched_update_all().when_not_matched_insert_all()
+    )
+    result = await builder.execute(_reader([3, 4, 5]))
+    assert result.num_rows == 3
+    await table.close_lsm_writers()
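The `close_lsm_writers` docstring describes a lifecycle in which writers are cached, then closed, then lazily reopened, and `test_lsm_close_writers` exercises that lifecycle. The toy below is only a rough mental model of it. It keeps one cached writer per shard and opens a writer on the first write routed to that shard. On close, it flushes and drops every cached writer. Everything here is hypothetical: `ToyShardWriter`, `ToyLsmTable`, and the hash-mod-`num_buckets` routing are illustrative assumptions. They are not LanceDB's MemWAL implementation, which the diff does not show.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyShardWriter:
    """Hypothetical stand-in for a MemWAL shard writer."""

    shard: int
    pending: List[int] = field(default_factory=list)
    flushed: List[int] = field(default_factory=list)
    closed: bool = False

    def write(self, key: int) -> None:
        self.pending.append(key)

    def close(self) -> None:
        # Closing flushes whatever is still buffered.
        self.flushed.extend(self.pending)
        self.pending.clear()
        self.closed = True


class ToyLsmTable:
    """Caches one writer per shard; writers open lazily and close on demand."""

    def __init__(self, num_buckets: int) -> None:
        self.num_buckets = num_buckets
        self._writers: Dict[int, ToyShardWriter] = {}
        self.opened = 0  # number of writers ever opened

    def _route(self, key: int) -> int:
        # Assumption: hash-mod bucketing; the real routing function is not shown.
        return hash(key) % self.num_buckets

    def merge_insert(self, keys: List[int]) -> int:
        for key in keys:
            shard = self._route(key)
            writer = self._writers.get(shard)
            if writer is None:
                # Lazy open: the first write routed to a shard creates its writer.
                writer = ToyShardWriter(shard)
                self._writers[shard] = writer
                self.opened += 1
            writer.write(key)
        return len(keys)  # loosely analogous to MergeResult.num_rows

    def close_lsm_writers(self) -> List[ToyShardWriter]:
        # No-op when nothing is cached; otherwise flush, close, and drop writers.
        closed = list(self._writers.values())
        for writer in closed:
            writer.close()
        self._writers.clear()
        return closed
```

With `num_buckets=1`, as in `_bucket_table`, every key routes to shard 0, so a single writer is opened, closed, and then reopened by the next `merge_insert`.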
--- a/python/python/tests/test_remote_db.py
+++ b/python/python/tests/test_remote_db.py
@@ -436,22 +436,25 @@ def test_table_create_indices():
        # This is a smoke-test.
        table = db.create_table("test", [{"id": 1}])

-        # Test create_scalar_index with custom name
-        table.create_scalar_index(
-            "id", wait_timeout=timedelta(seconds=2), name="custom_scalar_idx"
-        )
+        # Test create_scalar_index with custom name (legacy method)
+        with pytest.warns(DeprecationWarning, match="create_scalar_index"):
+            table.create_scalar_index(
+                "id", wait_timeout=timedelta(seconds=2), name="custom_scalar_idx"
+            )

-        # Test create_fts_index with custom name
-        table.create_fts_index(
-            "text", wait_timeout=timedelta(seconds=2), name="custom_fts_idx"
-        )
+        # Test create_fts_index with custom name (legacy method)
+        with pytest.warns(DeprecationWarning, match="create_fts_index"):
+            table.create_fts_index(
+                "text", wait_timeout=timedelta(seconds=2), name="custom_fts_idx"
+            )

-        # Test create_index with custom name
-        table.create_index(
-            vector_column_name="vector",
-            wait_timeout=timedelta(seconds=10),
-            name="custom_vector_idx",
-        )
+        # Test create_index with custom name (legacy form: vector_column_name kwarg)
+        with pytest.warns(DeprecationWarning, match="create_index"):
+            table.create_index(
+                vector_column_name="vector",
+                wait_timeout=timedelta(seconds=10),
+                name="custom_vector_idx",
+            )

        # Validate that the name parameter was passed correctly in requests
        assert len(received_requests) == 3
@@ -480,6 +483,98 @@ def test_table_create_indices():
        table.drop_index("custom_fts_idx")


+def test_remote_create_index_new_api():
+    received_requests = []
+
+    def handler(request):
+        if request.path == "/v1/table/test/create_index/":
+            content_len = int(request.headers.get("Content-Length", 0))
+            body = request.rfile.read(content_len) if content_len > 0 else b""
+            received_requests.append(json.loads(body) if body else {})
+            request.send_response(200)
+            request.end_headers()
+        elif request.path == "/v1/table/test/create/?mode=create":
+            request.send_response(200)
+            request.send_header("Content-Type", "application/json")
+            request.end_headers()
+            request.wfile.write(b"{}")
+        elif request.path == "/v1/table/test/describe/":
+            request.send_response(200)
+            request.send_header("Content-Type", "application/json")
+            request.end_headers()
+            request.wfile.write(
+                json.dumps(
+                    dict(
+                        version=1,
+                        schema=dict(
+                            fields=[
+                                dict(name="id", type={"type": "int64"}, nullable=False),
+                                dict(
+                                    name="category",
+                                    type={"type": "string"},
+                                    nullable=False,
+                                ),
+                                dict(
+                                    name="text", type={"type": "string"}, nullable=False
+                                ),
+                                dict(
+                                    name="vector",
+                                    type={
+                                        "type": "fixed_size_list",
+                                        "fields": [
+                                            dict(
+                                                name="item",
+                                                type={"type": "float"},
+                                                nullable=True,
+                                            )
+                                        ],
+                                        "length": 2,
+                                    },
+                                    nullable=False,
+                                ),
+                            ]
+                        ),
+                    )
+                ).encode()
+            )
+        else:
+            request.send_response(404)
+            request.end_headers()
+
+    from lancedb.index import BTree, FTS, IvfPq, IvfRq
+
+    with mock_lancedb_connection(handler) as db:
+        table = db.create_table("test", [{"id": 1}])
+
+        # New API: column-first, config= kwarg. Should NOT emit DeprecationWarning.
+        import warnings as _warnings
+
+        with _warnings.catch_warnings():
+            _warnings.simplefilter("error", DeprecationWarning)
+            table.create_index("vector", config=IvfPq(distance_type="l2"))
+            table.create_index("category", config=BTree())
+            table.create_index("text", config=FTS())
+            # IvfRq via new API
+            table.create_index("vector", config=IvfRq(distance_type="l2"))
+
+        # Legacy index_type="IVF_RQ" routes to IvfRq config under the hood.
+        with pytest.warns(DeprecationWarning, match="create_index"):
+            table.create_index(
+                vector_column_name="vector",
+                index_type="IVF_RQ",
+                num_partitions=8,
+            )
+
+        assert len(received_requests) == 5
+        assert [req["column"] for req in received_requests] == [
+            "vector",
+            "category",
+            "text",
+            "vector",
+            "vector",
+        ]
+
+
 def test_table_wait_for_index_timeout():
    def handler(request):
        index_stats = dict(
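The tests above check the deprecation path in two ways. `pytest.warns(DeprecationWarning, ...)` asserts that a legacy call warns. `warnings.catch_warnings()` combined with `simplefilter("error", DeprecationWarning)` asserts that a new-style call stays silent. The self-contained sketch below reproduces both checks with the standard `warnings` module. `toy_create_index` is a hypothetical stand-in that copies only the metric-first branch and the `stacklevel=2` choice from `LanceTable.create_index`. The metric set is assumed from the legacy overload's `Literal`.

```python
import warnings
from typing import Any, Optional

# Assumed metric names, taken from the legacy overload's Literal type.
_METRICS = {"l2", "cosine", "dot", "hamming"}


def toy_create_index(
    first_arg: str = "l2",
    *,
    config: Optional[Any] = None,
    vector_column_name: str = "vector",
) -> str:
    """Hypothetical stand-in for create_index; returns the column it would index."""
    if config is None and first_arg.lower() in _METRICS:
        # stacklevel=2 attributes the warning to the caller's line,
        # which is what pytest.warns and the user actually see.
        warnings.warn(
            "create_index() with a metric as the first argument is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        return vector_column_name
    return first_arg


# New-style call: turning DeprecationWarning into an error proves none is emitted.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    assert toy_create_index("vector", config=object()) == "vector"

# Legacy-style call: record the warning instead of raising it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert toy_create_index(vector_column_name="vector") == "vector"
assert [w.category for w in caught] == [DeprecationWarning]
```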
--- a/python/python/tests/test_table.py
+++ b/python/python/tests/test_table.py
@@ -4,6 +4,7 @@

 import os
 import sys
+import warnings
 from datetime import date, datetime, timedelta
 from time import sleep
 from typing import List
@@ -11,7 +12,7 @@ from unittest.mock import patch

 import lancedb
 from lancedb.dependencies import _PANDAS_AVAILABLE
-from lancedb.index import HnswFlat, HnswPq, HnswSq, IvfPq
+from lancedb.index import BTree, FTS, HnswFlat, HnswPq, HnswSq, IvfPq
 import numpy as np
 import polars as pl
 import pyarrow as pa
@@ -928,7 +929,12 @@ def test_create_index_method(mock_create_index, mem_db: DBConnection):
        num_bits=4,
    )
    mock_create_index.assert_called_with(
-        "vector", replace=True, config=expected_config, name=None, train=True
+        "vector",
+        replace=True,
+        config=expected_config,
+        wait_timeout=None,
+        name=None,
+        train=True,
    )

    # Test with target_partition_size
@@ -948,7 +954,12 @@ def test_create_index_method(mock_create_index, mem_db: DBConnection):
        target_partition_size=8192,
    )
    mock_create_index.assert_called_with(
-        "vector", replace=True, config=expected_config, name=None, train=True
+        "vector",
+        replace=True,
+        config=expected_config,
+        wait_timeout=None,
+        name=None,
+        train=True,
    )

    # target_partition_size has a default value,
@@ -967,7 +978,12 @@ def test_create_index_method(mock_create_index, mem_db: DBConnection):
        num_bits=4,
    )
    mock_create_index.assert_called_with(
-        "vector", replace=True, config=expected_config, name=None, train=True
+        "vector",
+        replace=True,
+        config=expected_config,
+        wait_timeout=None,
+        name=None,
+        train=True,
    )

    table.create_index(
@@ -978,7 +994,12 @@ def test_create_index_method(mock_create_index, mem_db: DBConnection):
    )
    expected_config = HnswPq(distance_type="dot")
    mock_create_index.assert_called_with(
-        "my_vector", replace=False, config=expected_config, name=None, train=True
+        "my_vector",
+        replace=False,
+        config=expected_config,
+        wait_timeout=None,
+        name=None,
+        train=True,
    )

    table.create_index(
@@ -993,7 +1014,12 @@ def test_create_index_method(mock_create_index, mem_db: DBConnection):
        distance_type="cosine", sample_rate=0.1, m=29, ef_construction=10
    )
    mock_create_index.assert_called_with(
-        "my_vector", replace=True, config=expected_config, name=None, train=True
+        "my_vector",
+        replace=True,
+        config=expected_config,
+        wait_timeout=None,
+        name=None,
+        train=True,
    )

    table.create_index(
@@ -1008,7 +1034,12 @@ def test_create_index_method(mock_create_index, mem_db: DBConnection):
        distance_type="cosine", sample_rate=0.1, m=29, ef_construction=10
    )
    mock_create_index.assert_called_with(
-        "my_vector", replace=True, config=expected_config, name=None, train=True
+        "my_vector",
+        replace=True,
+        config=expected_config,
+        wait_timeout=None,
+        name=None,
+        train=True,
    )


@@ -1032,6 +1063,7 @@ def test_create_index_name_and_train_parameters(
        "vector",
        replace=True,
        config=expected_config,
+        wait_timeout=None,
        name="my_custom_index",
        train=True,
    )
@@ -1039,13 +1071,82 @@ def test_create_index_name_and_train_parameters(
    # Test with train=False
    table.create_index(vector_column_name="vector", train=False)
    mock_create_index.assert_called_with(
-        "vector", replace=True, config=expected_config, name=None, train=False
+        "vector",
+        replace=True,
+        config=expected_config,
+        wait_timeout=None,
+        name=None,
+        train=False,
    )

    # Test with both name and train
    table.create_index(vector_column_name="vector", name="my_index_name", train=True)
    mock_create_index.assert_called_with(
-        "vector", replace=True, config=expected_config, name="my_index_name", train=True
+        "vector",
+        replace=True,
+        config=expected_config,
+        wait_timeout=None,
+        name="my_index_name",
+        train=True,
+    )
+
+
+@patch("lancedb.table.AsyncTable.create_index")
+def test_create_index_legacy_emits_deprecation_warning(
+    mock_create_index, mem_db: DBConnection
+):
+    table = mem_db.create_table(
+        "test",
+        data=[{"vector": [3.1, 4.1]}, {"vector": [5.9, 26.5]}],
+    )
+
+    with pytest.warns(DeprecationWarning, match="create_index"):
+        table.create_index(metric="l2", num_partitions=8, vector_column_name="vector")
+
+
+@patch("lancedb.table.AsyncTable.create_index")
+def test_create_index_new_api(mock_create_index, mem_db: DBConnection):
+    table = mem_db.create_table(
+        "test",
+        data=[
+            {"vector": [3.1, 4.1], "category": "a", "text": "hello world"},
+            {"vector": [5.9, 26.5], "category": "b", "text": "goodbye"},
+        ],
+    )
+
+    # Vector index via new API should not warn
+    with warnings.catch_warnings():
+        warnings.simplefilter("error", DeprecationWarning)
+        table.create_index("vector", config=IvfPq(distance_type="l2"))
+    mock_create_index.assert_called_with(
+        "vector",
+        replace=True,
+        config=IvfPq(distance_type="l2"),
+        wait_timeout=None,
+        name=None,
+        train=True,
+    )
+
+    # Scalar index via new API
+    table.create_index("category", config=BTree())
+    mock_create_index.assert_called_with(
+        "category",
+        replace=True,
+        config=BTree(),
+        wait_timeout=None,
+        name=None,
+        train=True,
+    )
+
+    # FTS index via new API
+    table.create_index("text", config=FTS(with_position=True))
+    mock_create_index.assert_called_with(
+        "text",
+        replace=True,
+        config=FTS(with_position=True),
+        wait_timeout=None,
+        name=None,
+        train=True,
    )


@@ -1861,8 +1962,9 @@ def test_create_scalar_index(mem_db: DBConnection):
        "my_table",
        data=test_data,
    )
-    # Test with default name
-    table.create_scalar_index("x")
+    # Test with default name; confirm DeprecationWarning fires
+    with pytest.warns(DeprecationWarning, match="create_scalar_index"):
+        table.create_scalar_index("x")
    indices = table.list_indices()
    assert len(indices) == 1
    scalar_index = indices[0]
--- a/python/src/table.rs
+++ b/python/src/table.rs
@@ -143,18 +143,20 @@ pub struct MergeResult {
    pub num_inserted_rows: u64,
    pub num_deleted_rows: u64,
    pub num_attempts: u32,
+    pub num_rows: u64,
 }

 #[pymethods]
 impl MergeResult {
    pub fn __repr__(&self) -> String {
        format!(
-            "MergeResult(version={}, num_updated_rows={}, num_inserted_rows={}, num_deleted_rows={}, num_attempts={})",
+            "MergeResult(version={}, num_updated_rows={}, num_inserted_rows={}, num_deleted_rows={}, num_attempts={}, num_rows={})",
            self.version,
            self.num_updated_rows,
            self.num_inserted_rows,
            self.num_deleted_rows,
-            self.num_attempts
+            self.num_attempts,
+            self.num_rows
        )
    }
 }
@@ -167,6 +169,7 @@ impl From<lancedb::table::MergeResult> for MergeResult {
            num_inserted_rows: result.num_inserted_rows,
            num_deleted_rows: result.num_deleted_rows,
            num_attempts: result.num_attempts,
+            num_rows: result.num_rows,
        }
    }
 }
@@ -194,6 +197,12 @@ impl LsmWriteSpec {
    }

    /// Identity sharding — shard by the raw value of `column`.
+    ///
+    /// `column` must be a deterministic function of the unenforced primary
+    /// key: every row with a given primary key must always produce the same
+    /// `column` value, or upserts of that key can land in different shards
+    /// and a stale version can win. Typically `column` is the primary key
+    /// itself or a stable attribute of it.
    #[staticmethod]
    pub fn identity(column: String) -> Self {
        Self {
@@ -933,6 +942,12 @@ impl Table {
        if let Some(use_index) = parameters.use_index {
            builder.use_index(use_index);
        }
+        if let Some(use_lsm_write) = parameters.use_lsm_write {
+            builder.use_lsm_write(use_lsm_write);
+        }
+        if let Some(validate_single_shard) = parameters.validate_single_shard {
+            builder.validate_single_shard(validate_single_shard);
+        }

        future_into_py(self_.py(), async move {
            let res = builder.execute(Box::new(batches)).await.infer_error()?;
@@ -971,6 +986,13 @@ impl Table {
        })
    }

+    pub fn close_lsm_writers(self_: PyRef<'_, Self>) -> PyResult<Bound<'_, PyAny>> {
+        let inner = self_.inner_ref()?.clone();
+        future_into_py(self_.py(), async move {
+            inner.close_lsm_writers().await.infer_error()
+        })
+    }
+
    pub fn uses_v2_manifest_paths(self_: PyRef<'_, Self>) -> PyResult<Bound<'_, PyAny>> {
        let inner = self_.inner_ref()?.clone();
        future_into_py(self_.py(), async move {
@@ -1124,6 +1146,8 @@ pub struct MergeInsertParams {
    when_not_matched_by_source_condition: Option<String>,
    timeout: Option<std::time::Duration>,
    use_index: Option<bool>,
+    use_lsm_write: Option<bool>,
+    validate_single_shard: Option<bool>,
 }

 #[pyclass]
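The binding forwards `use_lsm_write` and `validate_single_shard` to the builder only when the caller set them. Otherwise the Rust defaults apply (`use_lsm_write: None`, `validate_single_shard: true`). The three-valued `use_lsm_write` documented on `MergeInsertBuilder::use_lsm_write` reduces to a small decision table. The sketch below models only those documented rules. `Dispatch`, `decide`, and the `has_spec` flag are hypothetical names, and the sketch is not the actual `lsm_dispatch_decision` implementation.

```rust
/// Which write path a merge_insert would take (hypothetical model).
#[derive(Debug, PartialEq)]
enum Dispatch {
    Lsm,
    Standard,
}

/// Mirrors the documented rules for `use_lsm_write`:
/// - unset (None): LSM path iff an LsmWriteSpec is installed;
/// - Some(false): always the standard path, even with a spec;
/// - Some(true): requires a spec, otherwise an error.
fn decide(use_lsm_write: Option<bool>, has_spec: bool) -> Result<Dispatch, String> {
    match (use_lsm_write, has_spec) {
        (Some(false), _) => Ok(Dispatch::Standard),
        (Some(true), false) => Err("use_lsm_write(true) requires an LsmWriteSpec".to_string()),
        (Some(true), true) | (None, true) => Ok(Dispatch::Lsm),
        (None, false) => Ok(Dispatch::Standard),
    }
}
```

Using `Option<bool>` keeps "not specified" distinct from an explicit `false`. That distinction is what lets the default follow whether a spec is installed.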
--- a/rust-toolchain.toml
+++ b/rust-toolchain.toml
@@ -1,2 +1,2 @@
 [toolchain]
-channel = "1.94.0"
+channel = "1.95.0"
--- a/rust/lancedb/Cargo.toml
+++ b/rust/lancedb/Cargo.toml
@@ -75,7 +75,7 @@ reqwest = { version = "0.12.0", default-features = false, features = [
    "stream",
 ], optional = true }
 http = { version = "1", optional = true } # Matching what is in reqwest
-uuid = { version = "1.7.0", features = ["v4"] }
+uuid = { version = "1.7.0", features = ["v4", "v5"] }
 polars-arrow = { version = ">=0.37,<0.40.0", optional = true }
 polars = { version = ">=0.37,<0.40.0", optional = true }
 hf-hub = { version = "0.4.1", optional = true, default-features = false, features = [
--- a/rust/lancedb/src/dataloader/permutation/shuffle.rs
+++ b/rust/lancedb/src/dataloader/permutation/shuffle.rs
@@ -464,11 +464,9 @@ mod tests {
        let mut iter = ids.into_iter().map(|o| o.unwrap());
        while let Some(first) = iter.next() {
            let rows_left_in_clump = if first == 4470 { 19 } else { 29 };
-            let mut expected_next = first + 1;
-            for _ in 0..rows_left_in_clump {
+            for expected_next in (first + 1)..=(first + rows_left_in_clump) {
                let next = iter.next().unwrap();
                assert_eq!(next, expected_next);
-                expected_next += 1;
            }
        }
    }
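The rewritten loop in `shuffle.rs` replaces a hand-incremented counter with an inclusive range. Both forms yield exactly `rows_left_in_clump` consecutive ids starting at `first + 1`, and both yield nothing when the count is zero. The following standalone check confirms that equivalence. The helper names are illustrative.

```rust
/// Old form: a counter incremented by hand for `n` iterations.
fn expected_manual(first: u64, n: u64) -> Vec<u64> {
    let mut out = Vec::new();
    let mut expected_next = first + 1;
    for _ in 0..n {
        out.push(expected_next);
        expected_next += 1;
    }
    out
}

/// New form: the inclusive range produces the same values directly.
/// When n == 0 the range (first + 1)..=first is empty.
fn expected_range(first: u64, n: u64) -> Vec<u64> {
    ((first + 1)..=(first + n)).collect()
}
```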
--- a/rust/lancedb/src/remote/client.rs
+++ b/rust/lancedb/src/remote/client.rs
@@ -908,6 +908,15 @@ mod tests {
    use serial_test::serial;
    use std::time::Duration;

+    // Serializes the env-var-mutating tests below: cargo test runs tests in
+    // parallel, but several of these tests read and write the same
+    // process-global env vars (`LANCEDB_USER_ID*`), so they would race without this.
+    static ENV_MUTEX: std::sync::Mutex<()> = std::sync::Mutex::new(());
+
+    fn lock_env() -> std::sync::MutexGuard<'static, ()> {
+        ENV_MUTEX.lock().unwrap_or_else(|e| e.into_inner())
+    }
+
    #[test]
    fn test_timeout_config_default() {
        let config = TimeoutConfig::default();
@@ -1166,6 +1175,7 @@ mod tests {
    #[test]
    #[serial(user_id_env)]
    fn test_resolve_user_id_none() {
+        let _guard = lock_env();
        let config = ClientConfig::default();
        // Clear env vars that might be set from other tests
        // SAFETY: This is only called in tests
@@ -1179,6 +1189,7 @@ mod tests {
    #[test]
    #[serial(user_id_env)]
    fn test_resolve_user_id_from_env() {
+        let _guard = lock_env();
        // SAFETY: This is only called in tests
        unsafe {
            std::env::set_var("LANCEDB_USER_ID", "env-user-id");
@@ -1194,6 +1205,7 @@ mod tests {
    #[test]
    #[serial(user_id_env)]
    fn test_resolve_user_id_from_env_key() {
+        let _guard = lock_env();
        // SAFETY: This is only called in tests
        unsafe {
            std::env::remove_var("LANCEDB_USER_ID");
@@ -1215,6 +1227,7 @@ mod tests {
    #[test]
    #[serial(user_id_env)]
    fn test_resolve_user_id_direct_takes_precedence() {
+        let _guard = lock_env();
        // SAFETY: This is only called in tests
        unsafe {
            std::env::set_var("LANCEDB_USER_ID", "env-user-id");
@@ -1233,6 +1246,7 @@ mod tests {
    #[test]
    #[serial(user_id_env)]
    fn test_resolve_user_id_empty_env_ignored() {
+        let _guard = lock_env();
        // SAFETY: This is only called in tests
        unsafe {
            std::env::set_var("LANCEDB_USER_ID", "");
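`lock_env` recovers the guard with `unwrap_or_else(|e| e.into_inner())`. A test that panics while holding the lock poisons the `Mutex`, and this recovery keeps every later `lock()` from failing with `PoisonError`. The following std-only demonstration shows the behavior. `LOCK`, `lock_tolerant`, and `poison_lock` are illustrative names, not part of the test module.

```rust
use std::sync::{Mutex, MutexGuard};
use std::thread;

static LOCK: Mutex<()> = Mutex::new(());

/// Acquire the lock even if a previous holder panicked and poisoned it.
fn lock_tolerant() -> MutexGuard<'static, ()> {
    LOCK.lock().unwrap_or_else(|e| e.into_inner())
}

/// Poison LOCK by panicking in another thread while holding the guard.
fn poison_lock() {
    let handle = thread::spawn(|| {
        let _guard = LOCK.lock().unwrap();
        panic!("simulated test failure while holding the lock");
    });
    // join() returns Err because the spawned thread panicked.
    assert!(handle.join().is_err());
}
```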
--- a/rust/lancedb/src/remote/table.rs
+++ b/rust/lancedb/src/remote/table.rs
@@ -1805,6 +1805,7 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
                num_inserted_rows: 0,
                num_updated_rows: 0,
                num_attempts: 0,
+                num_rows: 0,
            });
        }

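Every site that builds a `MergeResult` now has to fill `num_rows`, including the early return in the remote table above. Per the field's doc comment, the standard path sets `num_rows` to `num_inserted_rows + num_updated_rows`. The MemWAL LSM path leaves the breakdown, `version`, and `num_attempts` at `0` and reports only `num_rows`. The sketch below shows those two constructions in simplified form. `MergeCounts`, `standard`, and `lsm` are hypothetical names, and only some of the real fields are mirrored.

```rust
/// Simplified, hypothetical mirror of some `MergeResult` counters.
#[derive(Debug, Default, PartialEq)]
struct MergeCounts {
    version: u64,
    num_inserted_rows: u64,
    num_updated_rows: u64,
    num_deleted_rows: u64,
    num_rows: u64,
}

/// Standard merge_insert: the breakdown is known, and num_rows is
/// inserted + updated (deleted rows are not counted).
fn standard(version: u64, inserted: u64, updated: u64, deleted: u64) -> MergeCounts {
    MergeCounts {
        version,
        num_inserted_rows: inserted,
        num_updated_rows: updated,
        num_deleted_rows: deleted,
        num_rows: inserted + updated,
    }
}

/// MemWAL LSM path: the breakdown is unknown until compaction, so only
/// num_rows is populated and every other counter stays 0.
fn lsm(rows_written: u64) -> MergeCounts {
    MergeCounts {
        num_rows: rows_written,
        ..Default::default()
    }
}
```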
--- a/rust/lancedb/src/table.rs
+++ b/rust/lancedb/src/table.rs
@@ -89,7 +89,6 @@ use futures::future::join_all;
 pub use lance::dataset::refs::{TagContents, Tags as LanceTags};
 pub use lance::dataset::scanner::DatasetRecordBatchStream;
 use lance::dataset::statistics::DatasetStatisticsExt;
-use lance_index::frag_reuse::FRAG_REUSE_INDEX_NAME;
 pub use lance_index::optimize::OptimizeOptions;
 pub use optimize::{CompactionOptions, OptimizeAction, OptimizeStats};
 pub use schema_evolution::{AddColumnsResult, AlterColumnsResult, DropColumnsResult};
@@ -367,6 +366,14 @@ impl LsmWriteSpec {

    /// Construct an identity-sharding spec (shard by the raw value of
    /// `column`) with no maintained indexes.
+    ///
+    /// `column` must be a deterministic function of the unenforced primary
+    /// key: every row with a given primary key must always produce the same
+    /// `column` value. MemWAL dedups upserts by primary key but tracks
+    /// generations per shard, so if the same key is written with two
+    /// different `column` values its versions land in different shards and a
+    /// stale value can win. Typically `column` is the primary key itself, or
+    /// a stable attribute of it (e.g. a tenant id).
    pub fn identity(column: impl Into<String>) -> Self {
        Self::Identity {
            column: column.into(),
@@ -581,6 +588,13 @@ pub trait BaseTable: std::fmt::Display + std::fmt::Debug + Send + Sync {
            message: "unset_lsm_write_spec is not supported on this table type".into(),
        })
    }
+    /// Drain and close any cached MemWAL shard writers for this table.
+    ///
+    /// The default implementation is a no-op; table types that maintain
+    /// MemWAL shard writers override it.
+    async fn close_lsm_writers(&self) -> Result<()> {
+        Ok(())
+    }
    /// Gets the table tag manager.
    async fn tags(&self) -> Result<Box<dyn Tags + '_>>;
    /// Optimize the dataset.
@@ -1387,6 +1401,16 @@ impl Table {
        self.inner.unset_lsm_write_spec().await
    }

+    /// Drain and close any cached MemWAL shard writers held for this table.
+    ///
+    /// When an [`LsmWriteSpec`] is installed, `merge_insert` opens MemWAL shard
+    /// writers and caches them for reuse across calls. This closes them,
+    /// flushing pending data; writers reopen lazily on the next `merge_insert`.
+    /// It is a no-op when no writers are cached.
+    pub async fn close_lsm_writers(&self) -> Result<()> {
+        self.inner.close_lsm_writers().await
+    }
+
    /// Retrieve the version of the table
    ///
    /// LanceDb supports versioning.  Every operation that modifies the table increases
@@ -2830,6 +2854,10 @@ impl BaseTable for NativeTable {
        merge::lsm::unset_lsm_write_spec(self).await
    }

+    async fn close_lsm_writers(&self) -> Result<()> {
+        merge::lsm::close_lsm_writers(self).await
+    }
+
    /// Delete rows from the table
    async fn delete(&self, predicate: Predicate<'_>) -> Result<DeleteResult> {
        delete::execute_delete(self, predicate).await
@@ -2864,71 +2892,32 @@ impl BaseTable for NativeTable {

    async fn list_indices(&self) -> Result<Vec<IndexConfig>> {
        let dataset = self.dataset.get().await?;
-        let indices = dataset.load_indices().await?;
-        let results = futures::stream::iter(indices.as_slice())
-            .then(|idx| async {
-                // skip Lance internal indexes
-                if idx.name == FRAG_REUSE_INDEX_NAME {
-                    return None;
-                }
-
-                let stats = match dataset.index_statistics(idx.name.as_str()).await {
-                    Ok(stats) => stats,
-                    Err(e) => {
-                        log::warn!(
-                            "Failed to get statistics for index {} ({}): {}",
-                            idx.name,
-                            idx.uuid,
-                            e
-                        );
-                        return None;
-                    }
-                };
-
-                let stats: serde_json::Value = match serde_json::from_str(&stats) {
-                    Ok(stats) => stats,
-                    Err(e) => {
-                        log::warn!(
-                            "Failed to deserialize index statistics for index {} ({}): {}",
-                            idx.name,
-                            idx.uuid,
-                            e
-                        );
-                        return None;
-                    }
-                };
-
-                let Some(index_type) = stats.get("index_type").and_then(|v| v.as_str()) else {
-                    log::warn!(
-                        "Index statistics was missing 'index_type' field for index {} ({})",
-                        idx.name,
-                        idx.uuid
-                    );
-                    return None;
-                };
-
-                let index_type: crate::index::IndexType = match index_type.parse() {
+        let indices = dataset
+            .describe_indices(None)
+            .await?
+            .into_iter()
+            .filter_map(|idx_desc| {
+                let index_type: crate::index::IndexType = match idx_desc.index_type().parse() {
                    Ok(index_type) => index_type,
                    Err(e) => {
                        log::warn!(
-                            "Failed to parse index type for index {} ({}): {}",
-                            idx.name,
-                            idx.uuid,
+                            "Failed to parse index type for index {}: {}",
+                            idx_desc.name(),
                            e
                        );
                        return None;
                    }
                };

-                let mut columns = Vec::with_capacity(idx.fields.len());
-                for field_id in &idx.fields {
-                    let field_path = match dataset.schema().field_path(*field_id) {
+                let field_ids = idx_desc.field_ids();
+                let mut columns = Vec::with_capacity(field_ids.len());
+                for field_id in field_ids {
+                    let field_path = match dataset.schema().field_path(*field_id as i32) {
                        Ok(field_path) => field_path,
                        Err(e) => {
                            log::warn!(
-                                "Failed to resolve field path for index {} ({}) field id {}: {}",
-                                idx.name,
-                                idx.uuid,
+                                "Failed to resolve field path for index {} field id {}: {}",
+                                idx_desc.name(),
                                field_id,
                                e
                            );
@@ -2938,17 +2927,14 @@ impl BaseTable for NativeTable {
                    columns.push(field_path);
                }

-                let name = idx.name.clone();
                Some(IndexConfig {
+                    name: idx_desc.name().to_string(),
                    index_type,
                    columns,
-                    name,
                })
            })
-            .collect::<Vec<_>>()
-            .await;
-
-        Ok(results.into_iter().flatten().collect())
+            .collect();
+        Ok(indices)
    }

    async fn uri(&self) -> Result<String> {
@@ -3058,11 +3044,12 @@ impl BaseTable for NativeTable {
        let p99 = *sorted_sizes.get(num_fragments * 99 / 100).unwrap_or(&0);
        let min = sorted_sizes.first().copied().unwrap_or(0);
        let max = sorted_sizes.last().copied().unwrap_or(0);
-        let mean = if num_fragments == 0 {
-            0
-        } else {
-            sorted_sizes.iter().copied().sum::<usize>() / num_fragments
-        };
+        let mean = sorted_sizes
+            .iter()
+            .copied()
+            .sum::<usize>()
+            .checked_div(num_fragments)
+            .unwrap_or(0);

        let frag_stats = FragmentStatistics {
            num_fragments,
@@ -4062,26 +4049,27 @@ mod tests {
        let index_configs = table.list_indices().await.unwrap();
        assert_eq!(index_configs.len(), 5);

+        // list_indices returns indices in alphabetical order by name
        let mut configs_iter = index_configs.into_iter();
        let index = configs_iter.next().unwrap();
        assert_eq!(index.index_type, crate::index::IndexType::Bitmap);
        assert_eq!(index.columns, vec!["category".to_string()]);

-        let index = configs_iter.next().unwrap();
-        assert_eq!(index.index_type, crate::index::IndexType::Bitmap);
-        assert_eq!(index.columns, vec!["is_active".to_string()]);
-
        let index = configs_iter.next().unwrap();
        assert_eq!(index.index_type, crate::index::IndexType::Bitmap);
        assert_eq!(index.columns, vec!["data".to_string()]);

        let index = configs_iter.next().unwrap();
        assert_eq!(index.index_type, crate::index::IndexType::Bitmap);
-        assert_eq!(index.columns, vec!["large_data".to_string()]);
+        assert_eq!(index.columns, vec!["is_active".to_string()]);

        let index = configs_iter.next().unwrap();
        assert_eq!(index.index_type, crate::index::IndexType::Bitmap);
        assert_eq!(index.columns, vec!["large_category".to_string()]);
+
+        let index = configs_iter.next().unwrap();
+        assert_eq!(index.index_type, crate::index::IndexType::Bitmap);
+        assert_eq!(index.columns, vec!["large_data".to_string()]);
    }

    #[tokio::test]
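The doc comment on `LsmWriteSpec::identity` warns that the shard column must be a deterministic function of the primary key. The reason is that MemWAL dedups upserts by key but tracks generations per shard. The toy model below shows how a stale version can otherwise win. Its conflict resolution is an illustrative assumption, not MemWAL's actual behavior. `ToyLsm` gives each shard its own generation counter. It resolves a key found in several shards by taking the highest generation, which compares numbers that were never meant to be compared.

```rust
use std::collections::HashMap;

/// Toy LSM: each shard has its own generation counter and its own
/// key -> (generation, value) map. Generations are not comparable
/// across shards.
#[derive(Default)]
struct ToyLsm {
    shards: HashMap<String, (u64, HashMap<i64, (u64, String)>)>,
}

impl ToyLsm {
    /// Upsert `key` into `shard`. Within one shard a later write always gets
    /// a higher generation, so per-shard dedup keeps the newest value.
    fn upsert(&mut self, shard: &str, key: i64, value: &str) {
        let (generation, rows) = self.shards.entry(shard.to_string()).or_default();
        *generation += 1;
        rows.insert(key, (*generation, value.to_string()));
    }

    /// Assumed merge rule: read `key` from every shard, keep the highest generation.
    fn read(&self, key: i64) -> Option<String> {
        self.shards
            .values()
            .filter_map(|(_, rows)| rows.get(&key))
            .max_by_key(|(generation, _)| *generation)
            .map(|(_, value)| value.clone())
    }
}

/// Same key routed to two shards: the fresh write lands in a quiet shard
/// with a low generation and loses to the stale one.
fn nondeterministic_routing() -> Option<String> {
    let mut lsm = ToyLsm::default();
    for k in 100..105 {
        lsm.upsert("us", k, "other"); // background traffic: "us" reaches generation 5
    }
    lsm.upsert("us", 1, "stale"); // "us" generation 6
    lsm.upsert("eu", 1, "fresh"); // "eu" generation 1, but written later
    lsm.read(1)
}

/// Same key always routed to the same shard: the newest write wins.
fn deterministic_routing() -> Option<String> {
    let mut lsm = ToyLsm::default();
    for k in 100..105 {
        lsm.upsert("us", k, "other");
    }
    lsm.upsert("us", 1, "stale");
    lsm.upsert("us", 1, "fresh");
    lsm.read(1)
}
```

With `identity("id")`, or any column that is a stable attribute of `id`, every write of a key follows the `deterministic_routing` shape.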
--- a/rust/lancedb/src/table/datafusion/udtf/fts.rs
+++ b/rust/lancedb/src/table/datafusion/udtf/fts.rs
@@ -870,8 +870,10 @@ mod tests {
            .await
            .unwrap();

-        // Should return empty or nearly empty result
-        assert!(result[0].num_rows() <= 1);
+        assert_eq!(
+            result.iter().map(|batch| batch.num_rows()).sum::<usize>(),
+            0
+        );
    }

    #[tokio::test]
--- a/rust/lancedb/src/table/dataset.rs
+++ b/rust/lancedb/src/table/dataset.rs
@@ -8,6 +8,7 @@ use std::{

 use lance::{Dataset, dataset::refs};

+use crate::table::merge::lsm::ShardWriterCache;
 use crate::{Error, error::Result, utils::background_cache::BackgroundCache};

 /// A wrapper around a [Dataset] that provides consistency checks.
@@ -18,6 +19,10 @@ use crate::{Error, error::Result, utils::background_cache::BackgroundCache};
 pub struct DatasetConsistencyWrapper {
    state: Arc<Mutex<DatasetState>>,
    consistency: ConsistencyMode,
+    /// The single MemWAL `ShardWriter` for this dataset, co-located so it is
+    /// cached for the session and shares the dataset's lifecycle. A dataset
+    /// writes to one shard at a time. Shared by `Arc` across clones.
+    shard_writer: Arc<ShardWriterCache>,
 }

 /// The current dataset and whether it is pinned to a specific version.
@@ -67,9 +72,15 @@ impl DatasetConsistencyWrapper {
                pinned_version: None,
            })),
            consistency,
+            shard_writer: Arc::new(ShardWriterCache::default()),
        }
    }

+    /// The MemWAL `ShardWriter` cache co-located with this dataset.
+    pub(crate) fn shard_writer(&self) -> &Arc<ShardWriterCache> {
+        &self.shard_writer
+    }
+
    /// Get the current dataset.
    ///
    /// Behavior depends on the consistency mode:
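`DatasetConsistencyWrapper` keeps its `ShardWriterCache` behind an `Arc`, so clones share one cached writer. `Table::close_lsm_writers` is documented as draining and closing that writer and flushing pending data. Writers then reopen lazily on the next `merge_insert`. The following is a minimal synchronous sketch of that lifecycle using only std. `WriterCache`, `Writer`, `write`, `close`, and `demo` are hypothetical stand-ins, since the real cache and `close_lsm_writers` are async. Here, "flushing" just counts the buffered rows.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a MemWAL shard writer: it only buffers rows.
struct Writer {
    pending: Vec<i64>,
}

/// A single lazily opened writer, shared by every holder of the `Arc`.
#[derive(Default)]
struct WriterCache {
    slot: Mutex<Option<Writer>>,
    /// How many times a writer has been opened (for observation only).
    opens: AtomicUsize,
}

impl WriterCache {
    /// Buffer rows in the cached writer, opening one first if none is cached.
    fn write(&self, rows: &[i64]) {
        let mut slot = self.slot.lock().unwrap();
        let writer = slot.get_or_insert_with(|| {
            self.opens.fetch_add(1, Ordering::SeqCst);
            Writer { pending: Vec::new() }
        });
        writer.pending.extend_from_slice(rows);
    }

    /// Drain and close: take the writer out of the cache and "flush" it,
    /// returning the number of pending rows. A no-op (0) when nothing is cached.
    fn close(&self) -> usize {
        let taken = self.slot.lock().unwrap().take();
        taken.map_or(0, |writer| writer.pending.len())
    }
}

/// Two handles share one cache; returns (rows flushed by close, total opens).
fn demo() -> (usize, usize) {
    let cache = Arc::new(WriterCache::default());
    let handle = Arc::clone(&cache); // like a cloned wrapper
    cache.write(&[1, 2]);
    handle.write(&[3]); // reuses the cached writer
    let flushed = handle.close(); // drains all 3 pending rows
    cache.write(&[4]); // reopens lazily
    (flushed, cache.opens.load(Ordering::SeqCst))
}
```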
--- a/rust/lancedb/src/table/merge.rs
+++ b/rust/lancedb/src/table/merge.rs
@@ -41,6 +41,16 @@ pub struct MergeResult {
    /// A value of 1 means the operation succeeded on the first try.
    #[serde(default)]
    pub num_attempts: u32,
+    /// Total number of rows written.
+    ///
+    /// On the standard `merge_insert` path this equals
+    /// `num_inserted_rows + num_updated_rows`. On the MemWAL LSM write path the
+    /// insert/update breakdown is not known until compaction; in that mode
+    /// `num_inserted_rows`, `num_updated_rows`, `num_deleted_rows`, `version`
+    /// and `num_attempts` are all `0` and this field holds the total number of
+    /// rows written through the shard writer.
+    #[serde(default)]
+    pub num_rows: u64,
 }

 /// A builder used to create and run a merge insert operation
@@ -57,6 +67,8 @@ pub struct MergeInsertBuilder {
    pub(crate) when_not_matched_by_source_delete_filt: Option<String>,
    pub(crate) timeout: Option<Duration>,
    pub(crate) use_index: bool,
+    pub(crate) use_lsm_write: Option<bool>,
+    pub(crate) validate_single_shard: bool,
 }

 impl MergeInsertBuilder {
@@ -71,6 +83,8 @@ impl MergeInsertBuilder {
            when_not_matched_by_source_delete_filt: None,
            timeout: None,
            use_index: true,
+            use_lsm_write: None,
+            validate_single_shard: true,
        }
    }

@@ -150,6 +164,34 @@ impl MergeInsertBuilder {
        self
    }

+    /// Controls whether `merge_insert` uses the MemWAL LSM write path.
+    ///
+    /// By default (unset), a `merge_insert` on a table with an
+    /// [`LsmWriteSpec`](super::LsmWriteSpec) installed is routed through
+    /// Lance's MemWAL shard writer, and a table without one uses the standard
+    /// path. Calling this with `false` forces the standard path even when a
+    /// spec is set. Calling it with `true` requires a spec — `merge_insert`
+    /// errors if none is installed.
+    pub fn use_lsm_write(&mut self, use_lsm_write: bool) -> &mut Self {
+        self.use_lsm_write = Some(use_lsm_write);
+        self
+    }
+
+    /// Controls how an LSM `merge_insert` checks that its input targets a
+    /// single shard.
+    ///
+    /// When a table has an LSM write spec, every row in a `merge_insert` call
+    /// must route to the same shard. When `true` (the default), every row is
+    /// inspected to verify this. When `false`, only the first row is inspected
+    /// and the shard it routes to is used for the whole input — a faster path
+    /// for callers that have already pre-sharded their input.
+    ///
+    /// Has no effect on tables without an LSM write spec.
+    pub fn validate_single_shard(&mut self, validate_single_shard: bool) -> &mut Self {
+        self.validate_single_shard = validate_single_shard;
+        self
+    }
+
    /// Executes the merge insert operation
    ///
    /// Returns version and statistics about the merge operation including the number of rows
@@ -167,6 +209,23 @@ pub(crate) async fn execute_merge_insert(
    params: MergeInsertBuilder,
    new_data: Box<dyn RecordBatchReader + Send>,
 ) -> Result<MergeResult> {
+    match lsm::lsm_dispatch_decision(table, &params).await? {
+        lsm::LsmDispatch::Lsm(plan) => {
+            let future =
+                lsm::execute_lsm_merge_insert(table, plan, params.validate_single_shard, new_data);
+            return match params.timeout {
+                Some(timeout) => match tokio::time::timeout(timeout, future).await {
+                    Ok(result) => result,
+                    Err(_) => Err(Error::Runtime {
+                        message: "merge insert timed out".to_string(),
+                    }),
+                },
+                None => future.await,
+            };
+        }
+        lsm::LsmDispatch::Standard => {}
+    }
+
    let dataset = table.dataset.get().await?;
    let mut builder = LanceMergeInsertBuilder::try_new(dataset.clone(), params.on)?;
    match (
@@ -219,6 +278,7 @@ pub(crate) async fn execute_merge_insert(
        num_inserted_rows: stats.num_inserted_rows,
        num_deleted_rows: stats.num_deleted_rows,
        num_attempts: stats.num_attempts,
+        num_rows: stats.num_inserted_rows + stats.num_updated_rows,
    })
 }

@@ -327,3 +387,366 @@ mod tests {
        assert_eq!(table.count_rows(None).await.unwrap(), 25);
    }
 }
+
+#[cfg(test)]
+mod lsm_tests {
+    use std::sync::Arc;
+
+    use arrow_array::{
+        Int64Array, RecordBatch, RecordBatchIterator, RecordBatchReader, StringArray,
+    };
+    use arrow_schema::{DataType, Field, Schema};
+    use tempfile::{TempDir, tempdir};
+
+    use crate::connect;
+    use crate::error::Error;
+    use crate::table::{LsmWriteSpec, Table};
+
+    /// A reader of `[id: Int64, value: Int64]` rows; `value` is `0..n`.
+    fn id_value_reader(ids: Vec<i64>) -> Box<dyn RecordBatchReader + Send> {
+        let schema = Arc::new(Schema::new(vec![
+            Field::new("id", DataType::Int64, false),
+            Field::new("value", DataType::Int64, false),
+        ]));
+        let n = ids.len() as i64;
+        let batch = RecordBatch::try_new(
+            schema.clone(),
+            vec![
+                Arc::new(Int64Array::from(ids)),
+                Arc::new(Int64Array::from_iter_values(0..n)),
+            ],
+        )
+        .unwrap();
+        Box::new(RecordBatchIterator::new(vec![Ok(batch)], schema))
+    }
+
+    /// A reader of `[id: Int64, region: Utf8]` rows.
+    fn id_region_reader(rows: Vec<(i64, &str)>) -> Box<dyn RecordBatchReader + Send> {
+        let schema = Arc::new(Schema::new(vec![
+            Field::new("id", DataType::Int64, false),
+            Field::new("region", DataType::Utf8, false),
+        ]));
+        let ids: Vec<i64> = rows.iter().map(|(id, _)| *id).collect();
+        let regions: Vec<&str> = rows.iter().map(|(_, region)| *region).collect();
+        let batch = RecordBatch::try_new(
+            schema.clone(),
+            vec![
+                Arc::new(Int64Array::from(ids)),
+                Arc::new(StringArray::from(regions)),
+            ],
+        )
+        .unwrap();
+        Box::new(RecordBatchIterator::new(vec![Ok(batch)], schema))
+    }
+
+    /// A multi-batch reader of `[id: Int64, region: Utf8]` rows.
+    fn id_region_multi_reader(batches: Vec<Vec<(i64, &str)>>) -> Box<dyn RecordBatchReader + Send> {
+        let schema = Arc::new(Schema::new(vec![
+            Field::new("id", DataType::Int64, false),
+            Field::new("region", DataType::Utf8, false),
+        ]));
+        let records: Vec<_> = batches
+            .into_iter()
+            .map(|rows| {
+                let ids: Vec<i64> = rows.iter().map(|(id, _)| *id).collect();
+                let regions: Vec<&str> = rows.iter().map(|(_, region)| *region).collect();
+                Ok(RecordBatch::try_new(
+                    schema.clone(),
+                    vec![
+                        Arc::new(Int64Array::from(ids)),
+                        Arc::new(StringArray::from(regions)),
+                    ],
+                )
+                .unwrap())
+            })
+            .collect();
+        Box::new(RecordBatchIterator::new(records, schema))
+    }
+
+    /// Create an `[id, value]` table with `id` as the unenforced primary key.
+    async fn id_value_table(dir: &TempDir) -> Table {
+        let conn = connect(dir.path().to_str().unwrap())
+            .execute()
+            .await
+            .unwrap();
+        let table = conn
+            .create_table("t", id_value_reader(vec![1, 2, 3]))
+            .execute()
+            .await
+            .unwrap();
+        table.set_unenforced_primary_key(["id"]).await.unwrap();
+        table
+    }
+
+    #[tokio::test]
+    async fn lsm_merge_insert_bucket() {
+        let dir = tempdir().unwrap();
+        let table = id_value_table(&dir).await;
+        // num_buckets = 1: every row routes to the single bucket.
+        table
+            .set_lsm_write_spec(LsmWriteSpec::bucket("id", 1))
+            .await
+            .unwrap();
+
+        // Empty `on` defaults to the primary key.
+        let mut builder = table.merge_insert(&[]);
+        builder
+            .when_matched_update_all(None)
+            .when_not_matched_insert_all();
+        let result = builder
+            .execute(id_value_reader(vec![3, 4, 5]))
+            .await
+            .unwrap();
+
+        // LSM path: rows go to the MemWAL, the breakdown is unknown until
+        // compaction, so only `num_rows` is populated.
+        assert_eq!(result.num_rows, 3);
+        assert_eq!(result.version, 0);
+        assert_eq!(result.num_inserted_rows, 0);
+        assert_eq!(result.num_updated_rows, 0);
+    }
+
+    #[tokio::test]
+    async fn lsm_merge_insert_unsharded() {
+        let dir = tempdir().unwrap();
+        let table = id_value_table(&dir).await;
+        table
+            .set_lsm_write_spec(LsmWriteSpec::unsharded())
+            .await
+            .unwrap();
+
+        let mut builder = table.merge_insert(&["id"]);
+        builder
+            .when_matched_update_all(None)
+            .when_not_matched_insert_all();
+        let result = builder
+            .execute(id_value_reader(vec![10, 11, 12, 13]))
+            .await
+            .unwrap();
+        assert_eq!(result.num_rows, 4);
+    }
+
+    #[tokio::test]
+    async fn lsm_merge_insert_identity() {
+        let dir = tempdir().unwrap();
+        let conn = connect(dir.path().to_str().unwrap())
+            .execute()
+            .await
+            .unwrap();
+        let table = conn
+            .create_table("t", id_region_reader(vec![(1, "us"), (2, "us")]))
+            .execute()
+            .await
+            .unwrap();
+        table.set_unenforced_primary_key(["id"]).await.unwrap();
+        table
+            .set_lsm_write_spec(LsmWriteSpec::identity("region"))
+            .await
+            .unwrap();
+
+        // All rows share one identity value, so they route to one shard.
+        let mut builder = table.merge_insert(&[]);
+        builder
+            .when_matched_update_all(None)
+            .when_not_matched_insert_all();
+        let result = builder
+            .execute(id_region_reader(vec![(3, "us"), (4, "us")]))
+            .await
+            .unwrap();
+        assert_eq!(result.num_rows, 2);
+    }
+
+    #[tokio::test]
+    async fn lsm_merge_insert_use_lsm_write_false_falls_back() {
+        let dir = tempdir().unwrap();
+        let table = id_value_table(&dir).await;
+        table
+            .set_lsm_write_spec(LsmWriteSpec::bucket("id", 1))
+            .await
+            .unwrap();
+
+        // use_lsm_write(false) opts out: the standard path runs and commits.
+        let mut builder = table.merge_insert(&["id"]);
+        builder.when_not_matched_insert_all().use_lsm_write(false);
+        let result = builder
+            .execute(id_value_reader(vec![3, 4, 5]))
+            .await
+            .unwrap();
+
+        assert_eq!(result.num_inserted_rows, 2);
+        assert_eq!(table.count_rows(None).await.unwrap(), 5);
+    }
+
+    #[tokio::test]
+    async fn lsm_merge_insert_rejects_on_not_primary_key() {
+        let dir = tempdir().unwrap();
+        let table = id_value_table(&dir).await;
+        table
+            .set_lsm_write_spec(LsmWriteSpec::bucket("id", 1))
+            .await
+            .unwrap();
+
+        let mut builder = table.merge_insert(&["value"]);
+        builder
+            .when_matched_update_all(None)
+            .when_not_matched_insert_all();
+        let err = builder.execute(id_value_reader(vec![1])).await.unwrap_err();
+        assert!(matches!(err, Error::InvalidInput { .. }), "got {err:?}");
+    }
+
+    #[tokio::test]
+    async fn lsm_merge_insert_rejects_non_upsert() {
+        let dir = tempdir().unwrap();
+        let table = id_value_table(&dir).await;
+        table
+            .set_lsm_write_spec(LsmWriteSpec::bucket("id", 1))
+            .await
+            .unwrap();
+
+        // Insert-only (no when_matched_update_all) is not the upsert shape.
+        let mut builder = table.merge_insert(&[]);
+        builder.when_not_matched_insert_all();
+        let err = builder.execute(id_value_reader(vec![4])).await.unwrap_err();
+        assert!(matches!(err, Error::InvalidInput { .. }), "got {err:?}");
+    }
+
+    #[tokio::test]
+    async fn lsm_close_writers_then_reopen() {
+        let dir = tempdir().unwrap();
+        let table = id_value_table(&dir).await;
+        table
+            .set_lsm_write_spec(LsmWriteSpec::bucket("id", 1))
+            .await
+            .unwrap();
+
+        let mut builder = table.merge_insert(&[]);
+        builder
+            .when_matched_update_all(None)
+            .when_not_matched_insert_all();
+        builder.execute(id_value_reader(vec![7, 8])).await.unwrap();
+
+        table.close_lsm_writers().await.unwrap();
+
+        // The writer reopens lazily on the next merge_insert.
+        let mut builder = table.merge_insert(&[]);
+        builder
+            .when_matched_update_all(None)
+            .when_not_matched_insert_all();
+        let result = builder.execute(id_value_reader(vec![9])).await.unwrap();
+        assert_eq!(result.num_rows, 1);
+    }
+
+    #[tokio::test]
+    async fn lsm_merge_insert_multi_batch() {
+        let dir = tempdir().unwrap();
+        let conn = connect(dir.path().to_str().unwrap())
+            .execute()
+            .await
+            .unwrap();
+        let table = conn
+            .create_table("t", id_region_reader(vec![(1, "us")]))
+            .execute()
+            .await
+            .unwrap();
+        table.set_unenforced_primary_key(["id"]).await.unwrap();
+        table
+            .set_lsm_write_spec(LsmWriteSpec::identity("region"))
+            .await
+            .unwrap();
+
+        // Multiple batches that all route to one shard are written together.
+        let mut builder = table.merge_insert(&[]);
+        builder
+            .when_matched_update_all(None)
+            .when_not_matched_insert_all();
+        let result = builder
+            .execute(id_region_multi_reader(vec![
+                vec![(2, "us"), (3, "us")],
+                vec![(4, "us")],
+            ]))
+            .await
+            .unwrap();
+        assert_eq!(result.num_rows, 3);
+
+        // Batches that route to different shards are rejected; the validation
+        // runs before any write, so no partial write is left behind.
+        let mut builder = table.merge_insert(&[]);
+        builder
+            .when_matched_update_all(None)
+            .when_not_matched_insert_all();
+        let err = builder
+            .execute(id_region_multi_reader(vec![
+                vec![(5, "us")],
+                vec![(6, "eu")],
+            ]))
+            .await
+            .unwrap_err();
+        assert!(matches!(err, Error::InvalidInput { .. }), "got {err:?}");
+    }
+
+    #[tokio::test]
+    async fn lsm_merge_insert_use_lsm_write_true_requires_spec() {
+        let dir = tempdir().unwrap();
+        // id_value_table sets a primary key but no LSM write spec.
+        let table = id_value_table(&dir).await;
+
+        let mut builder = table.merge_insert(&["id"]);
+        builder
+            .when_matched_update_all(None)
+            .when_not_matched_insert_all()
+            .use_lsm_write(true);
+        let err = builder.execute(id_value_reader(vec![4])).await.unwrap_err();
+        assert!(matches!(err, Error::InvalidInput { .. }), "got {err:?}");
+    }
+
+    #[tokio::test]
+    async fn lsm_merge_insert_rejects_second_shard() {
+        let dir = tempdir().unwrap();
+        let conn = connect(dir.path().to_str().unwrap())
+            .execute()
+            .await
+            .unwrap();
+        let table = conn
+            .create_table("t", id_region_reader(vec![(1, "us")]))
+            .execute()
+            .await
+            .unwrap();
+        table.set_unenforced_primary_key(["id"]).await.unwrap();
+        table
+            .set_lsm_write_spec(LsmWriteSpec::identity("region"))
+            .await
+            .unwrap();
+
+        // The first merge_insert opens the single writer for shard "us".
+        let mut builder = table.merge_insert(&[]);
+        builder
+            .when_matched_update_all(None)
+            .when_not_matched_insert_all();
+        builder
+            .execute(id_region_reader(vec![(2, "us")]))
+            .await
+            .unwrap();
+
+        // A merge_insert routing to a different shard is rejected.
+        let mut builder = table.merge_insert(&[]);
+        builder
+            .when_matched_update_all(None)
+            .when_not_matched_insert_all();
+        let err = builder
+            .execute(id_region_reader(vec![(3, "eu")]))
+            .await
+            .unwrap_err();
+        assert!(matches!(err, Error::InvalidInput { .. }), "got {err:?}");
+
+        // After closing the writer, a different shard can be written.
+        table.close_lsm_writers().await.unwrap();
+        let mut builder = table.merge_insert(&[]);
+        builder
+            .when_matched_update_all(None)
+            .when_not_matched_insert_all();
+        builder
+            .execute(id_region_reader(vec![(4, "eu")]))
+            .await
+            .unwrap();
+    }
+}
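The LSM tests above define a routing contract. Every row of one `merge_insert`, across all input batches, must route to a single shard. That check runs before anything is written. A different shard is rejected until `close_lsm_writers` releases the open writer. Below is a toy Rust model of that contract, meant only to make the rules explicit.

It is not LanceDB's implementation. `ToySpec`, `shard_key`, and `route_batches` are hypothetical names. The modulo bucket function is an assumption, since the real bucketing is not part of this diff. The model also applies the single-shard rule to every spec, while the tests only exercise it for `identity` and single-bucket specs.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for the LSM write specs used in the tests above.
/// Hypothetical: these are not LanceDB's types.
#[derive(Debug)]
pub enum ToySpec {
    /// Every row goes to one shard.
    Unsharded,
    /// Rows go to `id mod num_buckets` (a stand-in for the real bucket
    /// function). `num_buckets` must be non-zero.
    Bucket { num_buckets: i64 },
    /// Rows go to the shard named by their `region` value.
    Identity,
}

/// Shard key for one `(id, region)` row under `spec`.
fn shard_key(spec: &ToySpec, id: i64, region: &str) -> String {
    match spec {
        ToySpec::Unsharded => "unsharded".to_string(),
        // rem_euclid keeps the bucket non-negative even for negative ids.
        ToySpec::Bucket { num_buckets } => format!("bucket-{}", id.rem_euclid(*num_buckets)),
        ToySpec::Identity => region.to_string(),
    }
}

/// Check a whole input (all batches) before anything is written. Every row
/// must route to the same shard, and that shard must match the writer that
/// is already open, if any. Returns the shard to write to.
pub fn route_batches(
    spec: &ToySpec,
    open_writer: Option<&str>,
    batches: &[Vec<(i64, &str)>],
) -> Result<String, String> {
    let shards: BTreeSet<String> = batches
        .iter()
        .flatten()
        .map(|(id, region)| shard_key(spec, *id, region))
        .collect();
    let mut shards = shards.into_iter();
    let shard = match (shards.next(), shards.next()) {
        (Some(only), None) => only,
        // Toy choice: treat an empty input as an error.
        (None, _) => return Err("no rows to route".to_string()),
        (Some(a), Some(b)) => return Err(format!("rows route to more than one shard: {a}, {b}")),
    };
    match open_writer {
        Some(open) if open != shard => Err(format!("a writer is already open for shard {open}")),
        _ => Ok(shard),
    }
}
```

With `ToySpec::Identity`, the batches `[(2, "us"), (3, "us")]` and `[(4, "us")]` resolve to shard `"us"`. The batches `[(5, "us")]` and `[(6, "eu")]` are rejected, mirroring `lsm_merge_insert_multi_batch`. Passing `Some("us")` as `open_writer` models the writer left open by an earlier call, as in `lsm_merge_insert_rejects_second_shard`.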
--- a/rust/lancedb/src/table/merge/lsm.rs
+++ b/rust/lancedb/src/table/merge/lsm.rs
--- a/skills/README.md
+++ b/skills/README.md
@@ -0,0 +1,13 @@
+# Skills
+
+This directory contains code agent skills for the LanceDB project.
+
+Each skill is a folder that contains a required `SKILL.md` and optional bundled resources.
+
+## Install
+
+```bash
+npx skills add lancedb/lancedb
+```
+
+Restart code agents after installing.
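The README's layout rule is one folder per skill, each with a required `SKILL.md`. That rule is easy to check mechanically. Below is a minimal Rust sketch of such a check.

`skills_missing_manifest` and `front_matter_name` are hypothetical helpers. They are not part of `npx skills` or this repository. `front_matter_name` is a line scan over the `---`-delimited header, not a real YAML parser.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Names of skill folders directly under `skills_dir` that lack the required
/// `SKILL.md`. Plain files such as `README.md` are skipped.
pub fn skills_missing_manifest(skills_dir: &Path) -> io::Result<Vec<String>> {
    let mut missing = Vec::new();
    for entry in fs::read_dir(skills_dir)? {
        let entry = entry?;
        if !entry.file_type()?.is_dir() {
            continue;
        }
        if !entry.path().join("SKILL.md").is_file() {
            missing.push(entry.file_name().to_string_lossy().into_owned());
        }
    }
    // read_dir order is platform-dependent, so sort for stable output.
    missing.sort();
    Ok(missing)
}

/// The `name:` value from the `---`-delimited front matter at the top of a
/// SKILL.md. A simple line scan, not a YAML parser.
pub fn front_matter_name(skill_md: &str) -> Option<&str> {
    let mut lines = skill_md.lines();
    if lines.next()?.trim() != "---" {
        return None;
    }
    for line in lines {
        if line.trim() == "---" {
            break; // End of the front matter without a `name:` field.
        }
        if let Some(value) = line.strip_prefix("name:") {
            return Some(value.trim());
        }
    }
    None
}
```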
--- a/skills/lancedb-update-lance-dependency/SKILL.md
+++ b/skills/lancedb-update-lance-dependency/SKILL.md
@@ -0,0 +1,98 @@
+---
+name: lancedb-update-lance-dependency
+description: Update LanceDB to a specific Lance release or tag. Use when bumping the Lance dependencies (Rust workspace Lance crates and Java lance-core) in the lancedb repository, including validation and, when requested, branch creation, commit, push, and PR creation.
+---
+
+# LanceDB Update Lance Dependency
+
+## Scope
+
+Use this skill in the `lancedb/lancedb` repository when updating the Lance dependency to a specific Lance version or tag.
+
+Inputs can be a version (`7.2.0-beta.1`), a tag (`v7.2.0-beta.1`), a tag ref (`refs/tags/v7.2.0-beta.1`), or `latest`. An empty or omitted input is treated as `latest`.
+
+## Workflow
+
+1. Check the worktree with `git status --short` so that unrelated local changes are not mixed into the update.
+2. Resolve the target Lance version:
+
+   - If the input is `latest`, empty, or omitted, run:
+
+     ```bash
+     python3 ci/check_lance_release.py
+     ```
+
+     Parse the JSON output. If `needs_update` is not `true`, stop without creating a PR. Otherwise use `latest_tag`.
+
+   - If the input is explicit, use it directly.
+
+3. Compute update metadata without changing files:
+
+   ```bash
+   python3 ci/update_lance_dependency.py "$TAG_OR_VERSION" --metadata-only
+   ```
+
+   Before making changes, set `PR_TITLE` to the emitted `pr_title` and check for an existing open PR with that title:
+
+   ```bash
+   gh pr list --search "\"$PR_TITLE\" in:title" --state open --limit 1 --json number,url,title
+   ```
+
+   If a matching open PR exists, stop and report it instead of creating a duplicate.
+
+4. Run the deterministic update entrypoint:
+
+   ```bash
+   python3 ci/update_lance_dependency.py "$TAG_OR_VERSION"
+   ```
+
+   This updates the Rust workspace Lance dependencies through `ci/set_lance_version.py`, updates `java/pom.xml`, refreshes Cargo metadata, and prints JSON metadata containing `branch_name`, `commit_message`, and `pr_title`.
+
+5. Run validation:
+
+   ```bash
+   cargo clippy --workspace --tests --all-features -- -D warnings
+   cargo fmt --all
+   ```
+
+   Fix real diagnostics and rerun clippy until it succeeds. Do not skip warnings.
+
+6. Inspect `git status --short` and `git diff` to ensure only the Lance dependency update and required compatibility fixes are present.
+
+7. If the task only asks to prepare local changes, stop here and report the changed files and validation result.
+
+8. If the task asks to publish the update, create a branch using the printed `branch_name`, stage all relevant files, and commit using the printed `commit_message`. Do not amend or rewrite existing commits.
+
+9. Push the branch to `origin`. Before pushing and creating the PR, check that the current token has push permission (the command prints `true` when it does):
+
+   ```bash
+   gh api repos/lancedb/lancedb --jq .permissions.push
+   ```
+
+   If a remote branch with the generated `branch_name` already exists, delete that remote ref with `gh api -X DELETE repos/lancedb/lancedb/git/refs/heads/$BRANCH_NAME` and then push. Do not force-push.
+
+10. Create a PR targeting `main` with the printed `pr_title`. If there is no PR template, keep the body to two or three concise sentences: state the Lance dependency bump, note any required compatibility fixes, and link the triggering Lance tag or release.
+
+11. After creating the PR, read its title back from GitHub (for example with `gh pr view --json title`). If it is not a Conventional Commit title, correct it immediately with `gh pr edit --title`.
+
+12. When running in GitHub Actions after creating the LanceDB PR, trigger the Sophon dependency update:
+
+    ```bash
+    gh workflow run codex-bump-lancedb-lance.yml \
+      --repo lancedb/sophon \
+      -f lance_ref="$LANCE_TAG" \
+      -f lancedb_ref="$BRANCH_NAME"
+    gh run list --repo lancedb/sophon --workflow codex-bump-lancedb-lance.yml --limit 1 --json databaseId,url,displayTitle
+    ```
+
+    Use the emitted metadata `tag` value as `LANCE_TAG`. Do this only after a new LanceDB PR has been created. If the update was skipped because no update is needed or an open PR already exists, do not trigger Sophon.
+
+## GitHub Actions
+
+When this skill is used from GitHub Actions, `TAG`, `GH_TOKEN`, and `GITHUB_TOKEN` may already be set. If `TAG` is empty or `latest`, resolve it first as described in step 2. Once an explicit tag or version is known, run:
+
+```bash
+python3 ci/update_lance_dependency.py "$TAG" --github-output "$GITHUB_OUTPUT"
+```
+
+Then use the emitted `branch_name`, `commit_message`, and `pr_title` values for branch, commit, and PR creation.
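The skill turns its input into a target through two pieces: the accepted input forms in its Scope section and step 2 of its Workflow. Together they amount to a small resolution rule, spelled out in the Rust sketch below under stated assumptions.

`ReleaseCheck`, `Target`, `normalize_lance_version`, and `resolve_target` are hypothetical names. `ReleaseCheck` models only the `needs_update` and `latest_tag` fields of the JSON printed by `ci/check_lance_release.py`. Lance tags are assumed to be the version with a `v` prefix, as in `v7.2.0-beta.1`. The real normalization happens inside `ci/update_lance_dependency.py`, whose code is not shown here.

```rust
/// The two fields of `ci/check_lance_release.py`'s JSON output that step 2
/// reads. Hypothetical struct; the script itself prints JSON.
pub struct ReleaseCheck {
    pub needs_update: bool,
    pub latest_tag: String,
}

/// Result of resolving the requested Lance target.
#[derive(Debug, PartialEq)]
pub enum Target {
    /// Proceed with this bare version, e.g. `7.2.0-beta.1`.
    Version(String),
    /// `latest` was requested but no update is needed: stop without a PR.
    UpToDate,
}

/// Reduce a version, tag, or tag ref to a bare version by stripping an
/// optional `refs/tags/` prefix and then an optional leading `v`.
pub fn normalize_lance_version(input: &str) -> Result<String, String> {
    let trimmed = input.trim();
    let without_ref = trimmed.strip_prefix("refs/tags/").unwrap_or(trimmed);
    let version = without_ref.strip_prefix('v').unwrap_or(without_ref);
    // A bare version is assumed to start with a digit, e.g. `7.2.0-beta.1`.
    if version.starts_with(|c: char| c.is_ascii_digit()) {
        Ok(version.to_string())
    } else {
        Err(format!("not a Lance version, tag, or tag ref: {input:?}"))
    }
}

/// Empty, omitted, or `latest` input defers to the release check; any other
/// input is treated as an explicit version, tag, or tag ref.
pub fn resolve_target(
    input: Option<&str>,
    check: impl FnOnce() -> ReleaseCheck,
) -> Result<Target, String> {
    match input.map(str::trim) {
        None | Some("") | Some("latest") => {
            let report = check();
            if !report.needs_update {
                return Ok(Target::UpToDate);
            }
            normalize_lance_version(&report.latest_tag).map(Target::Version)
        }
        Some(explicit) => normalize_lance_version(explicit).map(Target::Version),
    }
}
```

The release check is passed as a closure, so an explicit input never runs it. This matches the workflow, which only consults `ci/check_lance_release.py` for `latest` or an empty input.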
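Step 11 of the skill's workflow requires a Conventional Commit PR title. The Conventional Commits specification defines the header shape `type(scope)!: description`, where the scope and `!` are optional. A rough Rust check of that shape could look like the sketch below.

`is_conventional_commit_title` is a hypothetical helper, not a full implementation of the specification. It accepts any ASCII alphabetic type because the specification does not fix a list of types. A repository's own lint rules may be stricter.

```rust
/// Rough check that `title` matches the Conventional Commits header shape
/// `type(scope)!: description`, where `(scope)` and `!` are optional.
/// Hypothetical helper, not a full implementation of the specification.
pub fn is_conventional_commit_title(title: &str) -> bool {
    // The header ends at the first ": " and a non-empty description follows.
    let Some((header, description)) = title.split_once(": ") else {
        return false;
    };
    if description.trim().is_empty() {
        return false;
    }
    // An optional `!` marks a breaking change and comes after the scope.
    let header = header.strip_suffix('!').unwrap_or(header);
    // Split off an optional `(scope)`; it must be closed by the final `)`.
    let (kind, scope) = match header.split_once('(') {
        Some((kind, rest)) => match rest.strip_suffix(')') {
            Some(scope) => (kind, Some(scope)),
            None => return false,
        },
        None => (header, None),
    };
    // The type is a non-empty alphabetic word such as `feat`, `fix`, or `ci`.
    let kind_ok = !kind.is_empty() && kind.chars().all(|c| c.is_ascii_alphabetic());
    // A scope, if present, is non-empty and not itself parenthesized.
    let scope_ok = match scope {
        Some(s) => !s.is_empty() && !s.contains(|c: char| c == '(' || c == ')'),
        None => true,
    };
    kind_ok && scope_ok
}
```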