feat: add mem_wal flag to merge insert for MemWAL write path

Add support for enabling MemWAL (Memory Write-Ahead Log) mode on merge insert operations. This lets streaming writes route through memory nodes for high-performance buffered writes.

Changes:
- Add `mem_wal` field to MergeInsertBuilder with validation
- Add `x-lancedb-mem-wal-enabled` header for remote requests
- Add Python `mem_wal()` method to LanceMergeInsertBuilder
- Add validation to ensure only the upsert pattern is supported:
  - when_matched_update_all() without filter
  - when_not_matched_insert_all()
- Throw a NotSupported error for native tables
- Add mem_wal_enabled to ClientConfig for Python/Node bindings

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
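A minimal Rust sketch of the upsert-only validation and the remote header described in the MemWAL commit above, assuming a simplified view of the builder state. `MergeInsertSettings`, `MemWalError`, `validate_mem_wal`, and `mem_wal_headers` are hypothetical names, not the actual lancedb API. The header value `"true"`, the `InvalidInput` error for a disallowed clause pattern, and the rejection of a delete-by-source clause are also assumptions.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical snapshot of the merge-insert options relevant to MemWAL.
#[derive(Debug, Default, Clone)]
pub struct MergeInsertSettings {
    pub when_matched_update_all: bool,
    /// Optional filter passed to when_matched_update_all().
    pub when_matched_filter: Option<String>,
    pub when_not_matched_insert_all: bool,
    /// Assumption: a delete-by-source clause falls outside the upsert pattern.
    pub when_not_matched_by_source_delete: bool,
    pub mem_wal: bool,
}

/// Hypothetical error type; the real crate's error variants may differ.
#[derive(Debug, PartialEq)]
pub enum MemWalError {
    NotSupported(String),
    InvalidInput(String),
}

impl fmt::Display for MemWalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MemWalError::NotSupported(msg) => write!(f, "not supported: {msg}"),
            MemWalError::InvalidInput(msg) => write!(f, "invalid input: {msg}"),
        }
    }
}

impl std::error::Error for MemWalError {}

/// Accept MemWAL only for remote tables that use the plain upsert pattern.
pub fn validate_mem_wal(s: &MergeInsertSettings, is_remote: bool) -> Result<(), MemWalError> {
    if !s.mem_wal {
        // MemWAL disabled: every merge-insert pattern stays allowed.
        return Ok(());
    }
    if !is_remote {
        return Err(MemWalError::NotSupported(
            "mem_wal is not supported for native tables".to_string(),
        ));
    }
    if !s.when_matched_update_all || s.when_matched_filter.is_some() {
        return Err(MemWalError::InvalidInput(
            "mem_wal requires when_matched_update_all() without a filter".to_string(),
        ));
    }
    if !s.when_not_matched_insert_all {
        return Err(MemWalError::InvalidInput(
            "mem_wal requires when_not_matched_insert_all()".to_string(),
        ));
    }
    if s.when_not_matched_by_source_delete {
        return Err(MemWalError::InvalidInput(
            "mem_wal only supports the upsert pattern".to_string(),
        ));
    }
    Ok(())
}

/// Extra request headers for a remote merge insert (header value assumed).
pub fn mem_wal_headers(s: &MergeInsertSettings) -> HashMap<String, String> {
    let mut headers = HashMap::new();
    if s.mem_wal {
        headers.insert("x-lancedb-mem-wal-enabled".to_string(), "true".to_string());
    }
    headers
}
```

The sketch checks the native/remote distinction first, so a native table reports NotSupported no matter which clauses were set. Only a remote upsert reaches the clause checks.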
fixes for breaking changes
2026-06-13 01:00:39 +00:00 · 2026-03-16 01:04:04 -07:00 · 2026-03-04 15:27:32 -08:00 · 2026-03-04 14:57:17 -08:00 · 2026-03-04 11:11:36 -08:00 · 2026-03-03 16:23:29 -08:00
138 changed files with 9175 additions and 3088 deletions
--- a/.bumpversion.toml
+++ b/.bumpversion.toml
@@ -1,5 +1,5 @@
 [tool.bumpversion]
-current_version = "0.26.2"
+current_version = "0.27.0-beta.3"
 parse = """(?x)
    (?P<major>0|[1-9]\\d*)\\.
    (?P<minor>0|[1-9]\\d*)\\.
--- a/.github/workflows/build_linux_wheel/action.yml
+++ b/.github/workflows/build_linux_wheel/action.yml
@@ -29,6 +29,7 @@ runs:
      if: ${{ inputs.arm-build == 'false' }}
      uses: PyO3/maturin-action@v1
      with:
+        maturin-version: "1.12.4"
        command: build
        working-directory: python
        docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
@@ -44,6 +45,7 @@ runs:
      if: ${{ inputs.arm-build == 'true' }}
      uses: PyO3/maturin-action@v1
      with:
+        maturin-version: "1.12.4"
        command: build
        working-directory: python
        docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
--- a/.github/workflows/build_mac_wheel/action.yml
+++ b/.github/workflows/build_mac_wheel/action.yml
@@ -20,6 +20,7 @@ runs:
      uses: PyO3/maturin-action@v1
      with:
        command: build
+        maturin-version: "1.12.4"
        # TODO: pass through interpreter
        args: ${{ inputs.args }}
        docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
--- a/.github/workflows/build_windows_wheel/action.yml
+++ b/.github/workflows/build_windows_wheel/action.yml
@@ -25,6 +25,7 @@ runs:
      uses: PyO3/maturin-action@v1
      with:
        command: build
+        maturin-version: "1.12.4"
        args: ${{ inputs.args }}
        docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
        working-directory: python
--- a/.github/workflows/nodejs.yml
+++ b/.github/workflows/nodejs.yml
@@ -8,6 +8,7 @@ on:
    paths:
      - Cargo.toml
      - nodejs/**
+      - rust/**
      - docs/src/js/**
      - .github/workflows/nodejs.yml
      - docker-compose.yml
@@ -77,8 +78,11 @@ jobs:
        fetch-depth: 0
        lfs: true
    - uses: actions/setup-node@v3
+      name: Setup Node.js 20 for build
      with:
-        node-version: ${{ matrix.node-version }}
+        # @napi-rs/cli v3 requires Node >= 20.12 (via @inquirer/prompts@8).
+        # Build always on Node 20; tests run on the matrix version below.
+        node-version: 20
        cache: 'npm'
        cache-dependency-path: nodejs/package-lock.json
    - uses: Swatinem/rust-cache@v2
@@ -86,12 +90,16 @@ jobs:
      run: |
        sudo apt update
        sudo apt install -y protobuf-compiler libssl-dev
-        npm install -g @napi-rs/cli
    - name: Build
      run: |
        npm ci --include=optional
        npm run build:debug -- --profile ci
-        npm run tsc
+    - uses: actions/setup-node@v3
+      name: Setup Node.js ${{ matrix.node-version }} for test
+      with:
+        node-version: ${{ matrix.node-version }}
+    - name: Compile TypeScript
+      run: npm run tsc
    - name: Setup localstack
      working-directory: .
      run: docker compose up --detach --wait
@@ -144,7 +152,6 @@ jobs:
    - name: Install dependencies
      run: |
        brew install protobuf
-        npm install -g @napi-rs/cli
    - name: Build
      run: |
        npm ci --include=optional
--- a/.github/workflows/npm-publish.yml
+++ b/.github/workflows/npm-publish.yml
@@ -128,16 +128,13 @@ jobs:
          - target: x86_64-unknown-linux-musl
            # This one seems to need some extra memory
            host: ubuntu-2404-8x-x64
-            # https://github.com/napi-rs/napi-rs/blob/main/alpine.Dockerfile
-            docker: ghcr.io/napi-rs/napi-rs/nodejs-rust:lts-alpine
            features: fp16kernels
            pre_build: |-
              set -e &&
-              apk add protobuf-dev curl &&
-              ln -s /usr/lib/gcc/x86_64-alpine-linux-musl/14.2.0/crtbeginS.o /usr/lib/crtbeginS.o &&
-              ln -s /usr/lib/libgcc_s.so /usr/lib/libgcc.so &&
-              CC=gcc &&
-              CXX=g++
+              sudo apt-get update &&
+              sudo apt-get install -y protobuf-compiler pkg-config &&
+              rustup target add x86_64-unknown-linux-musl &&
+              export EXTRA_ARGS="-x"
          - target: aarch64-unknown-linux-gnu
            host: ubuntu-2404-8x-x64
            # https://github.com/napi-rs/napi-rs/blob/main/debian-aarch64.Dockerfile
@@ -153,15 +150,13 @@ jobs:
              rustup target add aarch64-unknown-linux-gnu
          - target: aarch64-unknown-linux-musl
            host: ubuntu-2404-8x-x64
-            # https://github.com/napi-rs/napi-rs/blob/main/alpine.Dockerfile
-            docker: ghcr.io/napi-rs/napi-rs/nodejs-rust:lts-alpine
            features: ","
            pre_build: |-
              set -e &&
-              apk add protobuf-dev &&
+              sudo apt-get update &&
+              sudo apt-get install -y protobuf-compiler &&
              rustup target add aarch64-unknown-linux-musl &&
-              export CC_aarch64_unknown_linux_musl=aarch64-linux-musl-gcc &&
-              export CXX_aarch64_unknown_linux_musl=aarch64-linux-musl-g++
+              export EXTRA_ARGS="-x"
    name: build - ${{ matrix.settings.target }}
    runs-on: ${{ matrix.settings.host }}
    defaults:
@@ -192,12 +187,18 @@ jobs:
            .cargo-cache
            target/
          key: nodejs-${{ matrix.settings.target }}-cargo-${{ matrix.settings.host }}
-      - name: Setup toolchain
-        run: ${{ matrix.settings.setup }}
-        if: ${{ matrix.settings.setup }}
-        shell: bash
      - name: Install dependencies
        run: npm ci
+      - name: Install Zig
+        uses: mlugg/setup-zig@v2
+        if: ${{ contains(matrix.settings.target, 'musl') }}
+        with:
+          version: 0.14.1
+      - name: Install cargo-zigbuild
+        uses: taiki-e/install-action@v2
+        if: ${{ contains(matrix.settings.target, 'musl') }}
+        with:
+          tool: cargo-zigbuild
      - name: Build in docker
        uses: addnab/docker-run-action@v3
        if: ${{ matrix.settings.docker }}
@@ -210,24 +211,24 @@ jobs:
          run: |
            set -e
            ${{ matrix.settings.pre_build }}
-            npx napi build --platform  --release --no-const-enum \
+            npx napi build --platform --release \
              --features ${{ matrix.settings.features }} \
              --target ${{ matrix.settings.target }} \
              --dts ../lancedb/native.d.ts \
              --js ../lancedb/native.js \
              --strip \
-              dist/
+              --output-dir dist/
      - name: Build
        run: |
          ${{ matrix.settings.pre_build }}
-          npx napi build --platform  --release --no-const-enum \
+          npx napi build --platform --release \
              --features ${{ matrix.settings.features }} \
              --target ${{ matrix.settings.target }} \
              --dts ../lancedb/native.d.ts \
              --js ../lancedb/native.js \
              --strip \
              $EXTRA_ARGS \
-              dist/
+              --output-dir dist/
        if: ${{ !matrix.settings.docker }}
        shell: bash
      - name: Upload artifact
@@ -355,7 +356,8 @@ jobs:
          if [[ $DRY_RUN == "true" ]]; then
            ARGS="$ARGS --dry-run"
          fi
-          if [[ $GITHUB_REF =~ refs/tags/v(.*)-beta.* ]]; then
+          VERSION=$(node -p "require('./package.json').version")
+          if [[ $VERSION == *-* ]]; then
            ARGS="$ARGS --tag preview"
          fi
          npm publish $ARGS
--- a/.github/workflows/python.yml
+++ b/.github/workflows/python.yml
@@ -8,7 +8,12 @@ on:
    paths:
      - Cargo.toml
      - python/**
+      - rust/**
      - .github/workflows/python.yml
+      - .github/workflows/build_linux_wheel/**
+      - .github/workflows/build_mac_wheel/**
+      - .github/workflows/build_windows_wheel/**
+      - .github/workflows/run_tests/**

 concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
--- a/.github/workflows/rust.yml
+++ b/.github/workflows/rust.yml
@@ -100,7 +100,9 @@ jobs:
          lfs: true
      - uses: Swatinem/rust-cache@v2
      - name: Install dependencies
-        run: sudo apt install -y protobuf-compiler libssl-dev
+        run: |
+          sudo apt update
+          sudo apt install -y protobuf-compiler libssl-dev
      - uses: rui314/setup-mold@v1
      - name: Make Swap
        run: |
@@ -183,7 +185,7 @@ jobs:
    runs-on: ubuntu-24.04
    strategy:
      matrix:
-        msrv: ["1.88.0"] # This should match up with rust-version in Cargo.toml
+        msrv: ["1.91.0"] # This should match up with rust-version in Cargo.toml
    env:
      # Need up-to-date compilers for kernels
      CC: clang-18
--- a/Cargo.lock
+++ b/Cargo.lock
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -5,30 +5,30 @@ exclude = ["python"]
 resolver = "2"

 [workspace.package]
-edition = "2021"
+edition = "2024"
 authors = ["LanceDB Devs <dev@lancedb.com>"]
 license = "Apache-2.0"
 repository = "https://github.com/lancedb/lancedb"
 description = "Serverless, low-latency vector database for AI applications"
 keywords = ["lancedb", "lance", "database", "vector", "search"]
 categories = ["database-implementations"]
-rust-version = "1.88.0"
+rust-version = "1.91.0"

 [workspace.dependencies]
-lance = { "version" = "=2.0.1", default-features = false }
-lance-core = "=2.0.1"
-lance-datagen = "=2.0.1"
-lance-file = "=2.0.1"
-lance-io = { "version" = "=2.0.1", default-features = false }
-lance-index = "=2.0.1"
-lance-linalg = "=2.0.1"
-lance-namespace = "=2.0.1"
-lance-namespace-impls = { "version" = "=2.0.1", default-features = false }
-lance-table = "=2.0.1"
-lance-testing = "=2.0.1"
-lance-datafusion = "=2.0.1"
-lance-encoding = "=2.0.1"
-lance-arrow = "=2.0.1"
+lance = { "version" = "=3.0.0-rc.3", default-features = false, "tag" = "v3.0.0-rc.3", "git" = "https://github.com/lance-format/lance.git" }
+lance-core = { "version" = "=3.0.0-rc.3", "tag" = "v3.0.0-rc.3", "git" = "https://github.com/lance-format/lance.git" }
+lance-datagen = { "version" = "=3.0.0-rc.3", "tag" = "v3.0.0-rc.3", "git" = "https://github.com/lance-format/lance.git" }
+lance-file = { "version" = "=3.0.0-rc.3", "tag" = "v3.0.0-rc.3", "git" = "https://github.com/lance-format/lance.git" }
+lance-io = { "version" = "=3.0.0-rc.3", default-features = false, "tag" = "v3.0.0-rc.3", "git" = "https://github.com/lance-format/lance.git" }
+lance-index = { "version" = "=3.0.0-rc.3", "tag" = "v3.0.0-rc.3", "git" = "https://github.com/lance-format/lance.git" }
+lance-linalg = { "version" = "=3.0.0-rc.3", "tag" = "v3.0.0-rc.3", "git" = "https://github.com/lance-format/lance.git" }
+lance-namespace = { "version" = "=3.0.0-rc.3", "tag" = "v3.0.0-rc.3", "git" = "https://github.com/lance-format/lance.git" }
+lance-namespace-impls = { "version" = "=3.0.0-rc.3", default-features = false, "tag" = "v3.0.0-rc.3", "git" = "https://github.com/lance-format/lance.git" }
+lance-table = { "version" = "=3.0.0-rc.3", "tag" = "v3.0.0-rc.3", "git" = "https://github.com/lance-format/lance.git" }
+lance-testing = { "version" = "=3.0.0-rc.3", "tag" = "v3.0.0-rc.3", "git" = "https://github.com/lance-format/lance.git" }
+lance-datafusion = { "version" = "=3.0.0-rc.3", "tag" = "v3.0.0-rc.3", "git" = "https://github.com/lance-format/lance.git" }
+lance-encoding = { "version" = "=3.0.0-rc.3", "tag" = "v3.0.0-rc.3", "git" = "https://github.com/lance-format/lance.git" }
+lance-arrow = { "version" = "=3.0.0-rc.3", "tag" = "v3.0.0-rc.3", "git" = "https://github.com/lance-format/lance.git" }
 ahash = "0.8"
 # Note that this one does not include pyarrow
 arrow = { version = "57.2", optional = false }
@@ -40,13 +40,15 @@ arrow-schema = "57.2"
 arrow-select = "57.2"
 arrow-cast = "57.2"
 async-trait = "0"
-datafusion = { version = "51.0", default-features = false }
-datafusion-catalog = "51.0"
-datafusion-common = { version = "51.0", default-features = false }
-datafusion-execution = "51.0"
-datafusion-expr = "51.0"
-datafusion-physical-plan = "51.0"
-datafusion-physical-expr = "51.0"
+datafusion = { version = "52.1", default-features = false }
+datafusion-catalog = "52.1"
+datafusion-common = { version = "52.1", default-features = false }
+datafusion-execution = "52.1"
+datafusion-expr = "52.1"
+datafusion-functions = "52.1"
+datafusion-physical-plan = "52.1"
+datafusion-physical-expr = "52.1"
+datafusion-sql = "52.1"
 env_logger = "0.11"
 half = { "version" = "2.7.1", default-features = false, features = [
    "num-traits",
--- a/docs/src/java/java.md
+++ b/docs/src/java/java.md
@@ -14,7 +14,7 @@ Add the following dependency to your `pom.xml`:
 <dependency>
    <groupId>com.lancedb</groupId>
    <artifactId>lancedb-core</artifactId>
-    <version>0.26.2</version>
+    <version>0.27.0-beta.3</version>
 </dependency>
 ```

--- a/docs/src/js/interfaces/DeleteResult.md
+++ b/docs/src/js/interfaces/DeleteResult.md
@@ -8,6 +8,14 @@

 ## Properties

+### numDeletedRows
+
+```ts
+numDeletedRows: number;
+```
+
+***
+
 ### version

 ```ts
--- a/java/lancedb-core/pom.xml
+++ b/java/lancedb-core/pom.xml
@@ -8,7 +8,7 @@
    <parent>
      <groupId>com.lancedb</groupId>
      <artifactId>lancedb-parent</artifactId>
-      <version>0.26.2-final.0</version>
+      <version>0.27.0-beta.3</version>
      <relativePath>../pom.xml</relativePath>
    </parent>

--- a/java/pom.xml
+++ b/java/pom.xml
@@ -6,7 +6,7 @@

    <groupId>com.lancedb</groupId>
    <artifactId>lancedb-parent</artifactId>
-    <version>0.26.2-final.0</version>
+    <version>0.27.0-beta.3</version>
    <packaging>pom</packaging>
    <name>${project.artifactId}</name>
    <description>LanceDB Java SDK Parent POM</description>
@@ -28,7 +28,7 @@
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <arrow.version>15.0.0</arrow.version>
-        <lance-core.version>2.0.1</lance-core.version>
+        <lance-core.version>3.1.0-beta.2</lance-core.version>
        <spotless.skip>false</spotless.skip>
        <spotless.version>2.30.0</spotless.version>
        <spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version>
--- a/nodejs/Cargo.toml
+++ b/nodejs/Cargo.toml
@@ -1,7 +1,7 @@
 [package]
 name = "lancedb-nodejs"
 edition.workspace = true
-version = "0.26.2"
+version = "0.27.0-beta.3"
 license.workspace = true
 description.workspace = true
 repository.workspace = true
@@ -19,11 +19,11 @@ arrow-schema.workspace = true
 env_logger.workspace = true
 futures.workspace = true
 lancedb = { path = "../rust/lancedb", default-features = false }
-napi = { version = "2.16.8", default-features = false, features = [
+napi = { version = "3.8.3", default-features = false, features = [
    "napi9",
    "async"
 ] }
-napi-derive = "2.16.4"
+napi-derive = "3.5.2"
 # Prevent dynamic linking of lzma, which comes from datafusion
 lzma-sys = { version = "*", features = ["static"] }
 log.workspace = true
@@ -33,7 +33,7 @@ aws-lc-sys = "=0.28.0"
 aws-lc-rs = "=1.13.0"

 [build-dependencies]
-napi-build = "2.1"
+napi-build = "2.3.1"

 [features]
 default = ["remote", "lancedb/aws", "lancedb/gcs", "lancedb/azure", "lancedb/dynamodb", "lancedb/oss", "lancedb/huggingface"]
--- a/nodejs/test/table.test.ts
+++ b/nodejs/test/table.test.ts
@@ -1697,6 +1697,65 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
      expect(results2[0].text).toBe(data[1].text);
    });

+    test("full text search fast search", async () => {
+      const db = await connect(tmpDir.name);
+      const data = [{ text: "hello world", vector: [0.1, 0.2, 0.3], id: 1 }];
+      const table = await db.createTable("test", data);
+      await table.createIndex("text", {
+        config: Index.fts(),
+      });
+
+      // Insert unindexed data after creating the index.
+      await table.add([{ text: "xyz", vector: [0.4, 0.5, 0.6], id: 2 }]);
+
+      const withFlatSearch = await table
+        .search("xyz", "fts")
+        .limit(10)
+        .toArray();
+      expect(withFlatSearch.length).toBeGreaterThan(0);
+
+      const fastSearchResults = await table
+        .search("xyz", "fts")
+        .fastSearch()
+        .limit(10)
+        .toArray();
+      expect(fastSearchResults.length).toBe(0);
+
+      const nearestToTextFastSearch = await table
+        .query()
+        .nearestToText("xyz")
+        .fastSearch()
+        .limit(10)
+        .toArray();
+      expect(nearestToTextFastSearch.length).toBe(0);
+
+      // fastSearch should be chainable with other methods.
+      const chainedFastSearch = await table
+        .search("xyz", "fts")
+        .fastSearch()
+        .select(["text"])
+        .limit(5)
+        .toArray();
+      expect(chainedFastSearch.length).toBe(0);
+
+      await table.optimize();
+
+      const indexedFastSearch = await table
+        .search("xyz", "fts")
+        .fastSearch()
+        .limit(10)
+        .toArray();
+      expect(indexedFastSearch.length).toBeGreaterThan(0);
+
+      const indexedNearestToTextFastSearch = await table
+        .query()
+        .nearestToText("xyz")
+        .fastSearch()
+        .limit(10)
+        .toArray();
+      expect(indexedNearestToTextFastSearch.length).toBeGreaterThan(0);
+    });
+
    test("prewarm full text search index", async () => {
      const db = await connect(tmpDir.name);
      const data = [
--- a/nodejs/lancedb/index.ts
+++ b/nodejs/lancedb/index.ts
@@ -273,7 +273,9 @@ export async function connect(
  let nativeProvider: NativeJsHeaderProvider | undefined;
  if (finalHeaderProvider) {
    if (typeof finalHeaderProvider === "function") {
-      nativeProvider = new NativeJsHeaderProvider(finalHeaderProvider);
+      nativeProvider = new NativeJsHeaderProvider(async () =>
+        finalHeaderProvider(),
+      );
    } else if (
      finalHeaderProvider &&
      typeof finalHeaderProvider.getHeaders === "function"
--- a/nodejs/lancedb/query.ts
+++ b/nodejs/lancedb/query.ts
@@ -684,19 +684,17 @@ export class VectorQuery extends StandardQueryBase<NativeVectorQuery> {

  rerank(reranker: Reranker): VectorQuery {
    super.doCall((inner) =>
-      inner.rerank({
-        rerankHybrid: async (_, args) => {
-          const vecResults = await fromBufferToRecordBatch(args.vecResults);
-          const ftsResults = await fromBufferToRecordBatch(args.ftsResults);
-          const result = await reranker.rerankHybrid(
-            args.query,
-            vecResults as RecordBatch,
-            ftsResults as RecordBatch,
-          );
+      inner.rerank(async (args) => {
+        const vecResults = await fromBufferToRecordBatch(args.vecResults);
+        const ftsResults = await fromBufferToRecordBatch(args.ftsResults);
+        const result = await reranker.rerankHybrid(
+          args.query,
+          vecResults as RecordBatch,
+          ftsResults as RecordBatch,
+        );

-          const buffer = fromRecordBatchToBuffer(result);
-          return buffer;
-        },
+        const buffer = fromRecordBatchToBuffer(result);
+        return buffer;
      }),
    );

--- a/nodejs/npm/darwin-arm64/package.json
+++ b/nodejs/npm/darwin-arm64/package.json
@@ -1,6 +1,6 @@
 {
 	"name": "@lancedb/lancedb-darwin-arm64",
-	"version": "0.26.2",
+	"version": "0.27.0-beta.3",
 	"os": ["darwin"],
 	"cpu": ["arm64"],
 	"main": "lancedb.darwin-arm64.node",
--- a/nodejs/npm/linux-arm64-gnu/package.json
+++ b/nodejs/npm/linux-arm64-gnu/package.json
@@ -1,6 +1,6 @@
 {
 	"name": "@lancedb/lancedb-linux-arm64-gnu",
-	"version": "0.26.2",
+	"version": "0.27.0-beta.3",
 	"os": ["linux"],
 	"cpu": ["arm64"],
 	"main": "lancedb.linux-arm64-gnu.node",
--- a/nodejs/npm/linux-arm64-musl/package.json
+++ b/nodejs/npm/linux-arm64-musl/package.json
@@ -1,6 +1,6 @@
 {
 	"name": "@lancedb/lancedb-linux-arm64-musl",
-	"version": "0.26.2",
+	"version": "0.27.0-beta.3",
 	"os": ["linux"],
 	"cpu": ["arm64"],
 	"main": "lancedb.linux-arm64-musl.node",
--- a/nodejs/npm/linux-x64-gnu/package.json
+++ b/nodejs/npm/linux-x64-gnu/package.json
@@ -1,6 +1,6 @@
 {
 	"name": "@lancedb/lancedb-linux-x64-gnu",
-	"version": "0.26.2",
+	"version": "0.27.0-beta.3",
 	"os": ["linux"],
 	"cpu": ["x64"],
 	"main": "lancedb.linux-x64-gnu.node",
--- a/nodejs/npm/linux-x64-musl/package.json
+++ b/nodejs/npm/linux-x64-musl/package.json
@@ -1,6 +1,6 @@
 {
 	"name": "@lancedb/lancedb-linux-x64-musl",
-	"version": "0.26.2",
+	"version": "0.27.0-beta.3",
 	"os": ["linux"],
 	"cpu": ["x64"],
 	"main": "lancedb.linux-x64-musl.node",
--- a/nodejs/npm/win32-arm64-msvc/package.json
+++ b/nodejs/npm/win32-arm64-msvc/package.json
@@ -1,6 +1,6 @@
 {
  "name": "@lancedb/lancedb-win32-arm64-msvc",
-  "version": "0.26.2",
+  "version": "0.27.0-beta.3",
  "os": [
    "win32"
  ],
--- a/nodejs/npm/win32-x64-msvc/package.json
+++ b/nodejs/npm/win32-x64-msvc/package.json
@@ -1,6 +1,6 @@
 {
 	"name": "@lancedb/lancedb-win32-x64-msvc",
-	"version": "0.26.2",
+	"version": "0.27.0-beta.3",
 	"os": ["win32"],
 	"cpu": ["x64"],
 	"main": "lancedb.win32-x64-msvc.node",
--- a/nodejs/package-lock.json
+++ b/nodejs/package-lock.json
--- a/nodejs/package.json
+++ b/nodejs/package.json
@@ -11,7 +11,7 @@
    "ann"
  ],
  "private": false,
-  "version": "0.26.2",
+  "version": "0.27.0-beta.3",
  "main": "dist/index.js",
  "exports": {
    ".": "./dist/index.js",
@@ -21,19 +21,16 @@
  },
  "types": "dist/index.d.ts",
  "napi": {
-    "name": "lancedb",
-    "triples": {
-      "defaults": false,
-      "additional": [
-        "aarch64-apple-darwin",
-        "x86_64-unknown-linux-gnu",
-        "aarch64-unknown-linux-gnu",
-        "x86_64-unknown-linux-musl",
-        "aarch64-unknown-linux-musl",
-        "x86_64-pc-windows-msvc",
-        "aarch64-pc-windows-msvc"
-      ]
-    }
+    "binaryName": "lancedb",
+    "targets": [
+      "aarch64-apple-darwin",
+      "x86_64-unknown-linux-gnu",
+      "aarch64-unknown-linux-gnu",
+      "x86_64-unknown-linux-musl",
+      "aarch64-unknown-linux-musl",
+      "x86_64-pc-windows-msvc",
+      "aarch64-pc-windows-msvc"
+    ]
  },
  "license": "Apache-2.0",
  "repository": {
@@ -46,7 +43,7 @@
    "@aws-sdk/client-s3": "^3.33.0",
    "@biomejs/biome": "^1.7.3",
    "@jest/globals": "^29.7.0",
-    "@napi-rs/cli": "^2.18.3",
+    "@napi-rs/cli": "^3.5.1",
    "@types/axios": "^0.14.0",
    "@types/jest": "^29.1.2",
    "@types/node": "^22.7.4",
@@ -75,9 +72,9 @@
  "os": ["darwin", "linux", "win32"],
  "scripts": {
    "artifacts": "napi artifacts",
-    "build:debug": "napi build --platform --no-const-enum --dts ../lancedb/native.d.ts --js ../lancedb/native.js lancedb",
+    "build:debug": "napi build --platform --dts ../lancedb/native.d.ts --js ../lancedb/native.js --output-dir lancedb",
    "postbuild:debug": "shx mkdir -p dist && shx cp lancedb/*.node dist/",
-    "build:release": "napi build --platform --no-const-enum --release --dts ../lancedb/native.d.ts --js ../lancedb/native.js dist/",
+    "build:release": "napi build --platform --release --dts ../lancedb/native.d.ts --js ../lancedb/native.js --output-dir dist",
    "postbuild:release": "shx mkdir -p dist && shx cp lancedb/*.node dist/",
    "build": "npm run build:debug && npm run tsc",
    "build-release": "npm run build:release && npm run tsc",
@@ -91,7 +88,7 @@
    "prepublishOnly": "napi prepublish -t npm",
    "test": "jest --verbose",
    "integration": "S3_TEST=1 npm run test",
-    "universal": "napi universal",
+    "universal": "napi universalize",
    "version": "napi version"
  },
  "dependencies": {
--- a/nodejs/src/connection.rs
+++ b/nodejs/src/connection.rs
@@ -8,10 +8,10 @@ use lancedb::database::{CreateTableMode, Database};
 use napi::bindgen_prelude::*;
 use napi_derive::*;

+use crate::ConnectionOptions;
 use crate::error::NapiErrorExt;
 use crate::header::JsHeaderProvider;
 use crate::table::Table;
-use crate::ConnectionOptions;
 use lancedb::connection::{ConnectBuilder, Connection as LanceDBConnection};

 use lancedb::ipc::{ipc_file_to_batches, ipc_file_to_schema};
--- a/nodejs/src/header.rs
+++ b/nodejs/src/header.rs
@@ -1,20 +1,19 @@
 // SPDX-License-Identifier: Apache-2.0
 // SPDX-FileCopyrightText: Copyright The LanceDB Authors

-use napi::{
-    bindgen_prelude::*,
-    threadsafe_function::{ErrorStrategy, ThreadsafeFunction},
-};
+use napi::{bindgen_prelude::*, threadsafe_function::ThreadsafeFunction};
 use napi_derive::napi;
 use std::collections::HashMap;
 use std::sync::Arc;

+type GetHeadersFn = ThreadsafeFunction<(), Promise<HashMap<String, String>>, (), Status, false>;
+
 /// JavaScript HeaderProvider implementation that wraps a JavaScript callback.
 /// This is the only native header provider - all header provider implementations
 /// should provide a JavaScript function that returns headers.
 #[napi]
 pub struct JsHeaderProvider {
-    get_headers_fn: Arc<ThreadsafeFunction<(), ErrorStrategy::CalleeHandled>>,
+    get_headers_fn: Arc<GetHeadersFn>,
 }

 impl Clone for JsHeaderProvider {
@@ -29,9 +28,12 @@ impl Clone for JsHeaderProvider {
 impl JsHeaderProvider {
    /// Create a new JsHeaderProvider from a JavaScript callback
    #[napi(constructor)]
-    pub fn new(get_headers_callback: JsFunction) -> Result<Self> {
+    pub fn new(
+        get_headers_callback: Function<(), Promise<HashMap<String, String>>>,
+    ) -> Result<Self> {
        let get_headers_fn = get_headers_callback
-            .create_threadsafe_function(0, |ctx| Ok(vec![ctx.value]))
+            .build_threadsafe_function()
+            .build()
            .map_err(|e| {
                Error::new(
                    Status::GenericFailure,
@@ -51,7 +53,7 @@ impl lancedb::remote::HeaderProvider for JsHeaderProvider {
    async fn get_headers(&self) -> lancedb::error::Result<HashMap<String, String>> {
        // Call the JavaScript function asynchronously
        let promise: Promise<HashMap<String, String>> =
-            self.get_headers_fn.call_async(Ok(())).await.map_err(|e| {
+            self.get_headers_fn.call_async(()).await.map_err(|e| {
                lancedb::error::Error::Runtime {
                    message: format!("Failed to call JavaScript get_headers: {}", e),
                }
--- a/nodejs/src/index.rs
+++ b/nodejs/src/index.rs
@@ -3,12 +3,12 @@

 use std::sync::Mutex;

+use lancedb::index::Index as LanceDbIndex;
 use lancedb::index::scalar::{BTreeIndexBuilder, FtsIndexBuilder};
 use lancedb::index::vector::{
    IvfFlatIndexBuilder, IvfHnswPqIndexBuilder, IvfHnswSqIndexBuilder, IvfPqIndexBuilder,
    IvfRqIndexBuilder,
 };
-use lancedb::index::Index as LanceDbIndex;
 use napi_derive::napi;

 use crate::util::parse_distance_type;
--- a/nodejs/src/lib.rs
+++ b/nodejs/src/lib.rs
@@ -60,7 +60,7 @@ pub struct OpenTableOptions {
    pub storage_options: Option<HashMap<String, String>>,
 }

-#[napi::module_init]
+#[napi_derive::module_init]
 fn init() {
    let env = Env::new()
        .filter_or("LANCEDB_LOG", "warn")
--- a/nodejs/src/query.rs
+++ b/nodejs/src/query.rs
@@ -17,11 +17,11 @@ use lancedb::query::VectorQuery as LanceDbVectorQuery;
 use napi::bindgen_prelude::*;
 use napi_derive::napi;

-use crate::error::convert_error;
 use crate::error::NapiErrorExt;
+use crate::error::convert_error;
 use crate::iterator::RecordBatchIterator;
+use crate::rerankers::RerankHybridCallbackArgs;
 use crate::rerankers::Reranker;
-use crate::rerankers::RerankerCallbacks;
 use crate::util::{parse_distance_type, schema_to_buffer};

 #[napi]
@@ -42,7 +42,7 @@ impl Query {
    }

    #[napi]
-    pub fn full_text_search(&mut self, query: napi::JsObject) -> napi::Result<()> {
+    pub fn full_text_search(&mut self, query: Object) -> napi::Result<()> {
        let query = parse_fts_query(query)?;
        self.inner = self.inner.clone().full_text_search(query);
        Ok(())
@@ -235,7 +235,7 @@ impl VectorQuery {
    }

    #[napi]
-    pub fn full_text_search(&mut self, query: napi::JsObject) -> napi::Result<()> {
+    pub fn full_text_search(&mut self, query: Object) -> napi::Result<()> {
        let query = parse_fts_query(query)?;
        self.inner = self.inner.clone().full_text_search(query);
        Ok(())
@@ -272,11 +272,13 @@ impl VectorQuery {
    }

    #[napi]
-    pub fn rerank(&mut self, callbacks: RerankerCallbacks) {
-        self.inner = self
-            .inner
-            .clone()
-            .rerank(Arc::new(Reranker::new(callbacks)));
+    pub fn rerank(
+        &mut self,
+        rerank_hybrid: Function<RerankHybridCallbackArgs, Promise<Buffer>>,
+    ) -> napi::Result<()> {
+        let reranker = Reranker::new(rerank_hybrid)?;
+        self.inner = self.inner.clone().rerank(Arc::new(reranker));
+        Ok(())
    }

    #[napi(catch_unwind)]
@@ -523,12 +525,12 @@ impl JsFullTextQuery {
    }
 }

-fn parse_fts_query(query: napi::JsObject) -> napi::Result<FullTextSearchQuery> {
-    if let Ok(Some(query)) = query.get::<_, &JsFullTextQuery>("query") {
+fn parse_fts_query(query: Object) -> napi::Result<FullTextSearchQuery> {
+    if let Ok(Some(query)) = query.get::<&JsFullTextQuery>("query") {
        Ok(FullTextSearchQuery::new_query(query.inner.clone()))
-    } else if let Ok(Some(query_text)) = query.get::<_, String>("query") {
+    } else if let Ok(Some(query_text)) = query.get::<String>("query") {
        let mut query_text = query_text;
-        let columns = query.get::<_, Option<Vec<String>>>("columns")?.flatten();
+        let columns = query.get::<Option<Vec<String>>>("columns")?.flatten();

        let is_phrase =
            query_text.len() >= 2 && query_text.starts_with('"') && query_text.ends_with('"');
@@ -549,15 +551,12 @@ fn parse_fts_query(query: napi::JsObject) -> napi::Result<FullTextSearchQuery> {
            }
        };
        let mut query = FullTextSearchQuery::new_query(query);
-        if let Some(cols) = columns {
-            if !cols.is_empty() {
-                query = query.with_columns(&cols).map_err(|e| {
-                    napi::Error::from_reason(format!(
-                        "Failed to set full text search columns: {}",
-                        e
-                    ))
-                })?;
-            }
+        if let Some(cols) = columns
+            && !cols.is_empty()
+        {
+            query = query.with_columns(&cols).map_err(|e| {
+                napi::Error::from_reason(format!("Failed to set full text search columns: {}", e))
+            })?;
        }
        Ok(query)
    } else {
--- a/nodejs/src/remote.rs
+++ b/nodejs/src/remote.rs
@@ -145,6 +145,7 @@ impl From<ClientConfig> for lancedb::remote::ClientConfig {
            id_delimiter: config.id_delimiter,
            tls_config: config.tls_config.map(Into::into),
            header_provider: None, // the header provider is set separately later
+            mem_wal_enabled: None, // mem_wal is set per-operation in merge_insert
        }
    }
 }
--- a/nodejs/src/rerankers.rs
+++ b/nodejs/src/rerankers.rs
@@ -3,10 +3,7 @@

 use arrow_array::RecordBatch;
 use async_trait::async_trait;
-use napi::{
-    bindgen_prelude::*,
-    threadsafe_function::{ErrorStrategy, ThreadsafeFunction},
-};
+use napi::{bindgen_prelude::*, threadsafe_function::ThreadsafeFunction};
 use napi_derive::napi;

 use lancedb::ipc::batches_to_ipc_file;
@@ -15,27 +12,28 @@ use lancedb::{error::Error, ipc::ipc_file_to_batches};

 use crate::error::NapiErrorExt;

+type RerankHybridFn = ThreadsafeFunction<
+    RerankHybridCallbackArgs,
+    Promise<Buffer>,
+    RerankHybridCallbackArgs,
+    Status,
+    false,
+>;
+
 /// Reranker implementation that "wraps" a NodeJS Reranker implementation.
 /// This contains references to the callbacks that can be used to invoke the
 /// reranking methods on the NodeJS implementation and handles serializing the
 /// record batches to Arrow IPC buffers.
-#[napi]
 pub struct Reranker {
-    /// callback to the Javascript which will call the rerankHybrid method of
-    /// some Reranker implementation
-    rerank_hybrid: ThreadsafeFunction<RerankHybridCallbackArgs, ErrorStrategy::CalleeHandled>,
+    rerank_hybrid: RerankHybridFn,
 }

-#[napi]
 impl Reranker {
-    #[napi]
-    pub fn new(callbacks: RerankerCallbacks) -> Self {
-        let rerank_hybrid = callbacks
-            .rerank_hybrid
-            .create_threadsafe_function(0, move |ctx| Ok(vec![ctx.value]))
-            .unwrap();
-
-        Self { rerank_hybrid }
+    pub fn new(
+        rerank_hybrid: Function<RerankHybridCallbackArgs, Promise<Buffer>>,
+    ) -> napi::Result<Self> {
+        let rerank_hybrid = rerank_hybrid.build_threadsafe_function().build()?;
+        Ok(Self { rerank_hybrid })
    }
 }

@@ -49,16 +47,16 @@ impl lancedb::rerankers::Reranker for Reranker {
    ) -> lancedb::error::Result<RecordBatch> {
        let callback_args = RerankHybridCallbackArgs {
            query: query.to_string(),
-            vec_results: batches_to_ipc_file(&[vector_results])?,
-            fts_results: batches_to_ipc_file(&[fts_results])?,
+            vec_results: Buffer::from(batches_to_ipc_file(&[vector_results])?.as_ref()),
+            fts_results: Buffer::from(batches_to_ipc_file(&[fts_results])?.as_ref()),
        };
        let promised_buffer: Promise<Buffer> = self
            .rerank_hybrid
-            .call_async(Ok(callback_args))
+            .call_async(callback_args)
            .await
            .map_err(|e| Error::Runtime {
-                message: format!("napi error status={}, reason={}", e.status, e.reason),
-            })?;
+            message: format!("napi error status={}, reason={}", e.status, e.reason),
+        })?;
        let buffer = promised_buffer.await.map_err(|e| Error::Runtime {
            message: format!("napi error status={}, reason={}", e.status, e.reason),
        })?;
@@ -77,16 +75,11 @@ impl std::fmt::Debug for Reranker {
    }
 }

-#[napi(object)]
-pub struct RerankerCallbacks {
-    pub rerank_hybrid: JsFunction,
-}
-
 #[napi(object)]
 pub struct RerankHybridCallbackArgs {
    pub query: String,
-    pub vec_results: Vec<u8>,
-    pub fts_results: Vec<u8>,
+    pub vec_results: Buffer,
+    pub fts_results: Buffer,
 }

 fn buffer_to_record_batch(buffer: Buffer) -> Result<RecordBatch> {
--- a/nodejs/src/session.rs
+++ b/nodejs/src/session.rs
@@ -95,8 +95,7 @@ impl napi::bindgen_prelude::FromNapiValue for Session {
        napi_val: napi::sys::napi_value,
    ) -> napi::Result<Self> {
        let object: napi::bindgen_prelude::ClassInstance<Self> =
-            napi::bindgen_prelude::ClassInstance::from_napi_value(env, napi_val)?;
-        let copy = object.clone();
-        Ok(copy)
+            unsafe { napi::bindgen_prelude::ClassInstance::from_napi_value(env, napi_val)? };
+        Ok((*object).clone())
    }
 }
--- a/nodejs/src/table.rs
+++ b/nodejs/src/table.rs
@@ -71,6 +71,17 @@ impl Table {
    pub async fn add(&self, buf: Buffer, mode: String) -> napi::Result<AddResult> {
        let batches = ipc_file_to_batches(buf.to_vec())
            .map_err(|e| napi::Error::from_reason(format!("Failed to read IPC file: {}", e)))?;
+        let batches = batches
+            .into_iter()
+            .map(|batch| {
+                batch.map_err(|e| {
+                    napi::Error::from_reason(format!(
+                        "Failed to read record batch from IPC file: {}",
+                        e
+                    ))
+                })
+            })
+            .collect::<Result<Vec<_>>>()?;
        let mut op = self.inner_ref()?.add(batches);

        op = if mode == "append" {
@@ -742,12 +753,14 @@ impl From<lancedb::table::AddResult> for AddResult {

 #[napi(object)]
 pub struct DeleteResult {
+    pub num_deleted_rows: i64,
    pub version: i64,
 }

 impl From<lancedb::table::DeleteResult> for DeleteResult {
    fn from(value: lancedb::table::DeleteResult) -> Self {
        Self {
+            num_deleted_rows: value.num_deleted_rows as i64,
            version: value.version as i64,
        }
    }
--- a/python/.bumpversion.toml
+++ b/python/.bumpversion.toml
@@ -1,5 +1,5 @@
 [tool.bumpversion]
-current_version = "0.30.0-beta.0"
+current_version = "0.30.0-beta.3"
 parse = """(?x)
    (?P<major>0|[1-9]\\d*)\\.
    (?P<minor>0|[1-9]\\d*)\\.
--- a/python/Cargo.toml
+++ b/python/Cargo.toml
@@ -1,13 +1,13 @@
 [package]
 name = "lancedb-python"
-version = "0.30.0-beta.0"
+version = "0.30.0-beta.3"
 edition.workspace = true
 description = "Python bindings for LanceDB"
 license.workspace = true
 repository.workspace = true
 keywords.workspace = true
 categories.workspace = true
-rust-version = "1.88.0"
+rust-version = "1.91.0"

 [lib]
 name = "_lancedb"
--- a/python/pyproject.toml
+++ b/python/pyproject.toml
@@ -59,9 +59,9 @@ tests = [
    "polars>=0.19, <=1.3.0",
    "tantivy",
    "pyarrow-stubs",
-    "pylance>=1.0.0b14",
+    "pylance>=1.0.0b14,<3.0.0",
    "requests",
-    "datafusion",
+    "datafusion<52",
 ]
 dev = [
    "ruff",
--- a/python/python/lancedb/arrow.py
+++ b/python/python/lancedb/arrow.py
@@ -1,8 +1,10 @@
 # SPDX-License-Identifier: Apache-2.0
 # SPDX-FileCopyrightText: Copyright The LanceDB Authors

+from functools import singledispatch
 from typing import List, Optional, Tuple, Union

+from lancedb.pydantic import LanceModel, model_to_dict
 import pyarrow as pa

 from ._lancedb import RecordBatchStream
@@ -80,3 +82,32 @@ def peek_reader(
        yield from reader

    return batch, pa.RecordBatchReader.from_batches(batch.schema, all_batches())
+
+
+@singledispatch
+def to_arrow(data) -> pa.Table:
+    """Convert a single data object to a pa.Table."""
+    raise NotImplementedError(f"to_arrow not implemented for type {type(data)}")
+
+
+@to_arrow.register(pa.RecordBatch)
+def _arrow_from_batch(data: pa.RecordBatch) -> pa.Table:
+    return pa.Table.from_batches([data])
+
+
+@to_arrow.register(pa.Table)
+def _arrow_from_table(data: pa.Table) -> pa.Table:
+    return data
+
+
+@to_arrow.register(list)
+def _arrow_from_list(data: list) -> pa.Table:
+    if not data:
+        raise ValueError("Cannot create table from empty list without a schema")
+
+    if isinstance(data[0], LanceModel):
+        schema = data[0].__class__.to_arrow_schema()
+        dicts = [model_to_dict(d) for d in data]
+        return pa.Table.from_pylist(dicts, schema=schema)
+
+    return pa.Table.from_pylist(data)
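`to_arrow` uses `functools.singledispatch`. Each input type gets its own converter, and new types can be registered later, as `scannable.py` does for pandas and polars. The sketch below shows the same dispatch pattern using only the standard library. It assumes a hypothetical `to_rows` function, uses dataclasses in place of `LanceModel`, and produces lists of dicts in place of `pa.Table`.

```python
from dataclasses import asdict, dataclass, is_dataclass
from functools import singledispatch
from typing import Any, Dict, List


@singledispatch
def to_rows(data: Any) -> List[Dict[str, Any]]:
    # Fallback for unregistered types, like the default to_arrow.
    raise NotImplementedError(f"to_rows not implemented for type {type(data)}")


@to_rows.register(list)
def _rows_from_list(data: list) -> List[Dict[str, Any]]:
    if not data:
        raise ValueError("Cannot create rows from empty list without a schema")
    # Model instances (dataclasses here) become dicts; dicts pass through.
    if is_dataclass(data[0]):
        return [asdict(item) for item in data]
    return [dict(item) for item in data]


@dataclass
class Point:
    x: int
    y: int


print(to_rows([Point(1, 2), Point(3, 4)]))  # [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}]
print(to_rows([{"x": 5, "y": 6}]))  # [{'x': 5, 'y': 6}]
```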
--- a/python/python/lancedb/embeddings/gte.py
+++ b/python/python/lancedb/embeddings/gte.py
@@ -2,6 +2,7 @@
 # SPDX-FileCopyrightText: Copyright The LanceDB Authors


+import warnings
 from typing import List, Union

 import numpy as np
@@ -15,6 +16,8 @@ from .utils import weak_lru
@register("gte-text")
 class GteEmbeddings(TextEmbeddingFunction):
    """
+    Deprecated: GTE embeddings should be used through sentence-transformers.
+
    An embedding function that uses GTE-LARGE MLX format(for Apple silicon devices only)
    as well as the standard cpu/gpu version from: https://huggingface.co/thenlper/gte-large.

@@ -61,6 +64,13 @@ class GteEmbeddings(TextEmbeddingFunction):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
+        warnings.warn(
+            "GTE embeddings as a standalone embedding function are deprecated. "
+            "Use the 'sentence-transformers' embedding function with a GTE model "
+            "instead.",
+            DeprecationWarning,
+            stacklevel=3,
+        )
        self._ndims = None
        if kwargs:
            self.mlx = kwargs.get("mlx", False)
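`GteEmbeddings` and `SigLipEmbeddings` now raise a `DeprecationWarning` from `__init__` with `stacklevel=3`. The stdlib sketch below shows why that level matters. It assumes the embedding is built through a factory such as the registry's `create()`; the class and `create` are hypothetical stand-ins. Level 1 is `__init__`, level 2 is the factory, and level 3 is the code that called the factory, which is where the warning is reported.

```python
import warnings


class LegacyEmbedding:
    """Hypothetical stand-in for a deprecated embedding function."""

    def __init__(self) -> None:
        # stacklevel counts frames: 1 = this __init__, 2 = create(),
        # 3 = the caller of create(), where the warning is attributed.
        warnings.warn(
            "LegacyEmbedding is deprecated. Use a replacement instead.",
            DeprecationWarning,
            stacklevel=3,
        )


def create() -> LegacyEmbedding:
    """Hypothetical factory, analogous to an embedding registry's create()."""
    return LegacyEmbedding()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # DeprecationWarning is often ignored by default
    create()

print(caught[0].category.__name__)  # DeprecationWarning
```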
--- a/python/python/lancedb/embeddings/openai.py
+++ b/python/python/lancedb/embeddings/openai.py
@@ -110,6 +110,9 @@ class OpenAIEmbeddings(TextEmbeddingFunction):
            valid_embeddings = {
                idx: v.embedding for v, idx in zip(rs.data, valid_indices)
            }
+        except openai.AuthenticationError:
+            logging.error("Authentication failed: Invalid API key provided")
+            raise
        except openai.BadRequestError:
            logging.exception("Bad request: %s", texts)
            return [None] * len(texts)
--- a/python/python/lancedb/embeddings/siglip.py
+++ b/python/python/lancedb/embeddings/siglip.py
@@ -6,6 +6,7 @@ import io
 import os
 from typing import TYPE_CHECKING, List, Union
 import urllib.parse as urlparse
+import warnings

 import numpy as np
 import pyarrow as pa
@@ -24,6 +25,7 @@ if TYPE_CHECKING:

@register("siglip")
 class SigLipEmbeddings(EmbeddingFunction):
+    # Deprecated: prefer CLIP embeddings via `open-clip`.
    model_name: str = "google/siglip-base-patch16-224"
    device: str = "cpu"
    batch_size: int = 64
@@ -36,6 +38,12 @@ class SigLipEmbeddings(EmbeddingFunction):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
+        warnings.warn(
+            "SigLip embeddings are deprecated. Use CLIP embeddings via the "
+            "'open-clip' embedding function instead.",
+            DeprecationWarning,
+            stacklevel=3,
+        )
        transformers = attempt_import_or_raise("transformers")
        self._torch = attempt_import_or_raise("torch")

--- a/python/python/lancedb/embeddings/utils.py
+++ b/python/python/lancedb/embeddings/utils.py
@@ -269,6 +269,11 @@ def retry_with_exponential_backoff(
            # and say that it is assumed that if this portion errors out, it's due
            # to rate limit but the user should check the error message to be sure.
            except Exception as e:  # noqa: PERF203
+                # Don't retry on authentication/authorization errors (e.g. an
+                # OpenAI 401 or an HTTP 403): these are permanent failures that
+                # retrying won't fix.
+                if _is_non_retryable_error(e):
+                    raise
+
                num_retries += 1

                if num_retries > max_retries:
@@ -289,6 +294,29 @@ def retry_with_exponential_backoff(
    return wrapper


+def _is_non_retryable_error(error: Exception) -> bool:
+    """Check if an error should not be retried.
+
+    Args:
+        error: The exception to check
+
+    Returns:
+        True if the error should not be retried, False otherwise
+    """
+    # Check for OpenAI authentication errors
+    error_type = type(error).__name__
+    if error_type == "AuthenticationError":
+        return True
+
+    # Check for other common non-retryable HTTP status codes
+    # 401 Unauthorized, 403 Forbidden
+    return getattr(error, "status_code", None) in (401, 403)
+
+
 def url_retrieve(url: str):
    """
    Parameters
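With this change, `retry_with_exponential_backoff` re-raises errors that `_is_non_retryable_error` flags, instead of backing off. The rest of the retry loop is only partly shown, so the sketch below is an illustration, not the lancedb implementation. The names `retry_with_backoff`, `is_non_retryable` and `HTTPError` are hypothetical, as are the parameter names, the jitter formula and re-raising after the last retry. The `sleep` hook lets the example run without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_non_retryable(error: Exception) -> bool:
    """Same heuristic as _is_non_retryable_error: auth failures are permanent."""
    if type(error).__name__ == "AuthenticationError":
        return True
    return getattr(error, "status_code", None) in (401, 403)


def retry_with_backoff(
    func: Callable[[], T],
    max_retries: int = 3,
    initial_delay: float = 1.0,
    exponential_base: float = 2.0,
    jitter: bool = True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying transient failures with jittered exponential backoff."""
    delay = initial_delay
    num_retries = 0
    while True:
        try:
            return func()
        except Exception as e:
            # Permanent failures (bad API key, forbidden) are re-raised at once.
            if is_non_retryable(e):
                raise
            num_retries += 1
            if num_retries > max_retries:
                raise
            delay *= exponential_base * (1 + jitter * random.random())
            sleep(delay)


class HTTPError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


auth_calls, attempts, sleeps = [], [], []


def bad_key() -> None:
    auth_calls.append(1)
    raise HTTPError(401)


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient")
    return "ok"


try:
    retry_with_backoff(bad_key, sleep=sleeps.append)
except HTTPError:
    pass
print(len(auth_calls), len(sleeps))  # 1 0
print(retry_with_backoff(flaky, sleep=sleeps.append), len(sleeps))  # ok 2
```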
--- a/python/python/lancedb/merge.py
+++ b/python/python/lancedb/merge.py
@@ -34,6 +34,7 @@ class LanceMergeInsertBuilder(object):
        self._when_not_matched_by_source_condition = None
        self._timeout = None
        self._use_index = True
+        self._mem_wal = False

    def when_matched_update_all(
        self, *, where: Optional[str] = None
@@ -96,6 +97,47 @@ class LanceMergeInsertBuilder(object):
        self._use_index = use_index
        return self

+    def mem_wal(self, enabled: bool = True) -> LanceMergeInsertBuilder:
+        """
+        Enable MemWAL (Memory Write-Ahead Log) mode for this merge insert operation.
+
+        When enabled, the merge insert will route data through a memory node service
+        that buffers writes before flushing to storage. This is only supported for
+        remote (LanceDB Cloud) tables.
+
+        **Important:** MemWAL only supports the upsert pattern. You must use:
+
+        - `when_matched_update_all()` (without a filter condition)
+        - `when_not_matched_insert_all()`
+
+        MemWAL does NOT support:
+
+        - `when_matched_update_all(where=...)` with a filter condition
+        - `when_not_matched_by_source_delete()`
+
+        Parameters
+        ----------
+        enabled: bool
+            Whether to enable MemWAL mode. Defaults to `True`.
+
+        Raises
+        ------
+        NotImplementedError
+            If used on a native (local) table, as MemWAL is only supported for
+            remote tables.
+        ValueError
+            If the merge insert pattern is not supported by MemWAL.
+
+        Examples
+        --------
+        >>> # Correct usage with MemWAL (remote tables only, so not run here)
+        >>> (  # doctest: +SKIP
+        ...     table.merge_insert(["id"])
+        ...     .when_matched_update_all()
+        ...     .when_not_matched_insert_all()
+        ...     .mem_wal()
+        ...     .execute(new_data)
+        ... )
+        """
+        self._mem_wal = enabled
+        return self
+
    def execute(
        self,
        new_data: DATA,
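The commit restricts MemWAL to the plain upsert pattern: `when_matched_update_all()` with no filter, plus `when_not_matched_insert_all()`. According to the commit message, the actual check lives in the Rust `MergeInsertBuilder`, which this section does not show. The Python sketch below applies the same rule. It uses a hypothetical `MergeSpec` whose fields are assumed to mirror the builder's private attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MergeSpec:
    """Hypothetical snapshot of a merge-insert builder's settings."""

    when_matched_update_all: bool = False
    when_matched_update_all_condition: Optional[str] = None
    when_not_matched_insert_all: bool = False
    when_not_matched_by_source_delete: bool = False
    mem_wal: bool = False


def validate_mem_wal(spec: MergeSpec) -> None:
    """Raise ValueError if MemWAL is on and the spec is not a plain upsert."""
    if not spec.mem_wal:
        return  # no restrictions without MemWAL
    if spec.when_matched_update_all_condition is not None:
        raise ValueError(
            "MemWAL does not support when_matched_update_all(where=...)"
        )
    if spec.when_not_matched_by_source_delete:
        raise ValueError(
            "MemWAL does not support when_not_matched_by_source_delete()"
        )
    if not (spec.when_matched_update_all and spec.when_not_matched_insert_all):
        raise ValueError(
            "MemWAL requires when_matched_update_all() and "
            "when_not_matched_insert_all()"
        )


upsert = MergeSpec(
    when_matched_update_all=True, when_not_matched_insert_all=True, mem_wal=True
)
validate_mem_wal(upsert)  # passes: this is the supported upsert pattern
```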
--- a/python/python/lancedb/namespace.py
+++ b/python/python/lancedb/namespace.py
@@ -44,7 +44,7 @@ from lance_namespace import (
    ListNamespacesRequest,
    CreateNamespaceRequest,
    DropNamespaceRequest,
-    CreateEmptyTableRequest,
+    DeclareTableRequest,
 )
 from lancedb.table import AsyncTable, LanceTable, Table
 from lancedb.util import validate_table_name
@@ -318,20 +318,20 @@ class LanceNamespaceDBConnection(DBConnection):

        if location is None:
            # Table doesn't exist or mode is "create", reserve a new location
-            create_empty_request = CreateEmptyTableRequest(
+            declare_request = DeclareTableRequest(
                id=table_id,
                location=None,
                properties=self.storage_options if self.storage_options else None,
            )
-            create_empty_response = self._ns.create_empty_table(create_empty_request)
+            declare_response = self._ns.declare_table(declare_request)

-            if not create_empty_response.location:
+            if not declare_response.location:
                raise ValueError(
-                    "Table location is missing from create_empty_table response"
+                    "Table location is missing from declare_table response"
                )

-            location = create_empty_response.location
-            namespace_storage_options = create_empty_response.storage_options
+            location = declare_response.location
+            namespace_storage_options = declare_response.storage_options

        # Merge storage options: self.storage_options < user options < namespace options
        merged_storage_options = dict(self.storage_options)
@@ -759,20 +759,20 @@ class AsyncLanceNamespaceDBConnection:

        if location is None:
            # Table doesn't exist or mode is "create", reserve a new location
-            create_empty_request = CreateEmptyTableRequest(
+            declare_request = DeclareTableRequest(
                id=table_id,
                location=None,
                properties=self.storage_options if self.storage_options else None,
            )
-            create_empty_response = self._ns.create_empty_table(create_empty_request)
+            declare_response = self._ns.declare_table(declare_request)

-            if not create_empty_response.location:
+            if not declare_response.location:
                raise ValueError(
-                    "Table location is missing from create_empty_table response"
+                    "Table location is missing from declare_table response"
                )

-            location = create_empty_response.location
-            namespace_storage_options = create_empty_response.storage_options
+            location = declare_response.location
+            namespace_storage_options = declare_response.storage_options

        # Merge storage options: self.storage_options < user options < namespace options
        merged_storage_options = dict(self.storage_options)
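The comment above sets the precedence for storage options. User options override connection-level options, and the options returned by the namespace (for example, vended credentials) override both. The merging code lies past the end of this hunk, so the sketch below illustrates only that precedence, using a hypothetical `merge_storage_options` helper.

```python
from typing import Dict, Optional


def merge_storage_options(
    connection_opts: Dict[str, str],
    user_opts: Optional[Dict[str, str]] = None,
    namespace_opts: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Merge with precedence connection < user < namespace (later wins)."""
    merged = dict(connection_opts)  # copy: never mutate the connection's dict
    for overrides in (user_opts, namespace_opts):
        if overrides:
            merged.update(overrides)
    return merged


print(
    merge_storage_options(
        {"region": "us-east-1", "timeout": "30s"},
        {"timeout": "60s"},
        {"region": "us-west-2", "aws_session_token": "tok"},
    )
)
# {'region': 'us-west-2', 'timeout': '60s', 'aws_session_token': 'tok'}
```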
--- a/python/python/lancedb/query.py
+++ b/python/python/lancedb/query.py
@@ -606,6 +606,7 @@ class LanceQueryBuilder(ABC):
                query,
                ordering_field_name=ordering_field_name,
                fts_columns=fts_columns,
+                fast_search=fast_search,
            )

        if isinstance(query, list):
@@ -1456,12 +1457,14 @@ class LanceFtsQueryBuilder(LanceQueryBuilder):
        query: str | FullTextQuery,
        ordering_field_name: Optional[str] = None,
        fts_columns: Optional[Union[str, List[str]]] = None,
+        fast_search: Optional[bool] = None,
    ):
        super().__init__(table)
        self._query = query
        self._phrase_query = False
        self.ordering_field_name = ordering_field_name
        self._reranker = None
+        self._fast_search = fast_search
        if isinstance(fts_columns, str):
            fts_columns = [fts_columns]
        self._fts_columns = fts_columns
@@ -1483,6 +1486,19 @@ class LanceFtsQueryBuilder(LanceQueryBuilder):
        self._phrase_query = phrase_query
        return self

+    def fast_search(self) -> LanceFtsQueryBuilder:
+        """
+        Skip a flat search of unindexed data. This will improve
+        search performance but search results will not include unindexed data.
+
+        Returns
+        -------
+        LanceFtsQueryBuilder
+            The LanceFtsQueryBuilder object.
+        """
+        self._fast_search = True
+        return self
+
    def to_query_object(self) -> Query:
        return Query(
            columns=self._columns,
@@ -1494,6 +1510,7 @@ class LanceFtsQueryBuilder(LanceQueryBuilder):
                query=self._query, columns=self._fts_columns
            ),
            offset=self._offset,
+            fast_search=self._fast_search,
        )

    def output_schema(self) -> pa.Schema:
@@ -1782,6 +1799,26 @@ class LanceHybridQueryBuilder(LanceQueryBuilder):
            vector_results = LanceHybridQueryBuilder._rank(vector_results, "_distance")
            fts_results = LanceHybridQueryBuilder._rank(fts_results, "_score")

+        # If both result sets are empty (e.g. after hard filtering),
+        # return early to avoid errors in reranking or score restoration.
+        if vector_results.num_rows == 0 and fts_results.num_rows == 0:
+            # Build a minimal empty table with the _relevance_score column
+            combined_schema = pa.unify_schemas(
+                [vector_results.schema, fts_results.schema],
+            )
+            empty = pa.table(
+                {
+                    col: pa.array([], type=combined_schema.field(col).type)
+                    for col in combined_schema.names
+                }
+            )
+            empty = empty.append_column(
+                "_relevance_score", pa.array([], type=pa.float32())
+            )
+            if not with_row_ids and "_rowid" in empty.column_names:
+                empty = empty.drop(["_rowid"])
+            return empty
+
        original_distances = None
        original_scores = None
        original_distance_row_ids = None
--- a/python/python/lancedb/remote/table.py
+++ b/python/python/lancedb/remote/table.py
@@ -218,8 +218,6 @@ class RemoteTable(Table):
        train: bool = True,
    ):
        """Create an index on the table.
-        Currently, the only parameters that matter are
-        the metric and the vector column name.

        Parameters
        ----------
@@ -250,11 +248,6 @@ class RemoteTable(Table):
        >>> table.create_index("l2", "vector") # doctest: +SKIP
        """

-        if num_sub_vectors is not None:
-            logging.warning(
-                "num_sub_vectors is not supported on LanceDB cloud."
-                "This parameter will be tuned automatically."
-            )
        if accelerator is not None:
            logging.warning(
                "GPU accelerator is not yet supported on LanceDB cloud."
--- a/python/python/lancedb/scannable.py
+++ b/python/python/lancedb/scannable.py
@@ -0,0 +1,214 @@
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright The LanceDB Authors
+
+from dataclasses import dataclass
+from functools import singledispatch
+import sys
+from typing import Callable, Iterator, Optional
+from lancedb.arrow import to_arrow
+import pyarrow as pa
+import pyarrow.dataset as ds
+
+from .pydantic import LanceModel
+
+
+@dataclass
+class Scannable:
+    schema: pa.Schema
+    num_rows: Optional[int]
+    # Factory function to create a new reader each time (supports re-scanning)
+    reader: Callable[[], pa.RecordBatchReader]
+    # Whether reader can be called more than once. For example, an iterator can
+    # only be consumed once, while a DataFrame can be converted to a new reader
+    # each time.
+    rescannable: bool = True
+
+
+@singledispatch
+def to_scannable(data) -> Scannable:
+    # Fallback: try iterable protocol
+    if hasattr(data, "__iter__"):
+        return _from_iterable(iter(data))
+    raise NotImplementedError(f"to_scannable not implemented for type {type(data)}")
+
+
+@to_scannable.register(pa.RecordBatchReader)
+def _from_reader(data: pa.RecordBatchReader) -> Scannable:
+    # RecordBatchReader can only be consumed once - not rescannable
+    return Scannable(
+        schema=data.schema, num_rows=None, reader=lambda: data, rescannable=False
+    )
+
+
+@to_scannable.register(pa.RecordBatch)
+def _from_batch(data: pa.RecordBatch) -> Scannable:
+    return Scannable(
+        schema=data.schema,
+        num_rows=data.num_rows,
+        reader=lambda: pa.RecordBatchReader.from_batches(data.schema, [data]),
+    )
+
+
+@to_scannable.register(pa.Table)
+def _from_table(data: pa.Table) -> Scannable:
+    return Scannable(schema=data.schema, num_rows=data.num_rows, reader=data.to_reader)
+
+
+@to_scannable.register(ds.Dataset)
+def _from_dataset(data: ds.Dataset) -> Scannable:
+    return Scannable(
+        schema=data.schema,
+        num_rows=data.count_rows(),
+        reader=lambda: data.scanner().to_reader(),
+    )
+
+
+@to_scannable.register(ds.Scanner)
+def _from_scanner(data: ds.Scanner) -> Scannable:
+    # Scanner can only be consumed once - not rescannable
+    return Scannable(
+        schema=data.projected_schema,
+        num_rows=None,
+        reader=data.to_reader,
+        rescannable=False,
+    )
+
+
+@to_scannable.register(list)
+def _from_list(data: list) -> Scannable:
+    if not data:
+        raise ValueError("Cannot create table from empty list without a schema")
+    table = to_arrow(data)
+    return Scannable(
+        schema=table.schema, num_rows=table.num_rows, reader=table.to_reader
+    )
+
+
+@to_scannable.register(dict)
+def _from_dict(data: dict) -> Scannable:
+    raise ValueError("Cannot add a single dictionary to a table. Use a list.")
+
+
+@to_scannable.register(LanceModel)
+def _from_lance_model(data: LanceModel) -> Scannable:
+    raise ValueError("Cannot add a single LanceModel to a table. Use a list.")
+
+
+def _from_iterable(data: Iterator) -> Scannable:
+    first_item = next(data, None)
+    if first_item is None:
+        raise ValueError("Cannot create table from empty iterator")
+    first = to_arrow(first_item)
+    schema = first.schema
+
+    def batches():
+        yield from first.to_batches()
+        for item in data:
+            batch = to_arrow(item)
+            if batch.schema != schema:
+                try:
+                    batch = batch.cast(schema)
+                except pa.lib.ArrowInvalid:
+                    raise ValueError(
+                        "Input iterator yielded a batch with schema that "
+                        "does not match the schema of other batches.\n"
+                        f"Expected:\n{schema}\nGot:\n{batch.schema}"
+                    )
+            yield from batch.to_batches()
+
+    reader = pa.RecordBatchReader.from_batches(schema, batches())
+    return to_scannable(reader)
+
+
+_registered_modules: set[str] = set()
+
+
+def _register_optional_converters():
+    """Register converters for optional dependencies that are already imported."""
+
+    if "pandas" in sys.modules and "pandas" not in _registered_modules:
+        _registered_modules.add("pandas")
+        import pandas as pd
+
+        @to_arrow.register(pd.DataFrame)
+        def _arrow_from_pandas(data: pd.DataFrame) -> pa.Table:
+            table = pa.Table.from_pandas(data, preserve_index=False)
+            return table.replace_schema_metadata(None)
+
+        @to_scannable.register(pd.DataFrame)
+        def _from_pandas(data: pd.DataFrame) -> Scannable:
+            return to_scannable(_arrow_from_pandas(data))
+
+    if "polars" in sys.modules and "polars" not in _registered_modules:
+        _registered_modules.add("polars")
+        import polars as pl
+
+        @to_arrow.register(pl.DataFrame)
+        def _arrow_from_polars(data: pl.DataFrame) -> pa.Table:
+            return data.to_arrow()
+
+        @to_scannable.register(pl.DataFrame)
+        def _from_polars(data: pl.DataFrame) -> Scannable:
+            arrow = data.to_arrow()
+            return Scannable(
+                schema=arrow.schema, num_rows=len(data), reader=arrow.to_reader
+            )
+
+        @to_scannable.register(pl.LazyFrame)
+        def _from_polars_lazy(data: pl.LazyFrame) -> Scannable:
+            arrow = data.collect().to_arrow()
+            return Scannable(
+                schema=arrow.schema, num_rows=arrow.num_rows, reader=arrow.to_reader
+            )
+
+    if "datasets" in sys.modules and "datasets" not in _registered_modules:
+        _registered_modules.add("datasets")
+        from datasets import Dataset as HFDataset
+        from datasets import DatasetDict as HFDatasetDict
+
+        @to_scannable.register(HFDataset)
+        def _from_hf_dataset(data: HFDataset) -> Scannable:
+            table = data.data.table  # Access underlying Arrow table
+            return Scannable(
+                schema=table.schema, num_rows=len(data), reader=table.to_reader
+            )
+
+        @to_scannable.register(HFDatasetDict)
+        def _from_hf_dataset_dict(data: HFDatasetDict) -> Scannable:
+            # HuggingFace DatasetDict: combine all splits with a 'split' column
+            schema = data[list(data.keys())[0]].features.arrow_schema
+            # Only append a 'split' column when the data lacks one; otherwise
+            # the extra array would not match the schema.
+            add_split = "split" not in schema.names
+            if add_split:
+                schema = schema.append(pa.field("split", pa.string()))
+
+            def gen():
+                for split_name, dataset in data.items():
+                    for batch in dataset.data.to_batches():
+                        columns = list(batch.columns)
+                        if add_split:
+                            columns.append(
+                                pa.array([split_name] * len(batch), type=pa.string())
+                            )
+                        yield pa.RecordBatch.from_arrays(columns, schema=schema)
+
+            total_rows = sum(len(dataset) for dataset in data.values())
+            return Scannable(
+                schema=schema,
+                num_rows=total_rows,
+                reader=lambda: pa.RecordBatchReader.from_batches(schema, gen()),
+            )
+
+    if "lance" in sys.modules and "lance" not in _registered_modules:
+        _registered_modules.add("lance")
+        import lance
+
+        @to_scannable.register(lance.LanceDataset)
+        def _from_lance(data: lance.LanceDataset) -> Scannable:
+            return Scannable(
+                schema=data.schema,
+                num_rows=data.count_rows(),
+                reader=lambda: data.scanner().to_reader(),
+            )
+
+
+# Register on module load
+_register_optional_converters()
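`Scannable` holds a reader *factory* plus a `rescannable` flag. Lists, tables and datasets can build a fresh reader on every call. Iterators, `RecordBatchReader`s and scanners are one-shot. `_from_iterable` also peeks at the first item to fix the schema before streaming the rest.

The sketch below shows both ideas using only the standard library. Rows are dicts, and a tuple of column names stands in for Arrow types. `SimpleScannable`, `from_list` and `from_batches` are hypothetical. Unlike the real code, this sketch rejects mismatched batches instead of first trying `batch.cast(schema)`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, object]
Schema = Tuple[str, ...]


@dataclass
class SimpleScannable:
    """Stand-in for Scannable: a schema plus a factory that returns a reader."""

    schema: Schema
    num_rows: Optional[int]
    reader: Callable[[], Iterator[List[Row]]]
    rescannable: bool = True


def from_list(rows: List[Row]) -> SimpleScannable:
    if not rows:
        raise ValueError("Cannot create table from empty list without a schema")
    # The list can be read again, so each call builds a fresh one-batch iterator.
    return SimpleScannable(
        schema=tuple(rows[0]), num_rows=len(rows), reader=lambda: iter([rows])
    )


def from_batches(batches: Iterable[List[Row]]) -> SimpleScannable:
    it = iter(batches)
    first = next(it, None)  # peek once to learn the schema
    if not first:
        raise ValueError("Cannot infer a schema from an empty iterator")
    schema: Schema = tuple(first[0])

    def check(batch: List[Row]) -> List[Row]:
        for row in batch:
            if tuple(row) != schema:
                raise ValueError(f"Expected columns {schema}, got {tuple(row)}")
        return batch

    def gen() -> Iterator[List[Row]]:
        yield check(first)  # re-emit the peeked batch first
        for batch in it:
            yield check(batch)

    stream = gen()
    # The source iterator is consumed as we go, so every call to reader()
    # returns the same generator: this input is not rescannable.
    return SimpleScannable(
        schema=schema, num_rows=None, reader=lambda: stream, rescannable=False
    )


s = from_list([{"id": 1, "text": "a"}, {"id": 2, "text": "b"}])
print(list(s.reader()) == list(s.reader()))  # True
t = from_batches(iter([[{"id": 1}], [{"id": 2}, {"id": 3}]]))
print([row["id"] for batch in t.reader() for row in batch])  # [1, 2, 3]
print(list(t.reader()))  # []
```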
--- a/python/python/lancedb/table.py
+++ b/python/python/lancedb/table.py
@@ -25,6 +25,8 @@ from typing import (
 )
 from urllib.parse import urlparse

+from lancedb.scannable import _register_optional_converters, to_scannable
+
 from . import __version__
 from lancedb.arrow import peek_reader
 from lancedb.background_loop import LOOP
@@ -1329,7 +1331,7 @@ class Table(ABC):
        1  2  [3.0, 4.0]
        2  3  [5.0, 6.0]
        >>> table.delete("x = 2")
-        DeleteResult(version=2)
+        DeleteResult(num_deleted_rows=1, version=2)
        >>> table.to_pandas()
           x      vector
        0  1  [1.0, 2.0]
@@ -1343,7 +1345,7 @@ class Table(ABC):
        >>> to_remove
        '1, 5'
        >>> table.delete(f"x IN ({to_remove})")
-        DeleteResult(version=3)
+        DeleteResult(num_deleted_rows=1, version=3)
        >>> table.to_pandas()
           x      vector
        0  3  [5.0, 6.0]
@@ -3727,18 +3729,31 @@ class AsyncTable:
            on_bad_vectors = "error"
        if fill_value is None:
            fill_value = 0.0
-        data = _sanitize_data(
-            data,
-            schema,
-            metadata=schema.metadata,
-            on_bad_vectors=on_bad_vectors,
-            fill_value=fill_value,
-            allow_subschema=True,
-        )
-        if isinstance(data, pa.Table):
-            data = data.to_reader()

-        return await self._inner.add(data, mode or "append")
+        # _sanitize_data is the legacy code path; it is still needed for
+        # non-default on_bad_vectors handling and for embedding functions
+        # until the new code path is ready.
+        if on_bad_vectors != "error" or (
+            schema.metadata is not None and b"embedding_functions" in schema.metadata
+        ):
+            data = _sanitize_data(
+                data,
+                schema,
+                metadata=schema.metadata,
+                on_bad_vectors=on_bad_vectors,
+                fill_value=fill_value,
+                allow_subschema=True,
+            )
+        _register_optional_converters()
+        data = to_scannable(data)
+        try:
+            return await self._inner.add(data, mode or "append")
+        except RuntimeError as e:
+            if "Cast error" in str(e) or "Vector column contains NaN" in str(e):
+                raise ValueError(e) from e
+            raise

    def merge_insert(self, on: Union[str, Iterable[str]]) -> LanceMergeInsertBuilder:
        """
@@ -4166,6 +4181,7 @@ class AsyncTable:
                when_not_matched_by_source_condition=merge._when_not_matched_by_source_condition,
                timeout=merge._timeout,
                use_index=merge._use_index,
+                mem_wal=merge._mem_wal,
            ),
        )

@@ -4200,7 +4216,7 @@ class AsyncTable:
        1  2  [3.0, 4.0]
        2  3  [5.0, 6.0]
        >>> table.delete("x = 2")
-        DeleteResult(version=2)
+        DeleteResult(num_deleted_rows=1, version=2)
        >>> table.to_pandas()
           x      vector
        0  1  [1.0, 2.0]
@@ -4214,7 +4230,7 @@ class AsyncTable:
        >>> to_remove
        '1, 5'
        >>> table.delete(f"x IN ({to_remove})")
-        DeleteResult(version=3)
+        DeleteResult(num_deleted_rows=1, version=3)
        >>> table.to_pandas()
           x      vector
        0  3  [5.0, 6.0]
--- a/python/python/lancedb/util.py
+++ b/python/python/lancedb/util.py
@@ -324,6 +324,16 @@ def _(value: list):
    return "[" + ", ".join(map(value_to_sql, value)) + "]"


+@value_to_sql.register(dict)
+def _(value: dict):
+    # https://datafusion.apache.org/user-guide/sql/scalar_functions.html#named-struct
+    return (
+        "named_struct("
+        + ", ".join(f"'{k}', {value_to_sql(v)}" for k, v in value.items())
+        + ")"
+    )
+
+
@value_to_sql.register(np.ndarray)
 def _(value: np.ndarray):
    return value_to_sql(value.tolist())
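The `dict` handler added above renders a Python dict as a DataFusion `named_struct(...)` call. The standalone sketch below reproduces that dispatch pattern with `functools.singledispatch` so it can run without LanceDB. The handlers for the other types are simplified assumptions, not copies of `lancedb.util`. One deliberate difference: this sketch escapes single quotes in keys as well as in values. The patch interpolates `k` directly into `'{k}'`, so a key such as `it's` would produce unbalanced quotes there.

```python
from functools import singledispatch


@singledispatch
def value_to_sql(value) -> str:
    # Fallback for types with no registered handler
    raise NotImplementedError(
        f"SQL conversion not implemented for {type(value).__name__}"
    )


@value_to_sql.register(type(None))
def _(value) -> str:
    return "NULL"


@value_to_sql.register(bool)
def _(value) -> str:
    # bool is registered separately so it wins over the int handler
    return "TRUE" if value else "FALSE"


@value_to_sql.register(int)
@value_to_sql.register(float)
def _(value) -> str:
    return str(value)


def _quote(text: str) -> str:
    # SQL string literal: wrap in single quotes, doubling embedded quotes
    return "'" + text.replace("'", "''") + "'"


@value_to_sql.register(str)
def _(value) -> str:
    return _quote(value)


@value_to_sql.register(list)
def _(value) -> str:
    return "[" + ", ".join(map(value_to_sql, value)) + "]"


@value_to_sql.register(dict)
def _(value) -> str:
    # named_struct('k1', v1, 'k2', v2, ...); keys are quoted like strings
    return (
        "named_struct("
        + ", ".join(f"{_quote(str(k))}, {value_to_sql(v)}" for k, v in value.items())
        + ")"
    )
```

Nested dicts and lists recurse through the same dispatcher, which is why `{"outer": {"inner": 1}}` becomes a nested `named_struct` without any special casing.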
--- a/python/python/tests/test_embeddings.py
+++ b/python/python/tests/test_embeddings.py
@@ -515,3 +515,34 @@ def test_openai_propagates_api_key(monkeypatch):
    query = "greetings"
    actual = table.search(query).limit(1).to_pydantic(Words)[0]
    assert len(actual.text) > 0
+
+
+@patch("time.sleep")
+def test_openai_no_retry_on_401(mock_sleep):
+    """
+    Test that OpenAI embedding function does not retry on 401 authentication
+    errors.
+    """
+    from lancedb.embeddings.utils import retry_with_exponential_backoff
+
+    # Create a mock that raises an AuthenticationError
+    class MockAuthenticationError(Exception):
+        """Mock OpenAI AuthenticationError"""
+
+        pass
+
+    MockAuthenticationError.__name__ = "AuthenticationError"
+
+    mock_func = MagicMock(side_effect=MockAuthenticationError("Invalid API key"))
+
+    # Wrap the function with retry logic
+    wrapped_func = retry_with_exponential_backoff(mock_func, max_retries=3)
+
+    # Should raise without retrying
+    with pytest.raises(MockAuthenticationError):
+        wrapped_func()
+
+    # Verify that the function was only called once (no retries)
+    assert mock_func.call_count == 1
+    # Verify that sleep was never called (no retries)
+    assert mock_sleep.call_count == 0
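`test_openai_no_retry_on_401` requires that an authentication failure is raised immediately and is not retried. The sketch below shows one way a backoff wrapper can meet that requirement. It is not the code in `lancedb.embeddings.utils`. It assumes non-retryable errors are recognized by exception class name, which is what renaming `MockAuthenticationError.__name__` in the test suggests. It also takes `sleep` as a parameter so a test can observe the backoff delays without patching `time.sleep`.

```python
import functools
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: permanent failures are identified by class name, so the
# openai package does not have to be imported just to reference the type.
NON_RETRYABLE = {"AuthenticationError"}


def retry_with_backoff(
    func: Callable[..., T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[..., T]:
    @functools.wraps(func)
    def wrapper(*args, **kwargs) -> T:
        attempt = 0
        while True:
            try:
                return func(*args, **kwargs)
            except Exception as e:
                # A bad API key will not fix itself, so fail fast on it,
                # and give up once the retry budget is spent
                if type(e).__name__ in NON_RETRYABLE or attempt >= max_retries:
                    raise
                attempt += 1
                # Exponential backoff with jitter: base * 2^(attempt-1) * [1, 2)
                sleep(base_delay * 2 ** (attempt - 1) * (1 + random.random()))

    return wrapper
```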
--- a/python/python/tests/test_fts.py
+++ b/python/python/tests/test_fts.py
@@ -27,6 +27,7 @@ from lancedb.query import (
    PhraseQuery,
    BooleanQuery,
    Occur,
+    LanceFtsQueryBuilder,
 )
 import numpy as np
 import pyarrow as pa
@@ -882,3 +883,109 @@ def test_fts_query_to_json():
        '"must_not":[]}}'
    )
    assert json_str == expected
+
+
+def test_fts_fast_search(table):
+    table.create_fts_index("text", use_tantivy=False)
+
+    # Insert some unindexed data
+    table.add(
+        [
+            {
+                "text": "xyz",
+                "vector": [0 for _ in range(128)],
+                "id": 101,
+                "text2": "xyz",
+                "nested": {"text": "xyz"},
+                "count": 10,
+            }
+        ]
+    )
+
+    # Without fast_search, the query object should not have fast_search set
+    builder = table.search("xyz", query_type="fts").limit(10)
+    query = builder.to_query_object()
+    assert query.fast_search is None
+
+    # With fast_search, the query object should have fast_search=True
+    builder = table.search("xyz", query_type="fts").fast_search().limit(10)
+    query = builder.to_query_object()
+    assert query.fast_search is True
+
+    # fast_search should be chainable with other methods
+    builder = (
+        table.search("xyz", query_type="fts").fast_search().select(["text"]).limit(5)
+    )
+    query = builder.to_query_object()
+    assert query.fast_search is True
+    assert query.limit == 5
+    assert query.columns == ["text"]
+
+    # fast_search should be enabled by keyword argument too
+    query = LanceFtsQueryBuilder(table, "xyz", fast_search=True).to_query_object()
+    assert query.fast_search is True
+
+    # Verify it executes without error and skips unindexed data
+    results = table.search("xyz", query_type="fts").fast_search().limit(5).to_list()
+    assert len(results) == 0
+
+    # Update index and verify it returns results
+    table.optimize()
+    results = table.search("xyz", query_type="fts").fast_search().limit(5).to_list()
+    assert len(results) > 0
+
+
+@pytest.mark.asyncio
+async def test_fts_fast_search_async(async_table):
+    await async_table.create_index("text", config=FTS())
+
+    # Insert some unindexed data
+    await async_table.add(
+        [
+            {
+                "text": "xyz",
+                "vector": [0 for _ in range(128)],
+                "id": 101,
+                "text2": "xyz",
+                "nested": {"text": "xyz"},
+                "count": 10,
+            }
+        ]
+    )
+
+    # Without fast_search, should return results
+    results = await async_table.query().nearest_to_text("xyz").limit(5).to_list()
+    assert len(results) > 0
+
+    # With fast_search, the unindexed row is skipped, so no results are returned
+    fast_results = (
+        await async_table.query()
+        .nearest_to_text("xyz")
+        .fast_search()
+        .limit(5)
+        .to_list()
+    )
+    assert len(fast_results) == 0
+
+    # Update index and verify it returns results
+    await async_table.optimize()
+
+    fast_results = (
+        await async_table.query()
+        .nearest_to_text("xyz")
+        .fast_search()
+        .limit(5)
+        .to_list()
+    )
+    assert len(fast_results) > 0
+
+    # fast_search should be chainable with other methods
+    results = (
+        await async_table.query()
+        .nearest_to_text("xyz")
+        .fast_search()
+        .select(["text"])
+        .limit(5)
+        .to_list()
+    )
+    assert len(results) > 0
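Both `fast_search` tests rely on one behavior: rows added after the FTS index was built are invisible to a fast search until `optimize()` folds them into the index, while a normal search also scans the unindexed rows. The toy model below captures only that contract. `ToyFtsTable` and its whitespace tokenizer are illustrative assumptions and say nothing about how LanceDB stores or updates its index.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFtsTable:
    rows: list = field(default_factory=list)
    # token -> row ids; covers only rows[:indexed_upto]
    index: dict = field(default_factory=dict)
    indexed_upto: int = 0

    def add(self, text: str) -> None:
        # New rows are not indexed until optimize() runs
        self.rows.append(text)

    def optimize(self) -> None:
        for row_id in range(self.indexed_upto, len(self.rows)):
            for token in self.rows[row_id].split():
                self.index.setdefault(token, set()).add(row_id)
        self.indexed_upto = len(self.rows)

    def search(self, token: str, fast_search: bool = False) -> list:
        hits = set(self.index.get(token, ()))
        if not fast_search:
            # Flat scan of the unindexed tail so new rows are still visible
            hits.update(
                i
                for i in range(self.indexed_upto, len(self.rows))
                if token in self.rows[i].split()
            )
        return sorted(hits)
```

The trade-off the flag exposes is latency versus freshness: skipping the flat scan keeps query cost bounded by the index, at the price of missing rows written since the last `optimize()`.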
--- a/python/python/tests/test_rerankers.py
+++ b/python/python/tests/test_rerankers.py
@@ -531,6 +531,78 @@ def test_empty_result_reranker():
        )


+def test_empty_hybrid_result_reranker():
+    """Test that hybrid search with empty results after filtering doesn't crash.
+
+    Regression test for https://github.com/lancedb/lancedb/issues/2425
+    """
+    from lancedb.query import LanceHybridQueryBuilder
+
+    # Simulate empty vector and FTS results with the expected schema
+    vector_schema = pa.schema(
+        [
+            ("text", pa.string()),
+            ("vector", pa.list_(pa.float32(), 4)),
+            ("_rowid", pa.uint64()),
+            ("_distance", pa.float32()),
+        ]
+    )
+    fts_schema = pa.schema(
+        [
+            ("text", pa.string()),
+            ("vector", pa.list_(pa.float32(), 4)),
+            ("_rowid", pa.uint64()),
+            ("_score", pa.float32()),
+        ]
+    )
+    empty_vector = pa.table(
+        {
+            "text": pa.array([], type=pa.string()),
+            "vector": pa.array([], type=pa.list_(pa.float32(), 4)),
+            "_rowid": pa.array([], type=pa.uint64()),
+            "_distance": pa.array([], type=pa.float32()),
+        },
+        schema=vector_schema,
+    )
+    empty_fts = pa.table(
+        {
+            "text": pa.array([], type=pa.string()),
+            "vector": pa.array([], type=pa.list_(pa.float32(), 4)),
+            "_rowid": pa.array([], type=pa.uint64()),
+            "_score": pa.array([], type=pa.float32()),
+        },
+        schema=fts_schema,
+    )
+
+    for reranker in [LinearCombinationReranker(), RRFReranker()]:
+        result = LanceHybridQueryBuilder._combine_hybrid_results(
+            fts_results=empty_fts,
+            vector_results=empty_vector,
+            norm="score",
+            fts_query="nonexistent query",
+            reranker=reranker,
+            limit=10,
+            with_row_ids=False,
+        )
+        assert len(result) == 0
+        assert "_relevance_score" in result.column_names
+        assert "_rowid" not in result.column_names
+
+    # Also test with with_row_ids=True
+    result = LanceHybridQueryBuilder._combine_hybrid_results(
+        fts_results=empty_fts,
+        vector_results=empty_vector,
+        norm="score",
+        fts_query="nonexistent query",
+        reranker=LinearCombinationReranker(),
+        limit=10,
+        with_row_ids=True,
+    )
+    assert len(result) == 0
+    assert "_relevance_score" in result.column_names
+    assert "_rowid" in result.column_names
+
+
@pytest.mark.parametrize("use_tantivy", [True, False])
 def test_cross_encoder_reranker_return_all(tmp_path, use_tantivy):
    pytest.importorskip("sentence_transformers")
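`test_empty_hybrid_result_reranker` checks that fusing two empty result sets yields an empty result, not an error. Reciprocal rank fusion, the scoring idea behind `RRFReranker`, handles that case naturally, as this sketch over plain row-id lists shows. The constant `k=60` is the value commonly used in the RRF literature and is an assumption here, not a claim about `RRFReranker`'s default. The real reranker also operates on Arrow tables and must keep `_relevance_score` in the output schema, which this sketch does not model.

```python
from collections import defaultdict


def rrf_combine(
    vector_ids: list, fts_ids: list, k: int = 60, limit: int = 10
) -> list:
    """Fuse two ranked lists of row ids with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranked in (vector_ids, fts_ids):
        # Ranks are 1-based; each list contributes 1 / (k + rank)
        for rank, row_id in enumerate(ranked, start=1):
            scores[row_id] += 1.0 / (k + rank)
    # Empty inputs leave `scores` empty, so the result is simply []
    fused = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return fused[:limit]
```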
--- a/python/python/tests/test_table.py
+++ b/python/python/tests/test_table.py
@@ -810,7 +810,7 @@ def test_create_index_name_and_train_parameters(
    )


-def test_add_with_nans(mem_db: DBConnection):
+def test_create_with_nans(mem_db: DBConnection):
    # by default we raise an error on bad input vectors
    bad_data = [
        {"vector": [np.nan], "item": "bar", "price": 20.0},
@@ -854,6 +854,57 @@ def test_add_with_nans(mem_db: DBConnection):
    assert np.allclose(v, np.array([0.0, 0.0]))


+def test_add_with_nans(mem_db: DBConnection):
+    schema = pa.schema(
+        [
+            pa.field("vector", pa.list_(pa.float32(), 2), nullable=True),
+            pa.field("item", pa.string(), nullable=True),
+            pa.field("price", pa.float64(), nullable=False),
+        ],
+    )
+    table = mem_db.create_table("test", schema=schema)
+    # by default we raise an error on bad input vectors
+    bad_data = [
+        {"vector": [np.nan], "item": "bar", "price": 20.0},
+        {"vector": [5], "item": "bar", "price": 20.0},
+        {"vector": [np.nan, np.nan], "item": "bar", "price": 20.0},
+        {"vector": [np.nan, 5.0], "item": "bar", "price": 20.0},
+    ]
+    for row in bad_data:
+        with pytest.raises(ValueError):
+            table.add(
+                data=[row],
+            )
+
+    table.add(
+        [
+            {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
+            {"vector": [2.1, 4.1], "item": "foo", "price": 9.0},
+            {"vector": [np.nan], "item": "bar", "price": 20.0},
+            {"vector": [5], "item": "bar", "price": 20.0},
+            {"vector": [np.nan, np.nan], "item": "bar", "price": 20.0},
+        ],
+        on_bad_vectors="drop",
+    )
+    assert len(table) == 2
+    table.delete("true")
+
+    # We can fill bad input with some value
+    table.add(
+        data=[
+            {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
+            {"vector": [np.nan], "item": "bar", "price": 20.0},
+            {"vector": [np.nan, np.nan], "item": "bar", "price": 20.0},
+        ],
+        on_bad_vectors="fill",
+        fill_value=0.0,
+    )
+    assert len(table) == 3
+    arrow_tbl = table.search().where("item == 'bar'").to_arrow()
+    v = arrow_tbl["vector"].to_pylist()[0]
+    assert np.allclose(v, np.array([0.0, 0.0]))
+
+
 def test_restore(mem_db: DBConnection):
    table = mem_db.create_table(
        "my_table",
--- a/python/python/tests/test_util.py
+++ b/python/python/tests/test_util.py
@@ -121,6 +121,32 @@ def test_value_to_sql_string(tmp_path):
        assert table.to_pandas().query("search == @value")["replace"].item() == value


+def test_value_to_sql_dict():
+    # Simple flat struct
+    assert value_to_sql({"a": 1, "b": "hello"}) == "named_struct('a', 1, 'b', 'hello')"
+
+    # Nested struct
+    assert (
+        value_to_sql({"outer": {"inner": 1}})
+        == "named_struct('outer', named_struct('inner', 1))"
+    )
+
+    # List inside struct
+    assert value_to_sql({"a": [1, 2]}) == "named_struct('a', [1, 2])"
+
+    # Mixed types
+    assert (
+        value_to_sql({"name": "test", "count": 42, "rate": 3.14, "active": True})
+        == "named_struct('name', 'test', 'count', 42, 'rate', 3.14, 'active', TRUE)"
+    )
+
+    # Null value inside struct
+    assert value_to_sql({"a": None}) == "named_struct('a', NULL)"
+
+    # Empty dict
+    assert value_to_sql({}) == "named_struct()"
+
+
 def test_append_vector_columns():
    registry = EmbeddingFunctionRegistry.get_instance()
    registry.register("test")(MockTextEmbeddingFunction)
@@ -292,18 +318,14 @@ class TestModel(lancedb.pydantic.LanceModel):
        lambda: pa.table({"a": [1], "b": [2]}),
        lambda: pa.table({"a": [1], "b": [2]}).to_reader(),
        lambda: iter(pa.table({"a": [1], "b": [2]}).to_batches()),
-        lambda: (
-            lance.write_dataset(
-                pa.table({"a": [1], "b": [2]}),
-                "memory://test",
-            )
-        ),
-        lambda: (
-            lance.write_dataset(
-                pa.table({"a": [1], "b": [2]}),
-                "memory://test",
-            ).scanner()
+        lambda: lance.write_dataset(
+            pa.table({"a": [1], "b": [2]}),
+            "memory://test",
        ),
+        lambda: lance.write_dataset(
+            pa.table({"a": [1], "b": [2]}),
+            "memory://test",
+        ).scanner(),
        lambda: pd.DataFrame({"a": [1], "b": [2]}),
        lambda: pl.DataFrame({"a": [1], "b": [2]}),
        lambda: pl.LazyFrame({"a": [1], "b": [2]}),
--- a/python/src/arrow.rs
+++ b/python/src/arrow.rs
@@ -10,7 +10,7 @@ use arrow::{
 use futures::stream::StreamExt;
 use lancedb::arrow::SendableRecordBatchStream;
 use pyo3::{
-    exceptions::PyStopAsyncIteration, pyclass, pymethods, Bound, Py, PyAny, PyRef, PyResult, Python,
+    Bound, Py, PyAny, PyRef, PyResult, Python, exceptions::PyStopAsyncIteration, pyclass, pymethods,
 };
 use pyo3_async_runtimes::tokio::future_into_py;

--- a/python/src/connection.rs
+++ b/python/src/connection.rs
@@ -9,10 +9,10 @@ use lancedb::{
    database::{CreateTableMode, Database, ReadConsistency},
 };
 use pyo3::{
+    Bound, FromPyObject, Py, PyAny, PyRef, PyResult, Python,
    exceptions::{PyRuntimeError, PyValueError},
    pyclass, pyfunction, pymethods,
    types::{PyDict, PyDictMethods},
-    Bound, FromPyObject, Py, PyAny, PyRef, PyResult, Python,
 };
 use pyo3_async_runtimes::tokio::future_into_py;

@@ -506,6 +506,7 @@ pub struct PyClientConfig {
    id_delimiter: Option<String>,
    tls_config: Option<PyClientTlsConfig>,
    header_provider: Option<Py<PyAny>>,
+    mem_wal_enabled: Option<bool>,
 }

 #[derive(FromPyObject)]
@@ -590,6 +591,7 @@ impl From<PyClientConfig> for lancedb::remote::ClientConfig {
            id_delimiter: value.id_delimiter,
            tls_config: value.tls_config.map(Into::into),
            header_provider,
+            mem_wal_enabled: value.mem_wal_enabled,
        }
    }
 }
--- a/python/src/error.rs
+++ b/python/src/error.rs
@@ -2,10 +2,10 @@
 // SPDX-FileCopyrightText: Copyright The LanceDB Authors

 use pyo3::{
+    PyErr, PyResult, Python,
    exceptions::{PyIOError, PyNotImplementedError, PyOSError, PyRuntimeError, PyValueError},
    intern,
    types::{PyAnyMethods, PyNone},
-    PyErr, PyResult, Python,
 };

 use lancedb::error::Error as LanceError;
--- a/python/src/index.rs
+++ b/python/src/index.rs
@@ -3,17 +3,17 @@

 use lancedb::index::vector::{IvfFlatIndexBuilder, IvfRqIndexBuilder, IvfSqIndexBuilder};
 use lancedb::index::{
+    Index as LanceDbIndex,
    scalar::{BTreeIndexBuilder, FtsIndexBuilder},
    vector::{IvfHnswPqIndexBuilder, IvfHnswSqIndexBuilder, IvfPqIndexBuilder},
-    Index as LanceDbIndex,
 };
-use pyo3::types::PyStringMethods;
 use pyo3::IntoPyObject;
+use pyo3::types::PyStringMethods;
 use pyo3::{
+    Bound, FromPyObject, PyAny, PyResult, Python,
    exceptions::{PyKeyError, PyValueError},
    intern, pyclass, pymethods,
    types::PyAnyMethods,
-    Bound, FromPyObject, PyAny, PyResult, Python,
 };

 use crate::util::parse_distance_type;
@@ -41,7 +41,12 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
                let inner_opts = FtsIndexBuilder::default()
                    .base_tokenizer(params.base_tokenizer)
                    .language(&params.language)
-                    .map_err(|_| PyValueError::new_err(format!("LanceDB does not support the requested language: '{}'", params.language)))?
+                    .map_err(|_| {
+                        PyValueError::new_err(format!(
+                            "LanceDB does not support the requested language: '{}'",
+                            params.language
+                        ))
+                    })?
                    .with_position(params.with_position)
                    .lower_case(params.lower_case)
                    .max_token_length(params.max_token_length)
@@ -52,7 +57,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
                    .ngram_max_length(params.ngram_max_length)
                    .ngram_prefix_only(params.prefix_only);
                Ok(LanceDbIndex::FTS(inner_opts))
-            },
+            }
            "IvfFlat" => {
                let params = source.extract::<IvfFlatParams>()?;
                let distance_type = parse_distance_type(params.distance_type)?;
@@ -64,10 +69,11 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
                    ivf_flat_builder = ivf_flat_builder.num_partitions(num_partitions);
                }
                if let Some(target_partition_size) = params.target_partition_size {
-                    ivf_flat_builder = ivf_flat_builder.target_partition_size(target_partition_size);
+                    ivf_flat_builder =
+                        ivf_flat_builder.target_partition_size(target_partition_size);
                }
                Ok(LanceDbIndex::IvfFlat(ivf_flat_builder))
-            },
+            }
            "IvfPq" => {
                let params = source.extract::<IvfPqParams>()?;
                let distance_type = parse_distance_type(params.distance_type)?;
@@ -86,7 +92,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
                    ivf_pq_builder = ivf_pq_builder.num_sub_vectors(num_sub_vectors);
                }
                Ok(LanceDbIndex::IvfPq(ivf_pq_builder))
-            },
+            }
            "IvfSq" => {
                let params = source.extract::<IvfSqParams>()?;
                let distance_type = parse_distance_type(params.distance_type)?;
@@ -101,7 +107,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
                    ivf_sq_builder = ivf_sq_builder.target_partition_size(target_partition_size);
                }
                Ok(LanceDbIndex::IvfSq(ivf_sq_builder))
-            },
+            }
            "IvfRq" => {
                let params = source.extract::<IvfRqParams>()?;
                let distance_type = parse_distance_type(params.distance_type)?;
@@ -117,7 +123,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
                    ivf_rq_builder = ivf_rq_builder.target_partition_size(target_partition_size);
                }
                Ok(LanceDbIndex::IvfRq(ivf_rq_builder))
-            },
+            }
            "HnswPq" => {
                let params = source.extract::<IvfHnswPqParams>()?;
                let distance_type = parse_distance_type(params.distance_type)?;
@@ -138,7 +144,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
                    hnsw_pq_builder = hnsw_pq_builder.num_sub_vectors(num_sub_vectors);
                }
                Ok(LanceDbIndex::IvfHnswPq(hnsw_pq_builder))
-            },
+            }
            "HnswSq" => {
                let params = source.extract::<IvfHnswSqParams>()?;
                let distance_type = parse_distance_type(params.distance_type)?;
@@ -155,7 +161,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
                    hnsw_sq_builder = hnsw_sq_builder.target_partition_size(target_partition_size);
                }
                Ok(LanceDbIndex::IvfHnswSq(hnsw_sq_builder))
-            },
+            }
            not_supported => Err(PyValueError::new_err(format!(
                "Invalid index type '{}'.  Must be one of BTree, Bitmap, LabelList, FTS, IvfPq, IvfSq, IvfHnswPq, or IvfHnswSq",
                not_supported
--- a/python/src/lib.rs
+++ b/python/src/lib.rs
@@ -2,14 +2,14 @@
 // SPDX-FileCopyrightText: Copyright The LanceDB Authors

 use arrow::RecordBatchStream;
-use connection::{connect, Connection};
+use connection::{Connection, connect};
 use env_logger::Env;
 use index::IndexConfig;
 use permutation::{PyAsyncPermutationBuilder, PyPermutationReader};
 use pyo3::{
-    pymodule,
+    Bound, PyResult, Python, pymodule,
    types::{PyModule, PyModuleMethods},
-    wrap_pyfunction, Bound, PyResult, Python,
+    wrap_pyfunction,
 };
 use query::{FTSQuery, HybridQuery, Query, VectorQuery};
 use session::Session;
--- a/python/src/permutation.rs
+++ b/python/src/permutation.rs
@@ -16,17 +16,32 @@ use lancedb::{
    query::Select,
 };
 use pyo3::{
+    Bound, PyAny, PyRef, PyRefMut, PyResult, Python,
    exceptions::PyRuntimeError,
    pyclass, pymethods,
    types::{PyAnyMethods, PyDict, PyDictMethods, PyType},
-    Bound, PyAny, PyRef, PyRefMut, PyResult, Python,
 };
 use pyo3_async_runtimes::tokio::future_into_py;

+fn table_from_py<'a>(table: Bound<'a, PyAny>) -> PyResult<Bound<'a, Table>> {
+    if table.hasattr("_inner")? {
+        Ok(table.getattr("_inner")?.downcast_into::<Table>()?)
+    } else if table.hasattr("_table")? {
+        Ok(table
+            .getattr("_table")?
+            .getattr("_inner")?
+            .downcast_into::<Table>()?)
+    } else {
+        Err(PyRuntimeError::new_err(
+            "Provided table does not appear to be a Table or RemoteTable instance",
+        ))
+    }
+}
+
 /// Create a permutation builder for the given table
 #[pyo3::pyfunction]
 pub fn async_permutation_builder(table: Bound<'_, PyAny>) -> PyResult<PyAsyncPermutationBuilder> {
-    let table = table.getattr("_inner")?.downcast_into::<Table>()?;
+    let table = table_from_py(table)?;
    let inner_table = table.borrow().inner_ref()?.clone();
    let inner_builder = LancePermutationBuilder::new(inner_table);

@@ -250,10 +265,8 @@ impl PyPermutationReader {
        permutation_table: Option<Bound<'py, PyAny>>,
        split: u64,
    ) -> PyResult<Bound<'py, PyAny>> {
-        let base_table = base_table.getattr("_inner")?.downcast_into::<Table>()?;
-        let permutation_table = permutation_table
-            .map(|p| PyResult::Ok(p.getattr("_inner")?.downcast_into::<Table>()?))
-            .transpose()?;
+        let base_table = table_from_py(base_table)?;
+        let permutation_table = permutation_table.map(table_from_py).transpose()?;

        let base_table = base_table.borrow().inner_ref()?.base_table().clone();
        let permutation_table = permutation_table
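`table_from_py` accepts either an object that exposes `_inner` directly or one that wraps another table in `_table` (the remote case). The same lookup order is expressed in Python below, which may help when reviewing which wrapper shapes can reach the Rust side. The classes are hypothetical stand-ins, not LanceDB's `Table`, `AsyncTable`, or `RemoteTable`.

```python
class NativeInner:
    """Stand-in for the Rust-backed table handle."""


class AsyncTableLike:
    def __init__(self) -> None:
        self._inner = NativeInner()


class RemoteTableLike:
    def __init__(self) -> None:
        # Hypothetical wrapper shape: the handle sits one level deeper
        self._table = AsyncTableLike()


def table_from_py(table: object) -> NativeInner:
    # Same order as the Rust helper: `_inner` first, then `_table._inner`
    if hasattr(table, "_inner"):
        inner = getattr(table, "_inner")
    elif hasattr(table, "_table"):
        inner = getattr(getattr(table, "_table"), "_inner")
    else:
        raise RuntimeError(
            "Provided table does not appear to be a Table or RemoteTable instance"
        )
    # Mirrors downcast_into::<Table>(), which fails on an unexpected type
    if not isinstance(inner, NativeInner):
        raise TypeError(f"expected NativeInner, got {type(inner).__name__}")
    return inner
```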
--- a/python/src/query.rs
+++ b/python/src/query.rs
@@ -4,9 +4,9 @@
 use std::sync::Arc;
 use std::time::Duration;

-use arrow::array::make_array;
 use arrow::array::Array;
 use arrow::array::ArrayData;
+use arrow::array::make_array;
 use arrow::pyarrow::FromPyArrow;
 use arrow::pyarrow::IntoPyArrow;
 use arrow::pyarrow::ToPyArrow;
@@ -22,23 +22,23 @@ use lancedb::query::{
    VectorQuery as LanceDbVectorQuery,
 };
 use lancedb::table::AnyQuery;
-use pyo3::prelude::{PyAnyMethods, PyDictMethods};
-use pyo3::pyfunction;
-use pyo3::pymethods;
-use pyo3::types::PyList;
-use pyo3::types::{PyDict, PyString};
 use pyo3::Bound;
 use pyo3::IntoPyObject;
 use pyo3::PyAny;
 use pyo3::PyRef;
 use pyo3::PyResult;
 use pyo3::Python;
-use pyo3::{exceptions::PyRuntimeError, FromPyObject};
+use pyo3::prelude::{PyAnyMethods, PyDictMethods};
+use pyo3::pyfunction;
+use pyo3::pymethods;
+use pyo3::types::PyList;
+use pyo3::types::{PyDict, PyString};
+use pyo3::{FromPyObject, exceptions::PyRuntimeError};
+use pyo3::{PyErr, pyclass};
 use pyo3::{
    exceptions::{PyNotImplementedError, PyValueError},
    intern,
 };
-use pyo3::{pyclass, PyErr};
 use pyo3_async_runtimes::tokio::future_into_py;

 use crate::util::parse_distance_type;
--- a/python/src/session.rs
+++ b/python/src/session.rs
@@ -4,7 +4,7 @@
 use std::sync::Arc;

 use lancedb::{ObjectStoreRegistry, Session as LanceSession};
-use pyo3::{pyclass, pymethods, PyResult};
+use pyo3::{PyResult, pyclass, pymethods};

 /// A session for managing caches and object stores across LanceDB operations.
 ///
--- a/python/src/storage_options.rs
+++ b/python/src/storage_options.rs
@@ -66,13 +66,10 @@ impl StorageOptionsProvider for PyStorageOptionsProviderWrapper {
                    .inner
                    .bind(py)
                    .call_method0("fetch_storage_options")
-                    .map_err(|e| lance_core::Error::IO {
-                        source: Box::new(std::io::Error::other(format!(
-                            "Failed to call fetch_storage_options: {}",
-                            e
-                        ))),
-                        location: snafu::location!(),
-                    })?;
+                    .map_err(|e| lance_core::Error::io_source(Box::new(std::io::Error::other(format!(
+                        "Failed to call fetch_storage_options: {}",
+                        e
+                    )))))?;

                // If result is None, return None
                if result.is_none() {
@@ -81,26 +78,19 @@ impl StorageOptionsProvider for PyStorageOptionsProviderWrapper {

                // Extract the result dict - should be a flat Map<String, String>
                let result_dict = result.downcast::<PyDict>().map_err(|_| {
-                    lance_core::Error::InvalidInput {
-                        source: "fetch_storage_options() must return None or a dict of string key-value pairs".into(),
-                        location: snafu::location!(),
-                    }
+                    lance_core::Error::invalid_input(
+                        "fetch_storage_options() must return a dict of string key-value pairs or None",
+                    )
                })?;

                // Convert all entries to HashMap<String, String>
                let mut storage_options = HashMap::new();
                for (key, value) in result_dict.iter() {
                    let key_str: String = key.extract().map_err(|e| {
-                        lance_core::Error::InvalidInput {
-                            source: format!("Storage option key must be a string: {}", e).into(),
-                            location: snafu::location!(),
-                        }
+                        lance_core::Error::invalid_input(format!("Storage option key must be a string: {}", e))
                    })?;
                    let value_str: String = value.extract().map_err(|e| {
-                        lance_core::Error::InvalidInput {
-                            source: format!("Storage option value must be a string: {}", e).into(),
-                            location: snafu::location!(),
-                        }
+                        lance_core::Error::invalid_input(format!("Storage option value must be a string: {}", e))
                    })?;
                    storage_options.insert(key_str, value_str);
                }
@@ -109,13 +99,10 @@ impl StorageOptionsProvider for PyStorageOptionsProviderWrapper {
            })
        })
        .await
-        .map_err(|e| lance_core::Error::IO {
-            source: Box::new(std::io::Error::other(format!(
-                "Task join error: {}",
-                e
-            ))),
-            location: snafu::location!(),
-        })?
+        .map_err(|e| lance_core::Error::io_source(Box::new(std::io::Error::other(format!(
+            "Task join error: {}",
+            e
+        )))))?
    }

    fn provider_id(&self) -> String {
--- a/python/src/table.rs
+++ b/python/src/table.rs
@@ -5,8 +5,9 @@ use std::{collections::HashMap, sync::Arc};
 use crate::{
    connection::Connection,
    error::PythonErrorExt,
-    index::{extract_index_params, IndexConfig},
+    index::{IndexConfig, extract_index_params},
    query::{Query, TakeQuery},
+    table::scannable::PyScannable,
 };
 use arrow::{
    datatypes::{DataType, Schema},
@@ -18,13 +19,15 @@ use lancedb::table::{
    Table as LanceDbTable,
 };
 use pyo3::{
+    Bound, FromPyObject, PyAny, PyRef, PyResult, Python,
    exceptions::{PyKeyError, PyRuntimeError, PyValueError},
    pyclass, pymethods,
    types::{IntoPyDict, PyAnyMethods, PyDict, PyDictMethods},
-    Bound, FromPyObject, PyAny, PyRef, PyResult, Python,
 };
 use pyo3_async_runtimes::tokio::future_into_py;

+mod scannable;
+
 /// Statistics about a compaction operation.
 #[pyclass(get_all)]
 #[derive(Clone, Debug)]
@@ -109,19 +112,24 @@ impl From<lancedb::table::AddResult> for AddResult {
 #[pyclass(get_all)]
 #[derive(Clone, Debug)]
 pub struct DeleteResult {
+    pub num_deleted_rows: u64,
    pub version: u64,
 }

 #[pymethods]
 impl DeleteResult {
    pub fn __repr__(&self) -> String {
-        format!("DeleteResult(version={})", self.version)
+        format!(
+            "DeleteResult(num_deleted_rows={}, version={})",
+            self.num_deleted_rows, self.version
+        )
    }
 }

 impl From<lancedb::table::DeleteResult> for DeleteResult {
    fn from(result: lancedb::table::DeleteResult) -> Self {
        Self {
+            num_deleted_rows: result.num_deleted_rows,
            version: result.version,
        }
    }
@@ -293,12 +301,10 @@ impl Table {

    pub fn add<'a>(
        self_: PyRef<'a, Self>,
-        data: Bound<'_, PyAny>,
+        data: PyScannable,
        mode: String,
    ) -> PyResult<Bound<'a, PyAny>> {
-        let batches: Box<dyn arrow::array::RecordBatchReader + Send> =
-            Box::new(ArrowArrayStreamReader::from_pyarrow_bound(&data)?);
-        let mut op = self_.inner_ref()?.add(batches);
+        let mut op = self_.inner_ref()?.add(data);
        if mode == "append" {
            op = op.mode(AddDataMode::Append);
        } else if mode == "overwrite" {
@@ -536,7 +542,7 @@ impl Table {
        let inner = self_.inner_ref()?.clone();
        future_into_py(self_.py(), async move {
            let versions = inner.list_versions().await.infer_error()?;
-            let versions_as_dict = Python::attach(|py| {
+            Python::attach(|py| {
                versions
                    .iter()
                    .map(|v| {
@@ -553,9 +559,7 @@ impl Table {
                        Ok(dict.unbind())
                    })
                    .collect::<PyResult<Vec<_>>>()
-            });
-
-            versions_as_dict
+            })
        })
    }

@@ -706,6 +710,9 @@ impl Table {
        if let Some(use_index) = parameters.use_index {
            builder.use_index(use_index);
        }
+        if let Some(mem_wal) = parameters.mem_wal {
+            builder.mem_wal(mem_wal);
+        }

        future_into_py(self_.py(), async move {
            let res = builder.execute(Box::new(batches)).await.infer_error()?;
@@ -866,6 +873,7 @@ pub struct MergeInsertParams {
    when_not_matched_by_source_condition: Option<String>,
    timeout: Option<std::time::Duration>,
    use_index: Option<bool>,
+    mem_wal: Option<bool>,
 }

 #[pyclass]
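According to the PR description, merge insert with `mem_wal` enabled accepts only the upsert pattern (`when_matched_update_all()` without a filter plus `when_not_matched_insert_all()`) and is rejected with `NotSupported` on native tables. The sketch below encodes those rules over a hypothetical flattened `MergeInsertSpec`. The field names, the order of the checks, rejecting a delete-by-source clause, and the use of `NotImplementedError` as a stand-in for `NotSupported` are all assumptions. The description places the actual validation in the Rust `MergeInsertBuilder`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MergeInsertSpec:
    """Hypothetical flattened view of a merge-insert builder's settings."""

    when_matched_update_all: bool = False
    when_matched_update_all_condition: Optional[str] = None
    when_not_matched_insert_all: bool = False
    when_not_matched_by_source_delete: bool = False
    mem_wal: bool = False


def validate_mem_wal(spec: MergeInsertSpec, is_remote: bool) -> None:
    if not spec.mem_wal:
        return  # the regular merge-insert path has no extra restrictions
    if not is_remote:
        # Stand-in for the NotSupported error described for native tables
        raise NotImplementedError("mem_wal is only supported for remote tables")
    is_plain_upsert = (
        spec.when_matched_update_all
        and spec.when_matched_update_all_condition is None
        and spec.when_not_matched_insert_all
        and not spec.when_not_matched_by_source_delete
    )
    if not is_plain_upsert:
        raise ValueError(
            "mem_wal requires the upsert pattern: when_matched_update_all() "
            "without a filter plus when_not_matched_insert_all()"
        )
```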
--- a/python/src/table/scannable.rs
+++ b/python/src/table/scannable.rs
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: Apache-2.0
+// SPDX-FileCopyrightText: Copyright The LanceDB Authors
+
+use std::sync::Arc;
+
+use arrow::{
+    datatypes::{Schema, SchemaRef},
+    ffi_stream::ArrowArrayStreamReader,
+    pyarrow::{FromPyArrow, PyArrowType},
+};
+use futures::StreamExt;
+use lancedb::{
+    Error,
+    arrow::{SendableRecordBatchStream, SimpleRecordBatchStream},
+    data::scannable::Scannable,
+};
+use pyo3::{FromPyObject, Py, PyAny, Python, types::PyAnyMethods};
+
+/// Adapter that implements Scannable for a Python reader factory callable.
+///
+/// This holds a Python callable that returns a RecordBatchReader when called.
+/// For rescannable sources, the callable can be invoked multiple times to
+/// get fresh readers.
+pub struct PyScannable {
+    /// Python callable that returns a RecordBatchReader
+    reader_factory: Py<PyAny>,
+    schema: SchemaRef,
+    num_rows: Option<usize>,
+    rescannable: bool,
+}
+
+impl std::fmt::Debug for PyScannable {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        f.debug_struct("PyScannable")
+            .field("schema", &self.schema)
+            .field("num_rows", &self.num_rows)
+            .field("rescannable", &self.rescannable)
+            .finish()
+    }
+}
+
+impl Scannable for PyScannable {
+    fn schema(&self) -> SchemaRef {
+        self.schema.clone()
+    }
+
+    fn scan_as_stream(&mut self) -> SendableRecordBatchStream {
+        let reader: Result<ArrowArrayStreamReader, Error> = {
+            Python::attach(|py| {
+                let result =
+                    self.reader_factory
+                        .call0(py)
+                        .map_err(|e| lancedb::Error::Runtime {
+                            message: format!("Python reader factory failed: {}", e),
+                        })?;
+                ArrowArrayStreamReader::from_pyarrow_bound(result.bind(py)).map_err(|e| {
+                    lancedb::Error::Runtime {
+                        message: format!("Failed to create Arrow reader from Python: {}", e),
+                    }
+                })
+            })
+        };
+
+        // The reader blocks but the returned stream must not, so pull batches on a blocking task.
+        let (tx, rx) = tokio::sync::mpsc::channel(1);
+
+        let join_handle = tokio::task::spawn_blocking(move || {
+            let reader = match reader {
+                Ok(reader) => reader,
+                Err(e) => {
+                    let _ = tx.blocking_send(Err(e));
+                    return;
+                }
+            };
+            for batch in reader {
+                match batch {
+                    Ok(batch) => {
+                        if tx.blocking_send(Ok(batch)).is_err() {
+                            // Receiver dropped, stop processing
+                            break;
+                        }
+                    }
+                    Err(source) => {
+                        let _ = tx.blocking_send(Err(Error::Arrow { source }));
+                        break;
+                    }
+                }
+            }
+        });
+
+        let schema = self.schema.clone();
+        let stream = futures::stream::unfold(
+            (rx, Some(join_handle)),
+            |(mut rx, join_handle)| async move {
+                match rx.recv().await {
+                    Some(Ok(batch)) => Some((Ok(batch), (rx, join_handle))),
+                    Some(Err(e)) => Some((Err(e), (rx, join_handle))),
+                    None => {
+                        // Channel closed. Check if the task panicked — a panic
+                        // drops the sender without sending an error, so without
+                        // this check we'd silently return a truncated stream.
+                        if let Some(handle) = join_handle
+                            && let Err(join_err) = handle.await
+                        {
+                            return Some((
+                                Err(Error::Runtime {
+                                    message: format!("Reader task panicked: {}", join_err),
+                                }),
+                                (rx, None),
+                            ));
+                        }
+                        None
+                    }
+                }
+            },
+        );
+        Box::pin(SimpleRecordBatchStream::new(stream.fuse(), schema))
+    }
+
+    fn num_rows(&self) -> Option<usize> {
+        self.num_rows
+    }
+
+    fn rescannable(&self) -> bool {
+        self.rescannable
+    }
+}
+
+impl<'py> FromPyObject<'py> for PyScannable {
+    fn extract_bound(ob: &pyo3::Bound<'py, PyAny>) -> pyo3::PyResult<Self> {
+        // Convert from Scannable dataclass.
+        let schema: PyArrowType<Schema> = ob.getattr("schema")?.extract()?;
+        let schema = Arc::new(schema.0);
+        let num_rows: Option<usize> = ob.getattr("num_rows")?.extract()?;
+        let rescannable: bool = ob.getattr("rescannable")?.extract()?;
+        let reader_factory: Py<PyAny> = ob.getattr("reader")?.unbind();
+
+        Ok(Self {
+            schema,
+            reader_factory,
+            num_rows,
+            rescannable,
+        })
+    }
+}
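`PyScannable::scan_as_stream` bridges a blocking Arrow reader to an async stream. A blocking task pushes batches through a channel of capacity 1. When the channel closes, the consumer joins the task, so a panic surfaces as an error instead of a silently truncated stream. The shape can be sketched with only the standard library, using `std::thread` and `mpsc::sync_channel` in place of `tokio::task::spawn_blocking` and `tokio::sync::mpsc`. The names `spawn_reader` and `collect_all` are hypothetical.

```rust
use std::fmt::Display;
use std::sync::mpsc::{self, Receiver};
use std::thread::{self, JoinHandle};

/// Drain a blocking reader on its own thread, handing items over through a
/// bounded channel of capacity 1 so the producer cannot run far ahead.
fn spawn_reader<T, E, I>(reader: I) -> (Receiver<Result<T, E>>, JoinHandle<()>)
where
    T: Send + 'static,
    E: Send + 'static,
    I: IntoIterator<Item = Result<T, E>> + Send + 'static,
{
    let (tx, rx) = mpsc::sync_channel(1);
    let handle = thread::spawn(move || {
        for item in reader {
            let is_err = item.is_err();
            // A send error means the receiver was dropped, so stop reading.
            // After forwarding an error, stop as well.
            if tx.send(item).is_err() || is_err {
                break;
            }
        }
    });
    (rx, handle)
}

/// Collect every item. A closed channel only counts as a clean end if the
/// reader thread did not panic (a panic drops the sender without an error).
fn collect_all<T, E: Display>(
    rx: Receiver<Result<T, E>>,
    handle: JoinHandle<()>,
) -> Result<Vec<T>, String> {
    let mut out = Vec::new();
    for item in rx.iter() {
        match item {
            Ok(value) => out.push(value),
            Err(e) => return Err(format!("reader error: {e}")),
        }
    }
    if handle.join().is_err() {
        return Err("reader thread panicked".to_string());
    }
    Ok(out)
}
```

The capacity of 1 keeps at most one batch buffered ahead of the consumer. This mirrors `tokio::sync::mpsc::channel(1)` in the diff.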
--- a/python/src/util.rs
+++ b/python/src/util.rs
@@ -5,8 +5,9 @@ use std::sync::Mutex;

 use lancedb::DistanceType;
 use pyo3::{
+    PyResult,
    exceptions::{PyRuntimeError, PyValueError},
-    pyfunction, PyResult,
+    pyfunction,
 };

 /// A wrapper around a rust builder
--- a/rust-toolchain.toml
+++ b/rust-toolchain.toml
@@ -1,2 +1,2 @@
 [toolchain]
-channel = "1.90.0"
+channel = "1.91.0"
--- a/rust/lancedb/Cargo.toml
+++ b/rust/lancedb/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "lancedb"
-version = "0.26.2"
+version = "0.27.0-beta.3"
 edition.workspace = true
 description = "LanceDB: A serverless, low-latency vector database for AI applications"
 license.workspace = true
@@ -25,7 +25,9 @@ datafusion-catalog.workspace = true
 datafusion-common.workspace = true
 datafusion-execution.workspace = true
 datafusion-expr.workspace = true
+datafusion-functions.workspace = true
 datafusion-physical-expr.workspace = true
+datafusion-sql.workspace = true
 datafusion-physical-plan.workspace = true
 datafusion.workspace = true
 object_store = { workspace = true }
--- a/rust/lancedb/examples/bedrock.rs
+++ b/rust/lancedb/examples/bedrock.rs
@@ -9,10 +9,9 @@ use aws_config::Region;
 use aws_sdk_bedrockruntime::Client;
 use futures::StreamExt;
 use lancedb::{
-    connect,
-    embeddings::{bedrock::BedrockEmbeddingFunction, EmbeddingDefinition, EmbeddingFunction},
+    Result, connect,
+    embeddings::{EmbeddingDefinition, EmbeddingFunction, bedrock::BedrockEmbeddingFunction},
    query::{ExecutableQuery, QueryBase},
-    Result,
 };

 #[tokio::main]
--- a/rust/lancedb/examples/full_text_search.rs
+++ b/rust/lancedb/examples/full_text_search.rs
@@ -10,10 +10,10 @@ use futures::TryStreamExt;
 use lance_index::scalar::FullTextSearchQuery;
 use lancedb::connection::Connection;

-use lancedb::index::scalar::FtsIndexBuilder;
 use lancedb::index::Index;
+use lancedb::index::scalar::FtsIndexBuilder;
 use lancedb::query::{ExecutableQuery, QueryBase};
-use lancedb::{connect, Result, Table};
+use lancedb::{Result, Table, connect};
 use rand::random;

 #[tokio::main]
@@ -46,19 +46,21 @@ fn create_some_records() -> Result<Box<dyn arrow_array::RecordBatchReader + Send
        .collect::<Vec<_>>();
    let n_terms = 3;
    let batches = RecordBatchIterator::new(
-        vec![RecordBatch::try_new(
-            schema.clone(),
-            vec![
-                Arc::new(Int32Array::from_iter_values(0..TOTAL as i32)),
-                Arc::new(StringArray::from_iter_values((0..TOTAL).map(|_| {
-                    (0..n_terms)
-                        .map(|_| words[random::<u32>() as usize % words.len()])
-                        .collect::<Vec<_>>()
-                        .join(" ")
-                }))),
-            ],
-        )
-        .unwrap()]
+        vec![
+            RecordBatch::try_new(
+                schema.clone(),
+                vec![
+                    Arc::new(Int32Array::from_iter_values(0..TOTAL as i32)),
+                    Arc::new(StringArray::from_iter_values((0..TOTAL).map(|_| {
+                        (0..n_terms)
+                            .map(|_| words[random::<u32>() as usize % words.len()])
+                            .collect::<Vec<_>>()
+                            .join(" ")
+                    }))),
+                ],
+            )
+            .unwrap(),
+        ]
        .into_iter()
        .map(Ok),
        schema.clone(),
--- a/rust/lancedb/examples/hybrid_search.rs
+++ b/rust/lancedb/examples/hybrid_search.rs
@@ -5,16 +5,15 @@ use arrow_array::{RecordBatch, StringArray};
 use arrow_schema::{DataType, Field, Schema};
 use futures::TryStreamExt;
 use lance_index::scalar::FullTextSearchQuery;
-use lancedb::index::scalar::FtsIndexBuilder;
 use lancedb::index::Index;
+use lancedb::index::scalar::FtsIndexBuilder;
 use lancedb::{
-    connect,
+    Result, Table, connect,
    embeddings::{
-        sentence_transformers::SentenceTransformersEmbeddings, EmbeddingDefinition,
-        EmbeddingFunction,
+        EmbeddingDefinition, EmbeddingFunction,
+        sentence_transformers::SentenceTransformersEmbeddings,
    },
    query::{QueryBase, QueryExecutionOptions},
-    Result, Table,
 };
 use std::{iter::once, sync::Arc};

--- a/rust/lancedb/examples/ivf_pq.rs
+++ b/rust/lancedb/examples/ivf_pq.rs
@@ -14,10 +14,10 @@ use arrow_schema::{DataType, Field, Schema};
 use futures::TryStreamExt;
 use lancedb::connection::Connection;

-use lancedb::index::vector::IvfPqIndexBuilder;
 use lancedb::index::Index;
+use lancedb::index::vector::IvfPqIndexBuilder;
 use lancedb::query::{ExecutableQuery, QueryBase};
-use lancedb::{connect, DistanceType, Result, Table};
+use lancedb::{DistanceType, Result, Table, connect};

 #[tokio::main]
 async fn main() -> Result<()> {
@@ -51,19 +51,21 @@ fn create_some_records() -> Result<Box<dyn arrow_array::RecordBatchReader + Send

    // Create a RecordBatch stream.
    let batches = RecordBatchIterator::new(
-        vec![RecordBatch::try_new(
-            schema.clone(),
-            vec![
-                Arc::new(Int32Array::from_iter_values(0..TOTAL as i32)),
-                Arc::new(
-                    FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
-                        (0..TOTAL).map(|_| Some(vec![Some(1.0); DIM])),
-                        DIM as i32,
+        vec![
+            RecordBatch::try_new(
+                schema.clone(),
+                vec![
+                    Arc::new(Int32Array::from_iter_values(0..TOTAL as i32)),
+                    Arc::new(
+                        FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
+                            (0..TOTAL).map(|_| Some(vec![Some(1.0); DIM])),
+                            DIM as i32,
+                        ),
                    ),
-                ),
-            ],
-        )
-        .unwrap()]
+                ],
+            )
+            .unwrap(),
+        ]
        .into_iter()
        .map(Ok),
        schema.clone(),
--- a/rust/lancedb/examples/openai.rs
+++ b/rust/lancedb/examples/openai.rs
@@ -8,10 +8,9 @@ use std::{iter::once, sync::Arc};
 use arrow_array::{RecordBatch, StringArray};
 use futures::StreamExt;
 use lancedb::{
-    connect,
-    embeddings::{openai::OpenAIEmbeddingFunction, EmbeddingDefinition, EmbeddingFunction},
+    Result, connect,
+    embeddings::{EmbeddingDefinition, EmbeddingFunction, openai::OpenAIEmbeddingFunction},
    query::{ExecutableQuery, QueryBase},
-    Result,
 };

 // --8<-- [end:imports]
--- a/rust/lancedb/examples/sentence_transformers.rs
+++ b/rust/lancedb/examples/sentence_transformers.rs
@@ -7,13 +7,12 @@ use arrow_array::{RecordBatch, StringArray};
 use arrow_schema::{DataType, Field, Schema};
 use futures::StreamExt;
 use lancedb::{
-    connect,
+    Result, connect,
    embeddings::{
-        sentence_transformers::SentenceTransformersEmbeddings, EmbeddingDefinition,
-        EmbeddingFunction,
+        EmbeddingDefinition, EmbeddingFunction,
+        sentence_transformers::SentenceTransformersEmbeddings,
    },
    query::{ExecutableQuery, QueryBase},
-    Result,
 };

 #[tokio::main]
--- a/rust/lancedb/examples/simple.rs
+++ b/rust/lancedb/examples/simple.rs
@@ -14,7 +14,7 @@ use futures::TryStreamExt;
 use lancedb::connection::Connection;
 use lancedb::index::Index;
 use lancedb::query::{ExecutableQuery, QueryBase};
-use lancedb::{connect, Result, Table as LanceDbTable};
+use lancedb::{Result, Table as LanceDbTable, connect};

 #[tokio::main]
 async fn main() -> Result<()> {
--- a/rust/lancedb/src/arrow.rs
+++ b/rust/lancedb/src/arrow.rs
@@ -12,7 +12,7 @@ use lance_datagen::{BatchCount, BatchGeneratorBuilder, RowCount};
 #[cfg(feature = "polars")]
 use {crate::polars_arrow_convertors, polars::frame::ArrowChunk, polars::prelude::DataFrame};

-use crate::{error::Result, Error};
+use crate::{Error, error::Result};

 /// An iterator of batches that also has a schema
 pub trait RecordBatchReader: Iterator<Item = Result<arrow_array::RecordBatch>> {
@@ -155,9 +155,7 @@ impl IntoArrowStream for SendableRecordBatchStream {
 impl IntoArrowStream for datafusion_physical_plan::SendableRecordBatchStream {
    fn into_arrow(self) -> Result<SendableRecordBatchStream> {
        let schema = self.schema();
-        let stream = self.map_err(|df_err| Error::Runtime {
-            message: df_err.to_string(),
-        });
+        let stream = self.map_err(|df_err| df_err.into());
        Ok(Box::pin(SimpleRecordBatchStream::new(stream, schema)))
    }
 }
--- a/rust/lancedb/src/connection.rs
+++ b/rust/lancedb/src/connection.rs
@@ -17,6 +17,7 @@ use lance_namespace::models::{
 #[cfg(feature = "aws")]
 use object_store::aws::AwsCredential;

+use crate::Table;
 use crate::connection::create_table::CreateTableBuilder;
 use crate::data::scannable::Scannable;
 use crate::database::listing::ListingDatabase;
@@ -31,7 +32,6 @@ use crate::remote::{
    client::ClientConfig,
    db::{OPT_REMOTE_API_KEY, OPT_REMOTE_HOST_OVERRIDE, OPT_REMOTE_REGION},
 };
-use crate::Table;
 use lance::io::ObjectStoreParams;
 pub use lance_encoding::version::LanceFileVersion;
 #[cfg(feature = "remote")]
@@ -566,8 +566,11 @@ pub struct ConnectBuilder {
 }

 #[cfg(feature = "remote")]
-const ENV_VARS_TO_STORAGE_OPTS: [(&str, &str); 1] =
-    [("AZURE_STORAGE_ACCOUNT_NAME", "azure_storage_account_name")];
+const ENV_VARS_TO_STORAGE_OPTS: [(&str, &str); 3] = [
+    ("AZURE_STORAGE_ACCOUNT_NAME", "azure_storage_account_name"),
+    ("AZURE_CLIENT_ID", "azure_client_id"),
+    ("AZURE_TENANT_ID", "azure_tenant_id"),
+];

 impl ConnectBuilder {
    /// Create a new [`ConnectOptions`] with the given database URI.
@@ -758,10 +761,10 @@ impl ConnectBuilder {
        options: &mut HashMap<String, String>,
    ) {
        for (env_key, opt_key) in env_var_to_remote_storage_option {
-            if let Ok(env_value) = std::env::var(env_key) {
-                if !options.contains_key(*opt_key) {
-                    options.insert((*opt_key).to_string(), env_value);
-                }
+            if let Ok(env_value) = std::env::var(env_key)
+                && !options.contains_key(*opt_key)
+            {
+                options.insert((*opt_key).to_string(), env_value);
            }
        }
    }
@@ -781,13 +784,19 @@ impl ConnectBuilder {
            message: "An api_key is required when connecting to LanceDb Cloud".to_string(),
        })?;

+        // Propagate mem_wal_enabled from options to client_config
+        let mut client_config = self.request.client_config;
+        if options.mem_wal_enabled.is_some() {
+            client_config.mem_wal_enabled = options.mem_wal_enabled;
+        }
+
        let storage_options = StorageOptions(options.storage_options.clone());
        let internal = Arc::new(crate::remote::db::RemoteDatabase::try_new(
            &self.request.uri,
            &api_key,
            &region,
            options.host_override,
-            self.request.client_config,
+            client_config,
            storage_options.into(),
        )?);
        Ok(Connection {
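The new block in `ConnectBuilder` overwrites `client_config.mem_wal_enabled` only when the connect options set it. Otherwise, a value already present on the client config is kept. If both fields are `Option<bool>` (the `is_some()` check suggests so), the precedence reduces to `Option::or`. `resolve_mem_wal` is a hypothetical helper used only to show this.

```rust
/// Connection options win when they set a value; otherwise the client
/// config's existing value (possibly `None`) is kept.
fn resolve_mem_wal(from_options: Option<bool>, from_client_config: Option<bool>) -> Option<bool> {
    from_options.or(from_client_config)
}
```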
@@ -1011,14 +1020,13 @@ mod tests {
    #[cfg(feature = "remote")]
    #[test]
    fn test_apply_env_defaults() {
-        let env_key = "TEST_APPLY_ENV_DEFAULTS_ENVIRONMENT_VARIABLE_ENV_KEY";
-        let env_val = "TEST_APPLY_ENV_DEFAULTS_ENVIRONMENT_VARIABLE_ENV_VAL";
+        let env_key = "PATH";
+        let env_val = std::env::var(env_key).expect("PATH should be set in test environment");
        let opts_key = "test_apply_env_defaults_environment_variable_opts_key";
-        std::env::set_var(env_key, env_val);

        let mut options = HashMap::new();
        ConnectBuilder::apply_env_defaults(&[(env_key, opts_key)], &mut options);
-        assert_eq!(Some(&env_val.to_string()), options.get(opts_key));
+        assert_eq!(Some(&env_val), options.get(opts_key));

        options.insert(opts_key.to_string(), "EXPLICIT-VALUE".to_string());
        ConnectBuilder::apply_env_defaults(&[(env_key, opts_key)], &mut options);
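`apply_env_defaults` copies environment variables into the storage options, but only for keys the caller has not set explicitly. This change extends the mapping with `AZURE_CLIENT_ID` and `AZURE_TENANT_ID`. The sketch below shows the same rule with the environment lookup injected as a closure, so the logic can be exercised without touching process state. `apply_env_defaults_with` and the closure parameter are hypothetical.

```rust
use std::collections::HashMap;

/// Fill `options` from environment-style lookups without overriding keys the
/// caller already set. `lookup` stands in for `std::env::var`.
fn apply_env_defaults_with<F>(
    mapping: &[(&str, &str)],
    options: &mut HashMap<String, String>,
    lookup: F,
) where
    F: Fn(&str) -> Option<String>,
{
    for (env_key, opt_key) in mapping {
        if let Some(value) = lookup(*env_key) {
            // `or_insert` keeps an explicit value if the key is already present.
            options.entry((*opt_key).to_string()).or_insert(value);
        }
    }
}
```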
--- a/rust/lancedb/src/connection/create_table.rs
+++ b/rust/lancedb/src/connection/create_table.rs
@@ -6,12 +6,12 @@ use std::sync::Arc;
 use lance_io::object_store::StorageOptionsProvider;

 use crate::{
+    Error, Result, Table,
    connection::{merge_storage_options, set_storage_options_provider},
    data::scannable::{Scannable, WithEmbeddingsScannable},
    database::{CreateTableMode, CreateTableRequest, Database},
    embeddings::{EmbeddingDefinition, EmbeddingFunction, EmbeddingRegistry},
    table::WriteOptions,
-    Error, Result, Table,
 };

 pub struct CreateTableBuilder {
@@ -167,7 +167,7 @@ impl CreateTableBuilder {
 #[cfg(test)]
 mod tests {
    use arrow_array::{
-        record_batch, Array, FixedSizeListArray, Float32Array, RecordBatch, RecordBatchIterator,
+        Array, FixedSizeListArray, Float32Array, RecordBatch, RecordBatchIterator, record_batch,
    };
    use arrow_schema::{ArrowError, DataType, Field, Schema};
    use futures::TryStreamExt;
@@ -380,11 +380,12 @@ mod tests {
            .await
            .unwrap();
        let other_schema = Arc::new(Schema::new(vec![Field::new("y", DataType::Int32, false)]));
-        assert!(db
-            .create_empty_table("test", other_schema.clone())
-            .execute()
-            .await
-            .is_err()); // TODO: assert what this error is
+        assert!(
+            db.create_empty_table("test", other_schema.clone())
+                .execute()
+                .await
+                .is_err()
+        ); // TODO: assert what this error is
        let overwritten = db
            .create_empty_table("test", other_schema.clone())
            .mode(CreateTableMode::Overwrite)
--- a/rust/lancedb/src/data/inspect.rs
+++ b/rust/lancedb/src/data/inspect.rs
@@ -5,9 +5,9 @@ use std::collections::HashMap;

 use arrow::compute::kernels::{aggregate::bool_and, length::length};
 use arrow_array::{
+    Array, GenericListArray, OffsetSizeTrait, PrimitiveArray, RecordBatchReader,
    cast::AsArray,
    types::{ArrowPrimitiveType, Int32Type, Int64Type},
-    Array, GenericListArray, OffsetSizeTrait, PrimitiveArray, RecordBatchReader,
 };
 use arrow_ord::cmp::eq;
 use arrow_schema::DataType;
@@ -78,7 +78,7 @@ pub fn infer_vector_columns(
                _ => {
                    return Err(Error::Schema {
                        message: format!("Column {} is not a list", col_name),
-                    })
+                    });
                }
            } {
                if let Some(Some(prev_dim)) = columns_to_infer.get(&col_name) {
@@ -102,8 +102,8 @@ mod tests {
    use super::*;

    use arrow_array::{
-        types::{Float32Type, Float64Type},
        FixedSizeListArray, Float32Array, ListArray, RecordBatch, RecordBatchIterator, StringArray,
+        types::{Float32Type, Float64Type},
    };
    use arrow_schema::{DataType, Field, Schema};
    use std::{sync::Arc, vec};
--- a/rust/lancedb/src/data/sanitize.rs
+++ b/rust/lancedb/src/data/sanitize.rs
@@ -4,10 +4,10 @@
 use std::{iter::repeat_with, sync::Arc};

 use arrow_array::{
-    cast::AsArray,
-    types::{Float16Type, Float32Type, Float64Type, Int32Type, Int64Type},
    Array, ArrowNumericType, FixedSizeListArray, PrimitiveArray, RecordBatch, RecordBatchIterator,
    RecordBatchReader,
+    cast::AsArray,
+    types::{Float16Type, Float32Type, Float64Type, Int32Type, Int64Type},
 };
 use arrow_cast::{can_cast_types, cast};
 use arrow_schema::{ArrowError, DataType, Field, Schema};
@@ -184,7 +184,7 @@ mod tests {
    use std::sync::Arc;

    use arrow_array::{
-        FixedSizeListArray, Float16Array, Float32Array, Float64Array, Int32Array, Int8Array,
+        FixedSizeListArray, Float16Array, Float32Array, Float64Array, Int8Array, Int32Array,
        RecordBatch, RecordBatchIterator, StringArray,
    };
    use arrow_schema::Field;
--- a/rust/lancedb/src/data/scannable.rs
+++ b/rust/lancedb/src/data/scannable.rs
@@ -9,22 +9,21 @@

 use std::sync::Arc;

-use arrow_array::{RecordBatch, RecordBatchIterator, RecordBatchReader};
-use arrow_schema::{ArrowError, SchemaRef};
-use async_trait::async_trait;
-use futures::stream::once;
-use futures::StreamExt;
-use lance_datafusion::utils::StreamingWriteSource;
-
 use crate::arrow::{
    SendableRecordBatchStream, SendableRecordBatchStreamExt, SimpleRecordBatchStream,
 };
 use crate::embeddings::{
-    compute_embeddings_for_batch, compute_output_schema, EmbeddingDefinition, EmbeddingFunction,
-    EmbeddingRegistry,
+    EmbeddingDefinition, EmbeddingFunction, EmbeddingRegistry, compute_embeddings_for_batch,
+    compute_output_schema,
 };
 use crate::table::{ColumnDefinition, ColumnKind, TableDefinition};
 use crate::{Error, Result};
+use arrow_array::{ArrayRef, RecordBatch, RecordBatchIterator, RecordBatchReader};
+use arrow_schema::{ArrowError, SchemaRef};
+use async_trait::async_trait;
+use futures::StreamExt;
+use futures::stream::once;
+use lance_datafusion::utils::StreamingWriteSource;

 pub trait Scannable: Send {
    /// Returns the schema of the data.
@@ -228,6 +227,19 @@ impl WithEmbeddingsScannable {
        let table_definition = TableDefinition::new(output_schema, column_definitions);
        let output_schema = table_definition.into_rich_schema();

+        Self::with_schema(inner, embeddings, output_schema)
+    }
+
+    /// Create a WithEmbeddingsScannable with a specific output schema.
+    ///
+    /// Use this when the table schema is already known (e.g. during add) to
+    /// avoid nullability mismatches between the embedding function's declared
+    /// type and the table's stored type.
+    pub fn with_schema(
+        inner: Box<dyn Scannable>,
+        embeddings: Vec<(EmbeddingDefinition, Arc<dyn EmbeddingFunction>)>,
+        output_schema: SchemaRef,
+    ) -> Result<Self> {
        Ok(Self {
            inner,
            embeddings,
@@ -245,9 +257,11 @@ impl Scannable for WithEmbeddingsScannable {
        let inner_stream = self.inner.scan_as_stream();
        let embeddings = self.embeddings.clone();
        let output_schema = self.output_schema.clone();
+        let stream_schema = output_schema.clone();

        let mapped_stream = inner_stream.then(move |batch_result| {
            let embeddings = embeddings.clone();
+            let output_schema = output_schema.clone();
            async move {
                let batch = batch_result?;
                let result = tokio::task::spawn_blocking(move || {
@@ -257,12 +271,29 @@ impl Scannable for WithEmbeddingsScannable {
                .map_err(|e| Error::Runtime {
                    message: format!("Task panicked during embedding computation: {}", e),
                })??;
+                // Cast columns to match the declared output schema. The data is
+                // identical but field metadata (e.g. nested nullability) may
+                // differ between the embedding function output and the table.
+                let columns: Vec<ArrayRef> = result
+                    .columns()
+                    .iter()
+                    .enumerate()
+                    .map(|(i, col)| {
+                        let target_type = output_schema.field(i).data_type();
+                        if col.data_type() == target_type {
+                            Ok(col.clone())
+                        } else {
+                            arrow_cast::cast(col, target_type).map_err(Error::from)
+                        }
+                    })
+                    .collect::<Result<_>>()?;
+                let result = RecordBatch::try_new(output_schema, columns)?;
                Ok(result)
            }
        });

        Box::pin(SimpleRecordBatchStream {
-            schema: output_schema,
+            schema: stream_schema,
            stream: mapped_stream,
        })
    }
@@ -303,8 +334,13 @@ pub fn scannable_with_embeddings(
        }

        if !embeddings.is_empty() {
-            return Ok(Box::new(WithEmbeddingsScannable::try_new(
-                inner, embeddings,
+            // Use the table's schema so embedding column types (including nested
+            // nullability) match what's stored, avoiding mismatches with the
+            // embedding function's declared dest_type.
+            return Ok(Box::new(WithEmbeddingsScannable::with_schema(
+                inner,
+                embeddings,
+                table_definition.schema.clone(),
            )?));
        }
    }
@@ -312,6 +348,133 @@ pub fn scannable_with_embeddings(
    Ok(inner)
 }

+/// A wrapper that buffers the first RecordBatch from a Scannable so we can
+/// inspect it (e.g. to estimate data size) without losing it.
+pub(crate) struct PeekedScannable {
+    inner: Box<dyn Scannable>,
+    peeked: Option<RecordBatch>,
+    /// The first item from the stream, if it was an error. Stored so we can
+    /// re-emit it from `scan_as_stream` instead of silently dropping it.
+    first_error: Option<crate::Error>,
+    stream: Option<SendableRecordBatchStream>,
+}
+
+impl PeekedScannable {
+    pub fn new(inner: Box<dyn Scannable>) -> Self {
+        Self {
+            inner,
+            peeked: None,
+            first_error: None,
+            stream: None,
+        }
+    }
+
+    /// Reads and buffers the first batch from the inner scannable.
+    /// Returns a clone of it. Subsequent calls return the same batch.
+    ///
+    /// Returns `None` if the stream is empty or the first item is an error.
+    /// Errors are preserved and re-emitted by `scan_as_stream`.
+    pub async fn peek(&mut self) -> Option<RecordBatch> {
+        if self.peeked.is_some() {
+            return self.peeked.clone();
+        }
+        // Already peeked and got an error or empty stream.
+        if self.stream.is_some() || self.first_error.is_some() {
+            return None;
+        }
+        let mut stream = self.inner.scan_as_stream();
+        match stream.next().await {
+            Some(Ok(batch)) => {
+                self.peeked = Some(batch.clone());
+                self.stream = Some(stream);
+                Some(batch)
+            }
+            Some(Err(e)) => {
+                self.first_error = Some(e);
+                self.stream = Some(stream);
+                None
+            }
+            None => {
+                self.stream = Some(stream);
+                None
+            }
+        }
+    }
+}
+
+impl Scannable for PeekedScannable {
+    fn schema(&self) -> SchemaRef {
+        self.inner.schema()
+    }
+
+    fn num_rows(&self) -> Option<usize> {
+        self.inner.num_rows()
+    }
+
+    fn rescannable(&self) -> bool {
+        self.inner.rescannable()
+    }
+
+    fn scan_as_stream(&mut self) -> SendableRecordBatchStream {
+        let schema = self.inner.schema();
+
+        // If peek() hit an error, emit only that error so downstream sees the failure.
+        let error_item = self.first_error.take().map(Err);
+
+        match (self.peeked.take(), self.stream.take()) {
+            (Some(batch), Some(rest)) => {
+                let prepend = futures::stream::once(std::future::ready(Ok(batch)));
+                Box::pin(SimpleRecordBatchStream {
+                    schema,
+                    stream: prepend.chain(rest),
+                })
+            }
+            (Some(batch), None) => Box::pin(SimpleRecordBatchStream {
+                schema,
+                stream: futures::stream::once(std::future::ready(Ok(batch))),
+            }),
+            (None, Some(rest)) => {
+                if let Some(err) = error_item {
+                    let stream = futures::stream::once(std::future::ready(err));
+                    Box::pin(SimpleRecordBatchStream { schema, stream })
+                } else {
+                    rest
+                }
+            }
+            (None, None) => {
+                // peek() was never called — just delegate
+                self.inner.scan_as_stream()
+            }
+        }
+    }
+}
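`PeekedScannable` reads one batch ahead and hands out a clone from `peek()`. It later replays that batch in front of the remaining stream. If the first item was an error, `scan_as_stream` emits only that error. Below is a synchronous sketch of the same bookkeeping over a plain `Iterator` of `Result`s. The `PeekedIter` type is hypothetical and stands in for the async stream.

```rust
/// Buffers the first item of a one-shot iterator so callers can inspect it
/// without losing it.
struct PeekedIter<T, E, I>
where
    I: Iterator<Item = Result<T, E>>,
{
    rest: I,
    first: Option<Result<T, E>>,
    peeked: bool,
}

impl<T: Clone, E, I: Iterator<Item = Result<T, E>>> PeekedIter<T, E, I> {
    fn new(rest: I) -> Self {
        Self { rest, first: None, peeked: false }
    }

    /// Clone of the first item if it is `Ok`; `None` for an empty source or
    /// an error (the error is kept for `into_items`). Repeated calls do not
    /// advance the source.
    fn peek(&mut self) -> Option<T> {
        if !self.peeked {
            self.first = self.rest.next();
            self.peeked = true;
        }
        match &self.first {
            Some(Ok(value)) => Some(value.clone()),
            _ => None,
        }
    }

    /// Replay the buffered first item, then the rest. If the first item was
    /// an error, yield only that error, mirroring `PeekedScannable`.
    fn into_items(self) -> impl Iterator<Item = Result<T, E>> {
        let first_was_err = matches!(self.first, Some(Err(_)));
        let rest = self.rest.take(if first_was_err { 0 } else { usize::MAX });
        self.first.into_iter().chain(rest)
    }
}
```

If `peek()` is never called, `first` stays `None` and `into_items` simply passes the source through. This matches the `(None, None)` arm, which delegates to the inner scannable.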
+
+/// Compute the number of write partitions based on data size estimates.
+///
+/// `sample_bytes` and `sample_rows` come from a representative batch and are
+/// used to estimate the average row size. `total_rows_hint` is the total row
+/// count when known; otherwise `sample_rows` is used as a lower-bound estimate.
+///
+/// Targets at most roughly 1 million rows and 2 GiB per partition (whichever
+/// yields more partitions), capped at `max_partitions` (typically the number
+/// of available CPU cores).
+pub(crate) fn estimate_write_partitions(
+    sample_bytes: usize,
+    sample_rows: usize,
+    total_rows_hint: Option<usize>,
+    max_partitions: usize,
+) -> usize {
+    if sample_rows == 0 {
+        return 1;
+    }
+    let bytes_per_row = sample_bytes / sample_rows;
+    let total_rows = total_rows_hint.unwrap_or(sample_rows);
+    let total_bytes = total_rows * bytes_per_row;
+    let by_rows = total_rows.div_ceil(1_000_000);
+    let by_bytes = total_bytes.div_ceil(2 * 1024 * 1024 * 1024);
+    by_rows.max(by_bytes).max(1).min(max_partitions)
+}
+
 #[cfg(test)]
 mod tests {
    use super::*;
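Here are some concrete numbers for the sizing rule in `estimate_write_partitions`, using a standalone copy of the function from the diff. The inputs are illustrative, not measurements. Consider 10,000 bytes per row over 1,000,000 rows, about 10 GB. The row limit alone would give 1 partition, but the byte limit gives ceil(10,000,000,000 / 2,147,483,648) = 5. So the byte limit wins, unless `max_partitions` is smaller.

```rust
/// Copy of `estimate_write_partitions` from the diff, for worked examples.
fn estimate_write_partitions(
    sample_bytes: usize,
    sample_rows: usize,
    total_rows_hint: Option<usize>,
    max_partitions: usize,
) -> usize {
    if sample_rows == 0 {
        return 1;
    }
    // Integer division: the per-row estimate is truncated toward zero.
    let bytes_per_row = sample_bytes / sample_rows;
    let total_rows = total_rows_hint.unwrap_or(sample_rows);
    let total_bytes = total_rows * bytes_per_row;
    let by_rows = total_rows.div_ceil(1_000_000);
    let by_bytes = total_bytes.div_ceil(2 * 1024 * 1024 * 1024);
    by_rows.max(by_bytes).max(1).min(max_partitions)
}
```

The byte totals in these examples assume a 64-bit `usize`.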
@@ -408,6 +571,231 @@ mod tests {
        assert!(result2.unwrap().is_err());
    }

+    mod peeked_scannable_tests {
+        use crate::test_utils::TestCustomError;
+
+        use super::*;
+
+        #[tokio::test]
+        async fn test_peek_returns_first_batch() {
+            let batch = record_batch!(("id", Int64, [1, 2, 3])).unwrap();
+            let mut peeked = PeekedScannable::new(Box::new(batch.clone()));
+
+            let first = peeked.peek().await.unwrap();
+            assert_eq!(first, batch);
+        }
+
+        #[tokio::test]
+        async fn test_peek_is_idempotent() {
+            let batch = record_batch!(("id", Int64, [1, 2, 3])).unwrap();
+            let mut peeked = PeekedScannable::new(Box::new(batch.clone()));
+
+            let first = peeked.peek().await.unwrap();
+            let second = peeked.peek().await.unwrap();
+            assert_eq!(first, second);
+        }
+
+        #[tokio::test]
+        async fn test_scan_after_peek_returns_all_data() {
+            let batches = vec![
+                record_batch!(("id", Int64, [1, 2])).unwrap(),
+                record_batch!(("id", Int64, [3, 4, 5])).unwrap(),
+            ];
+            let mut peeked = PeekedScannable::new(Box::new(batches.clone()));
+
+            let first = peeked.peek().await.unwrap();
+            assert_eq!(first, batches[0]);
+
+            let result: Vec<RecordBatch> = peeked.scan_as_stream().try_collect().await.unwrap();
+            assert_eq!(result.len(), 2);
+            assert_eq!(result[0], batches[0]);
+            assert_eq!(result[1], batches[1]);
+        }
+
+        #[tokio::test]
+        async fn test_scan_without_peek_passes_through() {
+            let batch = record_batch!(("id", Int64, [1, 2, 3])).unwrap();
+            let mut peeked = PeekedScannable::new(Box::new(batch.clone()));
+
+            let result: Vec<RecordBatch> = peeked.scan_as_stream().try_collect().await.unwrap();
+            assert_eq!(result.len(), 1);
+            assert_eq!(result[0], batch);
+        }
+
+        #[tokio::test]
+        async fn test_delegates_num_rows() {
+            let batches = vec![
+                record_batch!(("id", Int64, [1, 2])).unwrap(),
+                record_batch!(("id", Int64, [3])).unwrap(),
+            ];
+            let peeked = PeekedScannable::new(Box::new(batches));
+            assert_eq!(peeked.num_rows(), Some(3));
+        }
+
+        #[tokio::test]
+        async fn test_non_rescannable_stream_data_preserved() {
+            let batches = vec![
+                record_batch!(("id", Int64, [1, 2])).unwrap(),
+                record_batch!(("id", Int64, [3])).unwrap(),
+            ];
+            let schema = batches[0].schema();
+            let inner = futures::stream::iter(batches.clone().into_iter().map(Ok));
+            let stream: SendableRecordBatchStream = Box::pin(SimpleRecordBatchStream {
+                schema,
+                stream: inner,
+            });
+
+            let mut peeked = PeekedScannable::new(Box::new(stream));
+            assert!(!peeked.rescannable());
+            assert_eq!(peeked.num_rows(), None);
+
+            let first = peeked.peek().await.unwrap();
+            assert_eq!(first, batches[0]);
+
+            // All data is still available via scan_as_stream
+            let result: Vec<RecordBatch> = peeked.scan_as_stream().try_collect().await.unwrap();
+            assert_eq!(result.len(), 2);
+            assert_eq!(result[0], batches[0]);
+            assert_eq!(result[1], batches[1]);
+        }
+
+        #[tokio::test]
+        async fn test_error_in_first_batch_propagates() {
+            let schema = Arc::new(arrow_schema::Schema::new(vec![arrow_schema::Field::new(
+                "id",
+                arrow_schema::DataType::Int64,
+                false,
+            )]));
+            let inner = futures::stream::iter(vec![Err(Error::External {
+                source: Box::new(TestCustomError),
+            })]);
+            let stream: SendableRecordBatchStream = Box::pin(SimpleRecordBatchStream {
+                schema,
+                stream: inner,
+            });
+
+            let mut peeked = PeekedScannable::new(Box::new(stream));
+
+            // peek returns None for errors
+            assert!(peeked.peek().await.is_none());
+
+            // But the error should come through when scanning
+            let mut stream = peeked.scan_as_stream();
+            let first = stream.next().await.unwrap();
+            assert!(first.is_err());
+            let err = first.unwrap_err();
+            assert!(
+                matches!(&err, Error::External { source } if source.downcast_ref::<TestCustomError>().is_some()),
+                "Expected TestCustomError to be preserved, got: {err}"
+            );
+        }
+
+        #[tokio::test]
+        async fn test_error_in_later_batch_propagates() {
+            let good_batch = record_batch!(("id", Int64, [1, 2])).unwrap();
+            let schema = good_batch.schema();
+            let inner = futures::stream::iter(vec![
+                Ok(good_batch.clone()),
+                Err(Error::External {
+                    source: Box::new(TestCustomError),
+                }),
+            ]);
+            let stream: SendableRecordBatchStream = Box::pin(SimpleRecordBatchStream {
+                schema,
+                stream: inner,
+            });
+
+            let mut peeked = PeekedScannable::new(Box::new(stream));
+
+            // peek succeeds with the first batch
+            let first = peeked.peek().await.unwrap();
+            assert_eq!(first, good_batch);
+
+            // scan_as_stream should yield the first batch, then the error
+            let mut stream = peeked.scan_as_stream();
+            let batch1 = stream.next().await.unwrap().unwrap();
+            assert_eq!(batch1, good_batch);
+
+            let batch2 = stream.next().await.unwrap();
+            assert!(batch2.is_err());
+            let err = batch2.unwrap_err();
+            assert!(
+                matches!(&err, Error::External { source } if source.downcast_ref::<TestCustomError>().is_some()),
+                "Expected TestCustomError to be preserved, got: {err}"
+            );
+        }
+
+        #[tokio::test]
+        async fn test_empty_stream_returns_none() {
+            let schema = Arc::new(arrow_schema::Schema::new(vec![arrow_schema::Field::new(
+                "id",
+                arrow_schema::DataType::Int64,
+                false,
+            )]));
+            let inner = futures::stream::empty();
+            let stream: SendableRecordBatchStream = Box::pin(SimpleRecordBatchStream {
+                schema,
+                stream: inner,
+            });
+
+            let mut peeked = PeekedScannable::new(Box::new(stream));
+            assert!(peeked.peek().await.is_none());
+
+            // Scanning an empty (post-peek) stream should yield nothing
+            let result: Vec<RecordBatch> = peeked.scan_as_stream().try_collect().await.unwrap();
+            assert!(result.is_empty());
+        }
+    }
+
+    mod estimate_write_partitions_tests {
+        use super::*;
+
+        #[test]
+        fn test_small_data_single_partition() {
+            // 100 rows * 24 bytes/row = 2400 bytes — well under both thresholds
+            assert_eq!(estimate_write_partitions(2400, 100, Some(100), 8), 1);
+        }
+
+        #[test]
+        fn test_scales_by_row_count() {
+            // 2.5M rows at 24 bytes/row — row threshold dominates
+            // ceil(2_500_000 / 1_000_000) = 3
+            assert_eq!(estimate_write_partitions(72, 3, Some(2_500_000), 8), 3);
+        }
+
+        #[test]
+        fn test_scales_by_byte_size() {
+            // 100k rows at 40KB/row = ~4GB total → ceil(4GB / 2GB) = 2
+            let sample_bytes = 40_000 * 10;
+            assert_eq!(
+                estimate_write_partitions(sample_bytes, 10, Some(100_000), 8),
+                2
+            );
+        }
+
+        #[test]
+        fn test_capped_at_max_partitions() {
+            // 10M rows would want 10 partitions, but capped at 4
+            assert_eq!(estimate_write_partitions(72, 3, Some(10_000_000), 4), 4);
+        }
+
+        #[test]
+        fn test_zero_sample_rows_returns_one() {
+            assert_eq!(estimate_write_partitions(0, 0, Some(1_000_000), 8), 1);
+        }
+
+        #[test]
+        fn test_no_row_hint_uses_sample_size() {
+            // Without a hint, uses sample_rows (3), which is small
+            assert_eq!(estimate_write_partitions(72, 3, None, 8), 1);
+        }
+
+        #[test]
+        fn test_always_at_least_one() {
+            assert_eq!(estimate_write_partitions(24, 1, Some(1), 8), 1);
+        }
+    }
+
    mod embedding_tests {
        use super::*;
        use crate::embeddings::MemoryRegistry;
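The body of `estimate_write_partitions` is not part of this excerpt, but the `estimate_write_partitions_tests` module pins down its behaviour. The function extrapolates the sampled bytes-per-row to the full input and takes whichever of the row budget or the byte budget needs more partitions. It returns 1 for an empty sample, falls back to the sample size when there is no row hint, and caps the result at `max_partitions`. The following is a hypothetical reconstruction that satisfies every assertion in those tests. The parameter names, the `u64` types, and the thresholds (1,000,000 rows and 2 GiB per partition) are assumptions inferred from the test comments, not values taken from the source.

```rust
/// Assumed row budget per partition (from "ceil(2_500_000 / 1_000_000)").
const ROWS_PER_PARTITION: u64 = 1_000_000;
/// Assumed byte budget per partition (the tests only say "2GB").
const BYTES_PER_PARTITION: u64 = 2 * 1024 * 1024 * 1024;

/// Hypothetical sketch, not the lancedb implementation.
fn estimate_write_partitions(
    sample_bytes: u64,
    sample_rows: u64,
    num_rows_hint: Option<u64>,
    max_partitions: u64,
) -> u64 {
    // An empty sample gives no size information: use a single partition.
    if sample_rows == 0 {
        return 1;
    }
    // Without a hint, assume the sample is all the data there is.
    let total_rows = num_rows_hint.unwrap_or(sample_rows);
    // Average (truncated) bytes per row, extrapolated to the whole input.
    let bytes_per_row = sample_bytes / sample_rows;
    let total_bytes = total_rows.saturating_mul(bytes_per_row);
    let by_rows = total_rows.div_ceil(ROWS_PER_PARTITION);
    let by_bytes = total_bytes.div_ceil(BYTES_PER_PARTITION);
    // The larger requirement wins, clamped to [1, max_partitions].
    by_rows.max(by_bytes).clamp(1, max_partitions.max(1))
}
```

The byte-size case gives 2 whether the budget is 2 GiB or 2,000,000,000 bytes, so the tests do not distinguish between the two choices.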
--- a/rust/lancedb/src/database.rs
+++ b/rust/lancedb/src/database.rs
@@ -19,12 +19,12 @@ use std::sync::Arc;
 use std::time::Duration;

 use lance::dataset::ReadParams;
+use lance_namespace::LanceNamespace;
 use lance_namespace::models::{
    CreateNamespaceRequest, CreateNamespaceResponse, DescribeNamespaceRequest,
    DescribeNamespaceResponse, DropNamespaceRequest, DropNamespaceResponse, ListNamespacesRequest,
    ListNamespacesResponse, ListTablesRequest, ListTablesResponse,
 };
-use lance_namespace::LanceNamespace;

 use crate::data::scannable::Scannable;
 use crate::error::Result;
@@ -85,8 +85,10 @@ pub type TableBuilderCallback = Box<dyn FnOnce(OpenTableRequest) -> OpenTableReq

 /// Describes what happens when creating a table and a table with
 /// the same name already exists
+#[derive(Default)]
 pub enum CreateTableMode {
    /// If the table already exists, an error is returned
+    #[default]
    Create,
    /// If the table already exists, it is opened.  Any provided data is
    /// ignored.  The function will be passed an OpenTableBuilder to customize
@@ -104,12 +106,6 @@ impl CreateTableMode {
    }
 }

-impl Default for CreateTableMode {
-    fn default() -> Self {
-        Self::Create
-    }
-}
-
 /// A request to create a table
 pub struct CreateTableRequest {
    /// The name of the new table
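This hunk replaces the hand-written `impl Default for CreateTableMode` with `#[derive(Default)]` plus a `#[default]` attribute on the `Create` variant. This form has been stable since Rust 1.62. The same change is applied to `ShuffleStrategy` and `SplitStrategy` later in the diff. The sketch below uses a hypothetical enum to show the pattern. Its one restriction is that the variant marked `#[default]` must be a unit variant. The other variants may still carry data.

```rust
/// Hypothetical enum shaped like `CreateTableMode`: one unit variant is
/// marked `#[default]`, the others are free to carry data.
#[derive(Debug, Default, PartialEq)]
enum TableMode {
    /// Returned by `TableMode::default()` because of `#[default]`.
    #[default]
    Create,
    /// A data-carrying variant does not prevent deriving `Default`.
    ExistOk(Option<String>),
    Overwrite,
}
```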
--- a/rust/lancedb/src/database/listing.rs
+++ b/rust/lancedb/src/database/listing.rs
@@ -8,7 +8,7 @@ use std::path::Path;
 use std::{collections::HashMap, sync::Arc};

 use lance::dataset::refs::Ref;
-use lance::dataset::{builder::DatasetBuilder, ReadParams, WriteMode};
+use lance::dataset::{ReadParams, WriteMode, builder::DatasetBuilder};
 use lance::io::{ObjectStore, ObjectStoreParams, WrappingObjectStore};
 use lance_datafusion::utils::StreamingWriteSource;
 use lance_encoding::version::LanceFileVersion;
@@ -1097,11 +1097,11 @@ impl Database for ListingDatabase {
 #[cfg(test)]
 mod tests {
    use super::*;
+    use crate::Table;
    use crate::connection::ConnectRequest;
    use crate::data::scannable::Scannable;
    use crate::database::{CreateTableMode, CreateTableRequest};
    use crate::table::WriteOptions;
-    use crate::Table;
    use arrow_array::{Int32Array, RecordBatch, StringArray};
    use arrow_schema::{DataType, Field, Schema};
    use std::path::PathBuf;
--- a/rust/lancedb/src/database/namespace.rs
+++ b/rust/lancedb/src/database/namespace.rs
@@ -7,17 +7,17 @@ use std::collections::HashMap;
 use std::sync::Arc;

 use async_trait::async_trait;
+use lance_io::object_store::{ObjectStoreParams, StorageOptionsAccessor};
 use lance_namespace::{
-    models::{
-        CreateEmptyTableRequest, CreateNamespaceRequest, CreateNamespaceResponse,
-        DeclareTableRequest, DescribeNamespaceRequest, DescribeNamespaceResponse,
-        DescribeTableRequest, DropNamespaceRequest, DropNamespaceResponse, DropTableRequest,
-        ListNamespacesRequest, ListNamespacesResponse, ListTablesRequest, ListTablesResponse,
-    },
    LanceNamespace,
+    models::{
+        CreateNamespaceRequest, CreateNamespaceResponse, DeclareTableRequest,
+        DescribeNamespaceRequest, DescribeNamespaceResponse, DescribeTableRequest,
+        DropNamespaceRequest, DropNamespaceResponse, DropTableRequest, ListNamespacesRequest,
+        ListNamespacesResponse, ListTablesRequest, ListTablesResponse,
+    },
 };
 use lance_namespace_impls::ConnectBuilder;
-use log::warn;

 use crate::database::ReadConsistency;
 use crate::error::{Error, Result};
@@ -212,45 +212,30 @@ impl Database for LanceNamespaceDatabase {
            ..Default::default()
        };

-        let location = match self.namespace.declare_table(declare_request).await {
-            Ok(response) => response.location.ok_or_else(|| Error::Runtime {
+        let (location, initial_storage_options) = {
+            let response = self.namespace.declare_table(declare_request).await?;
+            let loc = response.location.ok_or_else(|| Error::Runtime {
                message: "Table location is missing from declare_table response".to_string(),
-            })?,
-            Err(e) => {
-                // Check if the error is "not supported" and try create_empty_table as fallback
-                let err_str = e.to_string().to_lowercase();
-                if err_str.contains("not supported") || err_str.contains("not implemented") {
-                    warn!(
-                        "declare_table is not supported by the namespace client, \
-                        falling back to deprecated create_empty_table. \
-                        create_empty_table is deprecated and will be removed in Lance 3.0.0. \
-                        Please upgrade your namespace client to support declare_table."
-                    );
-                    #[allow(deprecated)]
-                    let create_empty_request = CreateEmptyTableRequest {
-                        id: Some(table_id.clone()),
-                        ..Default::default()
-                    };
+            })?;
+            // Prefer response storage options, else self.storage_options; an empty result is None
+            let opts = response
+                .storage_options
+                .or_else(|| Some(self.storage_options.clone()))
+                .filter(|o| !o.is_empty());
+            (loc, opts)
+        };

-                    #[allow(deprecated)]
-                    let create_response = self
-                        .namespace
-                        .create_empty_table(create_empty_request)
-                        .await
-                        .map_err(|e| Error::Runtime {
-                            message: format!("Failed to create empty table: {}", e),
-                        })?;
-
-                    create_response.location.ok_or_else(|| Error::Runtime {
-                        message: "Table location is missing from create_empty_table response"
-                            .to_string(),
-                    })?
-                } else {
-                    return Err(Error::Runtime {
-                        message: format!("Failed to declare table: {}", e),
-                    });
-                }
-            }
+        let write_params = if let Some(storage_opts) = initial_storage_options {
+            let mut params = request.write_options.lance_write_params.unwrap_or_default();
+            let store_params = params
+                .store_params
+                .get_or_insert_with(ObjectStoreParams::default);
+            store_params.storage_options_accessor = Some(Arc::new(
+                StorageOptionsAccessor::with_static_options(storage_opts),
+            ));
+            Some(params)
+        } else {
+            request.write_options.lance_write_params
        };

        let native_table = NativeTable::create_from_namespace(
@@ -260,7 +245,7 @@ impl Database for LanceNamespaceDatabase {
            request.namespace.clone(),
            request.data,
            None, // write_store_wrapper not used for namespace connections
-            request.write_options.lance_write_params,
+            write_params,
            self.read_consistency_interval,
            self.server_side_query_enabled,
            self.session.clone(),
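After the removal of the deprecated `create_empty_table` fallback, the storage options passed to `create_from_namespace` come from one `Option` chain. Options returned by `declare_table` win outright, and they are not merged with the connection's options. A missing map falls back to `self.storage_options`. An empty map, from either source, becomes `None`. In that case, the request's own `lance_write_params` are used unchanged. A server that explicitly returns an empty map therefore does *not* trigger the fallback. The following std-only sketch reproduces just that chain. `resolve_storage_options` and `Opts` are hypothetical names.

```rust
use std::collections::HashMap;

type Opts = HashMap<String, String>;

/// Hypothetical helper mirroring the `or_else(..).filter(..)` chain above.
fn resolve_storage_options(from_response: Option<Opts>, fallback: &Opts) -> Option<Opts> {
    from_response
        // Only a *missing* map falls back to the connection-level options.
        .or_else(|| Some(fallback.clone()))
        // An empty map, whichever source it came from, means "no options".
        .filter(|o| !o.is_empty())
}
```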
--- a/rust/lancedb/src/dataloader/permutation/builder.rs
+++ b/rust/lancedb/src/dataloader/permutation/builder.rs
@@ -11,16 +11,16 @@ use lance_core::ROW_ID;
 use lance_datafusion::exec::SessionContextExt;

 use crate::{
+    Error, Result, Table,
    arrow::{SendableRecordBatchStream, SendableRecordBatchStreamExt, SimpleRecordBatchStream},
    connect,
    database::{CreateTableRequest, Database},
    dataloader::permutation::{
        shuffle::{Shuffler, ShufflerConfig},
-        split::{SplitStrategy, Splitter, SPLIT_ID_COLUMN},
-        util::{rename_column, TemporaryDirectory},
+        split::{SPLIT_ID_COLUMN, SplitStrategy, Splitter},
+        util::{TemporaryDirectory, rename_column},
    },
    query::{ExecutableQuery, QueryBase, Select},
-    Error, Result, Table,
 };

 pub const SRC_ROW_ID_COL: &str = "row_id";
@@ -57,7 +57,7 @@ pub struct PermutationConfig {
 }

 /// Strategy for shuffling the data.
-#[derive(Debug, Clone)]
+#[derive(Debug, Clone, Default)]
 pub enum ShuffleStrategy {
    /// The data is randomly shuffled
    ///
@@ -78,15 +78,10 @@ pub enum ShuffleStrategy {
    /// The data is not shuffled
    ///
    /// This is useful for debugging and testing.
+    #[default]
    None,
 }

-impl Default for ShuffleStrategy {
-    fn default() -> Self {
-        Self::None
-    }
-}
-
 /// Builder for creating a permutation table.
 ///
 /// A permutation table is a table that stores split assignments and a shuffled order of rows.  This
--- a/rust/lancedb/src/dataloader/permutation/reader.rs
+++ b/rust/lancedb/src/dataloader/permutation/reader.rs
@@ -25,8 +25,8 @@ use futures::{StreamExt, TryStreamExt};
 use lance::dataset::scanner::DatasetRecordBatchStream;
 use lance::io::RecordBatchStream;
 use lance_arrow::RecordBatchExt;
-use lance_core::error::LanceOptionExt;
 use lance_core::ROW_ID;
+use lance_core::error::LanceOptionExt;
 use std::collections::HashMap;
 use std::sync::Arc;

@@ -426,6 +426,7 @@ impl PermutationReader {
            row_ids_query = row_ids_query.limit(limit as usize);
        }
        let mut row_ids = row_ids_query.execute().await?;
+        let mut idx_offset = 0;
        while let Some(batch) = row_ids.try_next().await? {
            let row_ids = batch
                .column(0)
@@ -433,8 +434,9 @@ impl PermutationReader {
                .values()
                .to_vec();
            for (i, row_id) in row_ids.iter().enumerate() {
-                offset_map.insert(i as u64, *row_id);
+                offset_map.insert(i as u64 + idx_offset, *row_id);
            }
+            idx_offset += batch.num_rows() as u64;
        }
        let offset_map = Arc::new(offset_map);
        *offset_map_ref = Some(offset_map.clone());
@@ -498,10 +500,10 @@ mod tests {
    use rand::seq::SliceRandom;

    use crate::{
+        Table,
        arrow::SendableRecordBatchStream,
        query::{ExecutableQuery, QueryBase},
-        test_utils::datagen::{virtual_table, LanceDbDatagenExt},
-        Table,
+        test_utils::datagen::{LanceDbDatagenExt, virtual_table},
    };

    use super::*;
@@ -845,4 +847,106 @@ mod tests {
            .to_vec();
        assert_eq!(idx_values, vec![row_ids[2] as i32]);
    }
+
+    #[tokio::test]
+    async fn test_filtered_permutation_full_iteration() {
+        use crate::dataloader::permutation::builder::PermutationBuilder;
+
+        // Create a base table with 10000 rows where idx goes 0..10000.
+        // Filter to even values only, giving 5000 rows in the permutation.
+        let base_table = lance_datagen::gen_batch()
+            .col("idx", lance_datagen::array::step::<Int32Type>())
+            .into_mem_table("tbl", RowCount::from(10000), BatchCount::from(1))
+            .await;
+
+        let permutation_table = PermutationBuilder::new(base_table.clone())
+            .with_filter("idx % 2 = 0".to_string())
+            .build()
+            .await
+            .unwrap();
+
+        assert_eq!(permutation_table.count_rows(None).await.unwrap(), 5000);
+
+        let reader = PermutationReader::try_from_tables(
+            base_table.base_table().clone(),
+            permutation_table.base_table().clone(),
+            0,
+        )
+        .await
+        .unwrap();
+
+        assert_eq!(reader.count_rows(), 5000);
+
+        // Iterate through all batches using a batch size that doesn't evenly divide
+        // the row count (5000 / 128 = 39 full batches + 1 batch of 8 rows).
+        let batch_size = 128;
+        let mut stream = reader
+            .read(
+                Select::All,
+                QueryExecutionOptions {
+                    max_batch_length: batch_size,
+                    ..Default::default()
+                },
+            )
+            .await
+            .unwrap();
+
+        let mut total_rows = 0u64;
+        let mut all_idx_values = Vec::new();
+        while let Some(batch) = stream.try_next().await.unwrap() {
+            assert!(batch.num_rows() <= batch_size as usize);
+            total_rows += batch.num_rows() as u64;
+            let idx_col = batch.column(0).as_primitive::<Int32Type>().values();
+            all_idx_values.extend(idx_col.iter().copied());
+        }
+
+        assert_eq!(total_rows, 5000);
+        assert_eq!(all_idx_values.len(), 5000);
+
+        // Every value should be even (from the filter)
+        assert!(all_idx_values.iter().all(|v| v % 2 == 0));
+
+        // Should have 5000 unique values
+        let unique: std::collections::HashSet<i32> = all_idx_values.iter().copied().collect();
+        assert_eq!(unique.len(), 5000);
+
+        // Use take_offsets to fetch rows from the beginning, middle, and end
+        // of the permutation. The values should match what we saw during iteration.
+
+        // Beginning
+        let batch = reader.take_offsets(&[0, 1, 2], Select::All).await.unwrap();
+        assert_eq!(batch.num_rows(), 3);
+        let idx_values = batch
+            .column(0)
+            .as_primitive::<Int32Type>()
+            .values()
+            .to_vec();
+        assert_eq!(idx_values, &all_idx_values[0..3]);
+
+        // Middle
+        let batch = reader
+            .take_offsets(&[2499, 2500, 2501], Select::All)
+            .await
+            .unwrap();
+        assert_eq!(batch.num_rows(), 3);
+        let idx_values = batch
+            .column(0)
+            .as_primitive::<Int32Type>()
+            .values()
+            .to_vec();
+        assert_eq!(idx_values, &all_idx_values[2499..2502]);
+
+        // End (last 3 rows)
+        let batch = reader
+            .take_offsets(&[4997, 4998, 4999], Select::All)
+            .await
+            .unwrap();
+        assert_eq!(batch.num_rows(), 3);
+        let idx_values = batch
+            .column(0)
+            .as_primitive::<Int32Type>()
+            .values()
+            .to_vec();
+        assert_eq!(idx_values, &all_idx_values[4997..5000]);
+    }
 }
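The `offset_map` fix in `PermutationReader` matters whenever the row-id query returns more than one batch. `enumerate()` restarts at 0 for each batch, so without `idx_offset` every later batch overwrites the entries for offsets `0..n`. The rows past the first batch then become unreachable, which is what the new `test_filtered_permutation_full_iteration` test exercises. The following is a std-only sketch of the corrected loop. Plain `Vec<u64>` batches stand in for the Arrow `UInt64` columns, and `build_offset_map` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Hypothetical helper: map each global permutation offset to its row id.
fn build_offset_map(batches: &[Vec<u64>]) -> HashMap<u64, u64> {
    let mut offset_map = HashMap::new();
    // Number of rows already seen in earlier batches.
    let mut idx_offset = 0u64;
    for batch in batches {
        for (i, row_id) in batch.iter().enumerate() {
            // `i` is batch-local; shift it to a global offset.
            offset_map.insert(i as u64 + idx_offset, *row_id);
        }
        idx_offset += batch.len() as u64;
    }
    offset_map
}
```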
--- a/rust/lancedb/src/dataloader/permutation/shuffle.rs
+++ b/rust/lancedb/src/dataloader/permutation/shuffle.rs
@@ -18,12 +18,12 @@ use lance_io::{
    scheduler::{ScanScheduler, SchedulerConfig},
    utils::CachedFileSize,
 };
-use rand::{seq::SliceRandom, Rng, RngCore};
+use rand::{Rng, RngCore, seq::SliceRandom};

 use crate::{
-    arrow::{SendableRecordBatchStream, SimpleRecordBatchStream},
-    dataloader::permutation::util::{non_crypto_rng, TemporaryDirectory},
    Error, Result,
+    arrow::{SendableRecordBatchStream, SimpleRecordBatchStream},
+    dataloader::permutation::util::{TemporaryDirectory, non_crypto_rng},
 };

 #[derive(Debug, Clone)]
@@ -281,7 +281,7 @@ mod tests {
    use datafusion_expr::col;
    use futures::TryStreamExt;
    use lance_datagen::{BatchCount, BatchGeneratorBuilder, ByteCount, RowCount, Seed};
-    use rand::{rngs::SmallRng, SeedableRng};
+    use rand::{SeedableRng, rngs::SmallRng};

    fn test_gen() -> BatchGeneratorBuilder {
        lance_datagen::gen_batch()
--- a/rust/lancedb/src/dataloader/permutation/split.rs
+++ b/rust/lancedb/src/dataloader/permutation/split.rs
@@ -2,8 +2,8 @@
 // SPDX-FileCopyrightText: Copyright The LanceDB Authors

 use std::sync::{
-    atomic::{AtomicBool, AtomicU64, AtomicUsize, Ordering},
    Arc,
+    atomic::{AtomicBool, AtomicU64, AtomicUsize, Ordering},
 };

 use arrow_array::{Array, BooleanArray, RecordBatch, UInt64Array};
@@ -15,21 +15,22 @@ use lance_arrow::SchemaExt;
 use lance_core::ROW_ID;

 use crate::{
+    Error, Result,
    arrow::{SendableRecordBatchStream, SimpleRecordBatchStream},
    dataloader::{
        permutation::shuffle::{Shuffler, ShufflerConfig},
        permutation::util::TemporaryDirectory,
    },
    query::{Query, QueryBase, Select},
-    Error, Result,
 };

 pub const SPLIT_ID_COLUMN: &str = "split_id";

 /// Strategy for assigning rows to splits
-#[derive(Debug, Clone)]
+#[derive(Debug, Clone, Default)]
 pub enum SplitStrategy {
    /// All rows will have split id 0
+    #[default]
    NoSplit,
    /// Rows will be randomly assigned to splits
    ///
@@ -73,15 +74,6 @@ pub enum SplitStrategy {
    Calculated { calculation: String },
 }

-// The default is not to split the data
-//
-// All data will be assigned to a single split.
-impl Default for SplitStrategy {
-    fn default() -> Self {
-        Self::NoSplit
-    }
-}
-
 impl SplitStrategy {
    pub fn validate(&self, num_rows: u64) -> Result<()> {
        match self {
--- a/rust/lancedb/src/dataloader/permutation/util.rs
+++ b/rust/lancedb/src/dataloader/permutation/util.rs
@@ -7,12 +7,12 @@ use arrow_array::RecordBatch;
 use arrow_schema::{Fields, Schema};
 use datafusion_execution::disk_manager::DiskManagerMode;
 use futures::TryStreamExt;
-use rand::{rngs::SmallRng, RngCore, SeedableRng};
+use rand::{RngCore, SeedableRng, rngs::SmallRng};
 use tempfile::TempDir;

 use crate::{
-    arrow::{SendableRecordBatchStream, SimpleRecordBatchStream},
    Error, Result,
+    arrow::{SendableRecordBatchStream, SimpleRecordBatchStream},
 };

 /// Directory to use for temporary files
--- a/rust/lancedb/src/embeddings.rs
+++ b/rust/lancedb/src/embeddings.rs
@@ -23,9 +23,9 @@ use arrow_schema::{DataType, Field, SchemaBuilder, SchemaRef};
 use serde::{Deserialize, Serialize};

 use crate::{
+    Error,
    error::Result,
    table::{ColumnDefinition, ColumnKind, TableDefinition},
-    Error,
 };

 /// Trait for embedding functions
--- a/rust/lancedb/src/embeddings/bedrock.rs
+++ b/rust/lancedb/src/embeddings/bedrock.rs
@@ -8,7 +8,7 @@ use arrow::array::{AsArray, Float32Builder};
 use arrow_array::{Array, ArrayRef, FixedSizeListArray, Float32Array};
 use arrow_data::ArrayData;
 use arrow_schema::DataType;
-use serde_json::{json, Value};
+use serde_json::{Value, json};

 use super::EmbeddingFunction;
 use crate::{Error, Result};
--- a/rust/lancedb/src/embeddings/openai.rs
+++ b/rust/lancedb/src/embeddings/openai.rs
@@ -8,9 +8,9 @@ use arrow_array::{Array, ArrayRef, FixedSizeListArray, Float32Array};
 use arrow_data::ArrayData;
 use arrow_schema::DataType;
 use async_openai::{
+    Client,
    config::OpenAIConfig,
    types::{CreateEmbeddingRequest, Embedding, EmbeddingInput, EncodingFormat},
-    Client,
 };
 use tokio::{runtime::Handle, task};

--- a/rust/lancedb/src/embeddings/sentence_transformers.rs
+++ b/rust/lancedb/src/embeddings/sentence_transformers.rs
@@ -7,7 +7,7 @@ use super::EmbeddingFunction;
 use arrow::{
    array::{AsArray, PrimitiveBuilder},
    datatypes::{
-        ArrowPrimitiveType, Float16Type, Float32Type, Float64Type, Int64Type, UInt32Type, UInt8Type,
+        ArrowPrimitiveType, Float16Type, Float32Type, Float64Type, Int64Type, UInt8Type, UInt32Type,
    },
 };
 use arrow_array::{Array, FixedSizeListArray, PrimitiveArray};
@@ -16,8 +16,8 @@ use arrow_schema::DataType;
 use candle_core::{CpuStorage, Device, Layout, Storage, Tensor};
 use candle_nn::VarBuilder;
 use candle_transformers::models::bert::{BertModel, DTYPE};
-use hf_hub::{api::sync::Api, Repo, RepoType};
-use tokenizers::{tokenizer::Tokenizer, PaddingParams};
+use hf_hub::{Repo, RepoType, api::sync::Api};
+use tokenizers::{PaddingParams, tokenizer::Tokenizer};

 /// Compute embeddings using huggingface sentence-transformers.
 pub struct SentenceTransformersEmbeddingsBuilder {
@@ -230,7 +230,7 @@ impl SentenceTransformersEmbeddings {
            Storage::Cpu(CpuStorage::BF16(_)) => {
                return Err(crate::Error::Runtime {
                    message: "unsupported data type".to_string(),
-                })
+                });
            }
            _ => unreachable!("we already moved the tensor to the CPU device"),
        };
@@ -298,12 +298,12 @@ impl SentenceTransformersEmbeddings {
            DataType::Utf8View => {
                return Err(crate::Error::Runtime {
                    message: "Utf8View not yet implemented".to_string(),
-                })
+                });
            }
            _ => {
                return Err(crate::Error::Runtime {
                    message: "invalid type".to_string(),
-                })
+                });
            }
        };

--- a/rust/lancedb/src/error.rs
+++ b/rust/lancedb/src/error.rs
@@ -4,6 +4,7 @@
 use std::sync::PoisonError;

 use arrow_schema::ArrowError;
+use datafusion_common::DataFusionError;
 use snafu::Snafu;

 pub(crate) type BoxError = Box<dyn std::error::Error + Send + Sync>;
@@ -96,28 +97,74 @@ pub type Result<T> = std::result::Result<T, Error>;
 impl From<ArrowError> for Error {
    fn from(source: ArrowError) -> Self {
        match source {
-            ArrowError::ExternalError(source) => match source.downcast::<Self>() {
-                Ok(e) => *e,
-                Err(source) => Self::External { source },
-            },
+            ArrowError::ExternalError(source) => Self::from_box_error(source),
            _ => Self::Arrow { source },
        }
    }
 }

+impl From<DataFusionError> for Error {
+    fn from(source: DataFusionError) -> Self {
+        match source {
+            DataFusionError::ArrowError(source, _) => (*source).into(),
+            DataFusionError::External(source) => Self::from_box_error(source),
+            other => Self::External {
+                source: Box::new(other),
+            },
+        }
+    }
+}
+
 impl From<lance::Error> for Error {
    fn from(source: lance::Error) -> Self {
        // Try to unwrap external errors that were wrapped by lance
        match source {
-            lance::Error::Wrapped { error, .. } => match error.downcast::<Self>() {
-                Ok(e) => *e,
-                Err(source) => Self::External { source },
-            },
+            lance::Error::Wrapped { error, .. } => Self::from_box_error(error),
+            lance::Error::External { source } => Self::from_box_error(source),
            _ => Self::Lance { source },
        }
    }
 }

+impl Error {
+    fn from_box_error(mut source: Box<dyn std::error::Error + Send + Sync>) -> Self {
+        source = match source.downcast::<Self>() {
+            Ok(e) => match *e {
+                Self::External { source } => return Self::from_box_error(source),
+                other => return other,
+            },
+            Err(source) => source,
+        };
+
+        source = match source.downcast::<lance::Error>() {
+            Ok(e) => match *e {
+                lance::Error::Wrapped { error, .. } => return Self::from_box_error(error),
+                other => return other.into(),
+            },
+            Err(source) => source,
+        };
+
+        source = match source.downcast::<ArrowError>() {
+            Ok(e) => match *e {
+                ArrowError::ExternalError(source) => return Self::from_box_error(source),
+                other => return other.into(),
+            },
+            Err(source) => source,
+        };
+
+        source = match source.downcast::<DataFusionError>() {
+            Ok(e) => match *e {
+                DataFusionError::ArrowError(source, _) => return (*source).into(),
+                DataFusionError::External(source) => return Self::from_box_error(source),
+                other => return other.into(),
+            },
+            Err(source) => source,
+        };
+
+        Self::External { source }
+    }
+}
+
 impl From<object_store::Error> for Error {
    fn from(source: object_store::Error) -> Self {
        Self::ObjectStore { source }
--- a/rust/lancedb/src/expr.rs
+++ b/rust/lancedb/src/expr.rs
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: Apache-2.0
+// SPDX-FileCopyrightText: Copyright The LanceDB Authors
+
+//! Expression builder API for type-safe query construction
+//!
+//! This module provides a fluent API for building expressions that can be used
+//! in filters and projections. It wraps DataFusion's expression system.
+//!
+//! # Examples
+//!
+//! ```rust
+//! // `gt`, `and` and `eq` are `Expr` methods and `*` is an operator impl; no trait import needed
+//! use lancedb::expr::{col, lit};
+//!
+//! let expr = col("age").gt(lit(18));
+//! let expr = col("age").gt(lit(18)).and(col("status").eq(lit("active")));
+//! let expr = col("price") * lit(1.1);
+//! ```
+
+mod sql;
+
+pub use sql::expr_to_sql_string;
+
+use std::sync::Arc;
+
+use arrow_schema::DataType;
+use datafusion_expr::{Expr, ScalarUDF, expr_fn::cast};
+use datafusion_functions::string::expr_fn as string_expr_fn;
+
+pub use datafusion_expr::{col, lit};
+
+pub use datafusion_expr::Expr as DfExpr;
+
+pub fn lower(expr: Expr) -> Expr {
+    string_expr_fn::lower(expr)
+}
+
+pub fn upper(expr: Expr) -> Expr {
+    string_expr_fn::upper(expr)
+}
+
+pub fn contains(expr: Expr, search: Expr) -> Expr {
+    string_expr_fn::contains(expr, search)
+}
+
+pub fn expr_cast(expr: Expr, data_type: DataType) -> Expr {
+    cast(expr, data_type)
+}
+
+lazy_static::lazy_static! {
+    static ref FUNC_REGISTRY: std::sync::RwLock<std::collections::HashMap<String, Arc<ScalarUDF>>> = {
+        let mut m = std::collections::HashMap::new();
+        m.insert("lower".to_string(), datafusion_functions::string::lower());
+        m.insert("upper".to_string(), datafusion_functions::string::upper());
+        m.insert("contains".to_string(), datafusion_functions::string::contains());
+        m.insert("btrim".to_string(), datafusion_functions::string::btrim());
+        m.insert("ltrim".to_string(), datafusion_functions::string::ltrim());
+        m.insert("rtrim".to_string(), datafusion_functions::string::rtrim());
+        m.insert("concat".to_string(), datafusion_functions::string::concat());
+        m.insert("octet_length".to_string(), datafusion_functions::string::octet_length());
+        std::sync::RwLock::new(m)
+    };
+}
+
+pub fn func(name: impl AsRef<str>, args: Vec<Expr>) -> crate::Result<Expr> {
+    let name = name.as_ref();
+    let registry = FUNC_REGISTRY
+        .read()
+        .map_err(|e| crate::Error::InvalidInput {
+            message: format!("lock poisoned: {}", e),
+        })?;
+    let udf = registry
+        .get(name)
+        .ok_or_else(|| crate::Error::InvalidInput {
+            message: format!("unknown function: {}", name),
+        })?;
+    Ok(Expr::ScalarFunction(
+        datafusion_expr::expr::ScalarFunction::new_udf(udf.clone(), args),
+    ))
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_col_lit_comparisons() {
+        let expr = col("age").gt(lit(18));
+        let sql = expr_to_sql_string(&expr).unwrap();
+        assert!(sql.contains("age") && sql.contains("18"));
+
+        let expr = col("name").eq(lit("Alice"));
+        let sql = expr_to_sql_string(&expr).unwrap();
+        assert!(sql.contains("name") && sql.contains("Alice"));
+    }
+
+    #[test]
+    fn test_compound_expression() {
+        let expr = col("age").gt(lit(18)).and(col("status").eq(lit("active")));
+        let sql = expr_to_sql_string(&expr).unwrap();
+        assert!(sql.contains("age") && sql.contains("status"));
+    }
+
+    #[test]
+    fn test_string_functions() {
+        let expr = lower(col("name"));
+        let sql = expr_to_sql_string(&expr).unwrap();
+        assert!(sql.to_lowercase().contains("lower"));
+
+        let expr = contains(col("text"), lit("search"));
+        let sql = expr_to_sql_string(&expr).unwrap();
+        assert!(sql.to_lowercase().contains("contains"));
+    }
+
+    #[test]
+    fn test_func() {
+        let expr = func("lower", vec![col("x")]).unwrap();
+        let sql = expr_to_sql_string(&expr).unwrap();
+        assert!(sql.to_lowercase().contains("lower"));
+
+        let result = func("unknown_func", vec![col("x")]);
+        assert!(result.is_err());
+    }
+
+    #[test]
+    fn test_arithmetic() {
+        let expr = col("price") * lit(1.1);
+        let sql = expr_to_sql_string(&expr).unwrap();
+        assert!(sql.contains("price"));
+    }
+}
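`FUNC_REGISTRY` is filled once inside `lazy_static!`, and `func` only ever takes a read lock on it. That suggests the `RwLock` and the "lock poisoned" error branch may be unnecessary unless runtime registration is planned. The sketch below shows the same name-to-function lookup using only the standard library. It assumes `std::sync::LazyLock` is available (stable since Rust 1.80, so usable in a workspace on edition 2024). Plain `fn(&str) -> String` pointers stand in for DataFusion's `ScalarUDF`, and the names `StringFn`, `REGISTRY` and `call` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

// Hypothetical stand-in for DataFusion's `ScalarUDF`: a plain function over strings.
type StringFn = fn(&str) -> String;

fn lower(s: &str) -> String {
    s.to_lowercase()
}

fn upper(s: &str) -> String {
    s.to_uppercase()
}

fn btrim(s: &str) -> String {
    s.trim().to_string()
}

// Initialized on first access and never mutated afterwards, so readers need no lock.
static REGISTRY: LazyLock<HashMap<&'static str, StringFn>> = LazyLock::new(|| {
    let mut m: HashMap<&'static str, StringFn> = HashMap::new();
    m.insert("lower", lower);
    m.insert("upper", upper);
    m.insert("btrim", btrim);
    m
});

/// Look up `name` and apply it, returning the same kind of message as `func` on a miss.
fn call(name: &str, arg: &str) -> Result<String, String> {
    let f = REGISTRY
        .get(name)
        .ok_or_else(|| format!("unknown function: {}", name))?;
    Ok(f(arg))
}
```

With a read-only map, the only failure mode left is an unknown name, which `func` already reports as `InvalidInput`.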
--- a/rust/lancedb/src/expr/sql.rs
+++ b/rust/lancedb/src/expr/sql.rs
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: Apache-2.0
+// SPDX-FileCopyrightText: Copyright The LanceDB Authors
+
+use datafusion_expr::Expr;
+use datafusion_sql::unparser;
+
+pub fn expr_to_sql_string(expr: &Expr) -> crate::Result<String> {
+    let ast = unparser::expr_to_sql(expr).map_err(|e| crate::Error::InvalidInput {
+        message: format!("failed to serialize expression to SQL: {}", e),
+    })?;
+    Ok(ast.to_string())
+}
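`expr_to_sql_string` delegates to DataFusion's unparser, which walks an `Expr` tree and renders SQL text. The toy, std-only sketch below illustrates that idea. It uses a hypothetical four-variant `Expr` that is far simpler than DataFusion's. It also shows why a builder helps when filters contain variables: string literals are quoted and escaped during rendering, not by hand. Parenthesizing every binary node and leaving identifiers unquoted are simplifications chosen here, not necessarily what the DataFusion unparser emits.

```rust
use std::fmt;

/// Hypothetical toy expression tree; DataFusion's `Expr` has many more variants.
#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    Int(i64),
    Str(String),
    Binary(Box<Expr>, &'static str, Box<Expr>),
}

fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn lit_int(v: i64) -> Expr {
    Expr::Int(v)
}

fn lit_str(v: &str) -> Expr {
    Expr::Str(v.to_string())
}

impl Expr {
    fn binary(self, op: &'static str, rhs: Expr) -> Expr {
        Expr::Binary(Box::new(self), op, Box::new(rhs))
    }
    fn gt(self, rhs: Expr) -> Expr {
        self.binary(">", rhs)
    }
    fn eq(self, rhs: Expr) -> Expr {
        self.binary("=", rhs)
    }
    fn and(self, rhs: Expr) -> Expr {
        self.binary("AND", rhs)
    }
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Column(name) => write!(f, "{}", name),
            Expr::Int(v) => write!(f, "{}", v),
            // SQL string literals escape an embedded single quote by doubling it.
            Expr::Str(v) => write!(f, "'{}'", v.replace('\'', "''")),
            // Parenthesize every binary node so precedence never depends on the reader.
            Expr::Binary(lhs, op, rhs) => write!(f, "({} {} {})", lhs, op, rhs),
        }
    }
}

/// Render an expression tree to SQL text.
fn expr_to_sql_string(expr: &Expr) -> String {
    expr.to_string()
}
```

For example, `col("age").gt(lit_int(18)).and(col("status").eq(lit_str("active")))` renders as `((age > 18) AND (status = 'active'))`.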
--- a/rust/lancedb/src/index.rs
+++ b/rust/lancedb/src/index.rs
@@ -9,7 +9,7 @@ use std::time::Duration;
 use vector::IvfFlatIndexBuilder;

 use crate::index::vector::IvfRqIndexBuilder;
-use crate::{table::BaseTable, DistanceType, Error, Result};
+use crate::{DistanceType, Error, Result, table::BaseTable};

 use self::{
    scalar::{BTreeIndexBuilder, BitmapIndexBuilder, LabelListIndexBuilder},
--- a/rust/lancedb/src/index/scalar.rs
+++ b/rust/lancedb/src/index/scalar.rs
@@ -27,7 +27,7 @@
 ///
 /// The btree index does not currently have any parameters though parameters such as the
 /// block size may be added in the future.
-#[derive(Default, Debug, Clone)]
+#[derive(Default, Debug, Clone, serde::Serialize)]
 pub struct BTreeIndexBuilder {}

 impl BTreeIndexBuilder {}
@@ -39,7 +39,7 @@ impl BTreeIndexBuilder {}
 /// This index works best for low-cardinality (i.e., less than 1000 unique values) columns,
 /// where the number of unique values is small.
 /// The bitmap stores a list of row ids where the value is present.
-#[derive(Debug, Clone, Default)]
+#[derive(Debug, Clone, Default, serde::Serialize)]
 pub struct BitmapIndexBuilder {}

 /// Builder for LabelList index.
@@ -48,10 +48,10 @@ pub struct BitmapIndexBuilder {}
 /// support queries with `array_contains_all` and `array_contains_any`
 /// using an underlying bitmap index.
 ///
-#[derive(Debug, Clone, Default)]
+#[derive(Debug, Clone, Default, serde::Serialize)]
 pub struct LabelListIndexBuilder {}

-pub use lance_index::scalar::inverted::query::*;
 pub use lance_index::scalar::FullTextSearchQuery;
 pub use lance_index::scalar::InvertedIndexParams as FtsIndexBuilder;
 pub use lance_index::scalar::InvertedIndexParams;
+pub use lance_index::scalar::inverted::query::*;
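The doc comments above describe the bitmap index as storing, for each value, the row ids where that value occurs. They describe the label-list index as answering `array_contains_all` and `array_contains_any` on top of an underlying bitmap index. The toy std-only sketch below illustrates that idea for a table of at most 64 rows, so each bitmap fits in a `u64`. `LabelBitmapIndex` and its methods are hypothetical and say nothing about Lance's actual storage format.

```rust
use std::collections::HashMap;

/// Hypothetical toy index: bit `i` of a label's bitmap is set when row `i` has that label.
struct LabelBitmapIndex {
    num_rows: usize,
    bitmaps: HashMap<String, u64>,
}

impl LabelBitmapIndex {
    fn build(rows: &[Vec<&str>]) -> Self {
        assert!(rows.len() <= 64, "toy index supports at most 64 rows");
        let mut bitmaps: HashMap<String, u64> = HashMap::new();
        for (row_id, labels) in rows.iter().enumerate() {
            for label in labels {
                *bitmaps.entry(label.to_string()).or_insert(0) |= 1u64 << row_id;
            }
        }
        Self { num_rows: rows.len(), bitmaps }
    }

    /// Bitmap for one label; unknown labels match no rows.
    fn bitmap(&self, label: &str) -> u64 {
        self.bitmaps.get(label).copied().unwrap_or(0)
    }

    /// Bitmap with one bit set for every row in the table.
    fn all_rows(&self) -> u64 {
        if self.num_rows == 64 {
            u64::MAX
        } else {
            (1u64 << self.num_rows) - 1
        }
    }

    /// `array_contains_any`: OR the per-label bitmaps together.
    fn contains_any(&self, labels: &[&str]) -> Vec<usize> {
        let bits = labels.iter().fold(0u64, |acc, l| acc | self.bitmap(l));
        to_row_ids(bits)
    }

    /// `array_contains_all`: AND the per-label bitmaps, starting from "every row".
    fn contains_all(&self, labels: &[&str]) -> Vec<usize> {
        let bits = labels.iter().fold(self.all_rows(), |acc, l| acc & self.bitmap(l));
        to_row_ids(bits)
    }
}

/// Expand a bitmap into the ascending list of row ids whose bits are set.
fn to_row_ids(bits: u64) -> Vec<usize> {
    (0..64).filter(|&i| (bits >> i) & 1 == 1).collect()
}
```

Because one bitmap is kept per distinct value, the index grows with cardinality. That is consistent with the doc comment's advice to use bitmap indices on low-cardinality columns.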
Author	SHA1	Message	Date
Jack Ye	01c6b9dcb8	feat: add mem_wal flag to merge insert for MemWAL write path Add support for enabling MemWAL (Memory Write-Ahead Log) mode on merge insert operations. This allows streaming writes to route through memory nodes for high-performance buffered writes. Changes: - Add `mem_wal` field to MergeInsertBuilder with validation - Add `x-lancedb-mem-wal-enabled` header for remote requests - Add Python `mem_wal()` method to LanceMergeInsertBuilder - Add validation to ensure only upsert pattern is supported: - when_matched_update_all() without filter - when_not_matched_insert_all() - Throw NotSupported error for native tables - Add mem_wal_enabled to ClientConfig for Python/Node bindings Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>	2026-03-16 01:04:04 -07:00
Will Jones	33a13f0738	fixes for breaking changes	2026-03-04 15:27:32 -08:00
Will Jones	cabc75f167	feat: upgrade Lance to 3.0.0-rc.3	2026-03-04 14:57:17 -08:00
Wyatt Alt	97ca9bb943	feat: allow passing azure client/tenant ID through remote SDK (#3102) Prior to this commit we supported passing the azure storage account name to the lancedb remote SDK through headers. This adds support for client ID and tenant ID as well.	2026-03-04 11:11:36 -08:00
Xuanwo	fa1b04f341	chore: migrate Rust crates to edition 2024 and fix clippy warnings (#3098) This PR migrates all Rust crates in the workspace to Rust 2024 edition and addresses the resulting compatibility updates. It also fixes all clippy warnings surfaced by the workspace checks so the codebase remains warning-free under the current lint configuration. Context: - Scope: workspace edition bump (`2021` -> `2024`) plus follow-up refactors required by new edition and clippy rules. - Validation: `cargo fmt --all` and `cargo clippy --quiet --features remote --tests --examples -- -D warnings` both pass.	2026-03-03 16:23:29 -08:00
mrncstt	367abe99d2	feat(python): support dict to SQL struct conversion in table.update() (#3089) ## Summary - Add `@value_to_sql.register(dict)` handler that converts Python dicts to DataFusion's `named_struct()` SQL syntax - Enables updating struct-typed columns via `table.update(values={"col": {"field_a": 1, "field_b": "hello"}})` - Recursively handles nested structs, lists, nulls, and all existing scalar types Closes #1363 ## Details The `named_struct` function was introduced in DataFusion 38 and is now available (LanceDB uses DataFusion 52.1). The implementation follows the existing `singledispatch` pattern in `util.py`. Example conversion: ```python value_to_sql({"field_a": 1, "field_b": "hello"}) # => "named_struct('field_a', 1, 'field_b', 'hello')" ``` ## Test plan - [x] Unit tests for flat struct, nested struct, list inside struct, mixed types, null values, and empty dict - [ ] CI integration tests with actual table.update() on struct columns 🔗 [DataFusion named_struct docs](https://datafusion.apache.org/user-guide/sql/scalar_functions.html#named-struct)	2026-03-03 13:36:08 -08:00
Xuanwo	52ce2c995c	fix(ci): only run npm publish on release tags (#3093) This PR fixes the npm publish dry-run failure for prerelease versions without changing the existing workflow trigger behavior. The publish step now detects prerelease versions from `nodejs/package.json` and always appends `--tag preview` when needed. Context: - On `main` pushes, the workflow still runs `npm publish --dry-run` by design. - Recent failures were caused by prerelease versions (for example `0.27.0-beta.3`) running without `--tag`, which npm rejects. - The previous `refs/tags/v...-beta...` check did not apply on branch pushes, so dry-run could fail even though release tags worked.	2026-03-04 01:35:10 +08:00
Sean Mackrory	e71a00998c	ci: add regression test for fastSearch in FTS queries in TypeScript (#3090) We recently added support for this for the Python bindings, and wanted to confirm this already worked as expected in the TS bindings.	2026-03-03 07:09:09 -08:00
Sean Mackrory	39a2ac0a1c	feat: add parity between fast_search keyword argument between vector and FTS searches (#3091) We don't necessarily need to do this, but one user was confused having used `fast_search=True` as a keyword argument for vector searches, but being unable to do so for FTS, even after the most recent changes. I think this is the only discrepancy in where that is possible.	2026-03-03 05:21:36 -08:00
Wyatt Alt	bc7b344fa4	feat: add support for remote index params (#3087) Prior to this commit the remote SDK did not support the full set of index parameters. This extends the SDK to support them.	2026-03-02 11:14:28 -08:00
Will Jones	f91d2f5fec	ci(python): pin maturin to work around bug (#3088) Work around for https://github.com/PyO3/maturin/issues/3059	2026-03-02 09:38:54 -08:00
Wyatt Alt	cf81b6419f	feat: add `num_deleted_rows` to delete result (#3077)	2026-03-02 08:37:14 -08:00
Lance Release	0498ac1f2f	Bump version: 0.27.0-beta.2 → 0.27.0-beta.3	2026-02-28 01:31:51 +00:00
Lance Release	aeb1c3ee6a	Bump version: 0.30.0-beta.2 → 0.30.0-beta.3	2026-02-28 01:29:53 +00:00
Weston Pace	f9ae46c0e7	feat: upgrade lance to 3.0.0-rc.2 and add bindings for fast_search (#3083)	2026-02-27 17:27:01 -08:00
Will Jones	84bf022fb1	fix(python): pin pylance to make datafusion table provider match version (#3080)	2026-02-27 13:34:05 -08:00
Will Jones	310967eceb	ci(rust): fix linux job (#3076)	2026-02-26 19:25:46 -08:00
Jack Ye	154dbeee2a	chore: fix clippy for PreprocessingOutput without remote feature (#3070) Fix clippy: ``` error: fields `overwrite` and `rescannable` are never read Error: --> /home/runner/work/xxxx/xxxx/src/lancedb/rust/lancedb/src/table/add_data.rs:158:9 | 156 | pub struct PreprocessingOutput { | ------------------- fields in this struct 157 | pub plan: Arc<dyn datafusion_physical_plan::ExecutionPlan>, 158 | pub overwrite: bool, | ^^^^^^^^^ 159 | pub rescannable: bool, | ^^^^^^^^^^^ | = note: `-D dead-code` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(dead_code)]` ```	2026-02-25 14:59:32 -08:00
Lance Release	c9c08ac8b9	Bump version: 0.27.0-beta.1 → 0.27.0-beta.2	2026-02-25 07:47:54 +00:00
Lance Release	e253f5d9b6	Bump version: 0.30.0-beta.1 → 0.30.0-beta.2	2026-02-25 07:46:06 +00:00
LanceDB Robot	05b4fb0990	chore: update lance dependency to v3.1.0-beta.2 (#3068) ## Summary - Bump Lance Rust workspace dependencies to `v3.1.0-beta.2` via `ci/set_lance_version.py`. - Update Java `lance-core.version` to `3.1.0-beta.2`. ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all` ## Release Reference - refs/tags/v3.1.0-beta.2	2026-02-24 23:02:22 -08:00
Mesut-Doner	613b9c1099	feat(rust): add expression builder API for type-safe query filters (#3032) ## Summary Adds a Rust expression builder API as a type-safe alternative to SQL strings for query filters. ## Motivation Filtering with raw SQL strings can be awkward when using variables and special types: Closes #3038 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2026-02-24 18:44:03 -08:00
Will Jones	d5948576b9	feat: parallel inserts for local tables (#3062) When input data is sufficiently large, we automatically split up into parallel writes using a round-robin exchange operator. We sample the first batch to determine data width, and target size of 1 million rows or 2GB, whichever is smaller.	2026-02-24 12:26:51 -08:00
Will Jones	0d3fc7860a	ci: fix python DataFusion test (#3060)	2026-02-24 07:59:12 -08:00
Weston Pace	531cec075c	fix: don't expect all offsets to fit in one batch in permutation reader (#3065) This would cause takes against large permutations to fail	2026-02-24 06:32:54 -08:00
Will Jones	0e486511fa	feat: hook up new writer for insert (#3029) This hooks up a new writer implementation for the `add()` method. The main immediate benefit is it allows streaming requests to remote tables, and at the same time allowing retries for most inputs. In NodeJS, we always convert the data to `Vec<RecordBatch>`, so it's always retry-able. For Python, all are retry-able, except `Iterator` and `pa.RecordBatchReader`, which can only be consumed once. Some, like `pa.datasets.Dataset` are retry-able and streaming. A lot of the changes here are to make the new DataFusion write pipeline maintain the same behavior as the existing Python-based preprocessing, such as: * casting input data to target schema * rejecting NaN values if `on_bad_vectors="error"` * applying embedding functions. In future PRs, we'll enhance these by moving the embedding calls into DataFusion and making sure we parallelize them. See: https://github.com/lancedb/lancedb/issues/3048 --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 14:43:31 -08:00
Will Jones	367262662d	feat(nodejs): upgrade napi-rs from v2 to v3 (#3057) ## Summary - Upgrades `@napi-rs/cli` from v2 to v3, `napi`/`napi-derive` Rust crates to 3.x - Fixes a bug ([napi-rs#1170](https://github.com/napi-rs/napi-rs/issues/1170)) where the CLI failed to locate the built `.node` binary when a custom Cargo target directory is set (via `config.toml`) ## Changes package.json / CLI: - `napi.name` → `napi.binaryName`, `napi.triples` → `napi.targets` - Removed `--no-const-enum` flag and fixed output dir arg - `napi universal` → `napi universalize` Rust API migration: - `#[napi::module_init]` → `#[napi_derive::module_init]` - `napi::JsObject` → `Object`, `.get::<_, T>()` → `.get::<T>()` - `ErrorStrategy` removed; `ThreadsafeFunction` now takes an explicit `Return` type with `CalleeHandled = false` const generic - `JsFunction` + `create_threadsafe_function` replaced by typed `Function<Args, Return>` + `build_threadsafe_function().build()` - `RerankerCallbacks` struct removed (`Function<'env,...>` can't be stored in structs); `VectorQuery::rerank` now accepts the function directly - `ClassInstance::clone()` now returns `ClassInstance`, fixed with explicit deref - `Vec<u8>` in `#[napi(object)]` now maps to `Array<number>` in v3; changed to `Buffer` to preserve the TypeScript `Buffer` type TypeScript: - `inner.rerank({ rerankHybrid: async (_, args) => ... })` → `inner.rerank(async (args) => ...)` - Header provider callback wrapped in `async` to match stricter typed constructor signature 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-23 14:42:55 -08:00
Lance Release	11efaf46ae	Bump version: 0.27.0-beta.0 → 0.27.0-beta.1	2026-02-23 18:34:48 +00:00
Lance Release	1ea22ee5ef	Bump version: 0.30.0-beta.0 → 0.30.0-beta.1	2026-02-23 18:33:28 +00:00
LanceDB Robot	8cef8806e9	chore: update lance dependency to v3.0.0-beta.5 (#3058) ## Summary - Bump Lance Rust dependencies and Java `lance-core` to v3.0.0-beta.5 (refs/tags/v3.0.0-beta.5). - Update workspace toolchain and dependency defaults needed for the new Lance release. - Resolve new clippy lint defaults introduced by the toolchain update. ## Validation - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all` --------- Co-authored-by: Jack Ye <yezhaoqin@gmail.com>	2026-02-23 00:39:30 -08:00
Will Jones	a3cd7fce69	fix: update DatasetConsistencyWrapper to accept same-version updates (#3055) ## Summary `DatasetConsistencyWrapper::update()` only stored datasets with a strictly newer version. This caused `migrate_manifest_paths_v2` to silently drop its update since the migration renames files without bumping the dataset version. The subsequent `uses_v2_manifest_paths()` call would then return the stale cached dataset. Changed the version check from `>` to `>=` so same-version updates are accepted. ## Test plan - [x] Existing `test_create_table_v2_manifest_paths_async` Python test should pass - [x] Existing `should be able to migrate tables to the V2 manifest paths` NodeJS test should pass - [x] All dataset wrapper unit tests pass locally 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 16:01:15 -08:00
Will Jones	48ddc833dd	feat: check for dataset updates in the background (#3021) This updates `DatasetConsistencyWrapper` to block less: 1. `DatasetConsistencyWrapper::get()` just returns `Arc<Dataset>` now, instead of a guard that blocks writes. `DatasetConsistencyWrapper::get_mut()` is gone; now write methods just use `get()` and then later call `update()` with the new version. This means a given table handle can do concurrent reads and writes. 2. In weak consistency mode, will check for dataset updates in the background, instead of blocking calls to `get()`. --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-20 11:18:33 -08:00
Varun Chawla	2802764092	fix(embeddings): stop retrying OpenAI 401 authentication errors (#2995) ## Summary Fixes #1679 This PR prevents the OpenAI embedding function from retrying when receiving a 401 Unauthorized error. Authentication errors are permanent failures that won't be fixed by retrying, yet the current implementation retries all exceptions up to 7 times by default. ## Changes - Modified `retry_with_exponential_backoff` in `utils.py` to check for non-retryable errors before retrying - Added `_is_non_retryable_error` helper function that detects: - Exceptions with name `AuthenticationError` (OpenAI's 401 error) - Exceptions with `status_code` attribute of 401 or 403 - Enhanced OpenAI embeddings to explicitly catch and re-raise `AuthenticationError` with better logging - Added unit test `test_openai_no_retry_on_401` to verify authentication errors don't trigger retries ## Test Plan - Added test that verifies: 1. A function raising `AuthenticationError` is only called once 2. No retry delays occur (sleep is never called) - Existing tests continue to pass - Formatting applied via `make format` ## Example Behavior Before: With an invalid API key, users would see 7 retry attempts over ~2 minutes: ``` WARNING:root:Error occurred: Error code: 401 - {'error': {'message': 'Incorrect API key provided...'}} Retrying in 3.97 seconds (retry 1 of 7) WARNING:root:Error occurred: Error code: 401... Retrying in 7.94 seconds (retry 2 of 7) ... ``` After: With an invalid API key, the error is raised immediately: ``` ERROR:root:Authentication failed: Invalid API key provided AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided...'}} ``` This provides better UX and prevents unnecessary API calls that would fail anyway. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2026-02-19 09:20:54 -08:00
Weston Pace	37bbb0dba1	fix: allow permutation reader to work with remote tables as well (#3047) Fixed one more spot that was relying on `_inner`.	2026-02-19 00:41:41 +05:30
Prashanth Rao	155ec16161	fix: deprecate outdated files for embedding registry (#3037) There are old and outdated files in our embedding registry that can confuse coding agents. This PR deprecates the following files that have newer, more modern methods to generate such embeddings. - Deprecate `embeddings/siglip.py` - Deprecate `embeddings/gte.py` ## Why this change? Per a discussion with @AyushExel, the [embedding registry directory](`1840aa7edc/python/python/lancedb/embeddings`) in the LanceDB repo has a number of outdated files that need to be deprecated. See https://github.com/lancedb/docs/issues/85 for the docs gaps that identified this. - Add note in `openclip` docs that it can be used for SigLip embeddings, which it now supports - Add note in the `sentence-transformers` page that ALL text embedding models on Hugging Face can be used	2026-02-18 12:04:39 -05:00
Weston Pace	636b8b5bbd	fix: allow permutation reader to be used with remote tables (#3019) There were two issues: 1. The python code needs to get access to the underlying rust table to setup the permutation reader and the attributes involved in this differ between the python local table and remote table objects. ~~2. The remote table was sending projection dictionaries as arrays of tuples and (on LanceDB cloud at least) it does not appear this is how rest servers are setup to receive them.~~ (this is now fixed as #3023) ~~Leaving as draft as this is built on https://github.com/lancedb/lancedb/pull/3016~~	2026-02-18 05:44:08 -08:00
Omair Afzal	715b81c86b	fix(python): graceful handling of empty result sets in hybrid search (#3030) ## Problem When applying hard filters that result in zero matches, hybrid search crashes with `IndexError: list index out of range` during reranking. This happens because empty result tables are passed through the full reranker pipeline, which expects at least one result. Traceback from the issue: ``` lancedb/query.py: in _combine_hybrid_results results = reranker.rerank_hybrid(fts_query, vector_results, fts_results) lancedb/rerankers/answerdotai.py: in rerank_hybrid combined_results = self._rerank(combined_results, query) ... IndexError: list index out of range ``` ## Fix Added an early return in `_combine_hybrid_results` when both vector and FTS results are empty. Instead of passing empty tables through normalization, reranking, and score restoration (which can fail in various ways), we now build a properly-typed empty result table with the `_relevance_score` column and return it directly. ## Test Added `test_empty_hybrid_result_reranker` that exercises `_combine_hybrid_results` directly with empty vector and FTS tables, verifying: - Returns empty table with correct schema - Includes `_relevance_score` column - Respects `with_row_ids` flag Closes #2425	2026-02-17 11:37:10 -08:00
Omair Afzal	7e1616376e	refactor: extract merge_insert into table/merge.rs submodule (#3031) Completes the merge_insert.rs checklist item from #2949. ## Changes - Moved `MergeResult` struct from `table.rs` to `table/merge.rs` - Moved the `NativeTable::merge_insert` implementation into `merge::execute_merge_insert()`, with the trait impl now delegating to it (same pattern as `delete.rs`) - Moved `test_merge_insert` and `test_merge_insert_use_index` tests into `table/merge.rs` - Improved moved tests to use `memory://` URIs instead of temporary directories - Cleaned up unused imports from `table.rs` (`FutureExt`, `TryFutureExt`, `Either`, `WhenMatched`, `WhenNotMatchedBySource`, `LanceMergeInsertBuilder`) - `MergeResult` is re-exported from `table.rs` so the public API is unchanged ## Testing `cargo build -p lancedb` compiles cleanly with no warnings.	2026-02-17 11:36:53 -08:00
ChinmayGowda71	d5ac5b949a	refactor(rust): extract query logic to src/table/query.rs (#3035) References #2949 Moved query logic and helpers from table.rs to query.rs. Refactored tests using guidelines and added coverage for multi vector plan structure.	2026-02-17 09:04:21 -08:00
Lance Release	7be6f45e0b	Bump version: 0.26.2 → 0.27.0-beta.0	2026-02-17 00:28:24 +00:00
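Commit 367abe99d2 (#3089) converts Python dicts into DataFusion's `named_struct()` SQL syntax and recurses into nested values. The actual change is in Python (`value_to_sql` in `util.py`). The sketch below is a hypothetical Rust port for illustration only, and it models just nulls, integers, strings and nested structs. The `Value` enum is invented here, and escaping embedded single quotes by doubling them is an assumption about how string literals are rendered.

```rust
/// Hypothetical value model; the real handler dispatches on Python types.
enum Value {
    Null,
    Int(i64),
    Str(String),
    /// Field order is preserved, like a Python dict's insertion order.
    Struct(Vec<(String, Value)>),
}

/// Quote a string as a SQL literal (assumption: `'` is escaped as `''`).
fn quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', "''"))
}

fn value_to_sql(v: &Value) -> String {
    match v {
        Value::Null => "NULL".to_string(),
        Value::Int(i) => i.to_string(),
        Value::Str(s) => quote(s),
        Value::Struct(fields) => {
            // named_struct('name1', value1, 'name2', value2, ...), recursing into nested values.
            let args: Vec<String> = fields
                .iter()
                .map(|(name, value)| format!("{}, {}", quote(name), value_to_sql(value)))
                .collect();
            format!("named_struct({})", args.join(", "))
        }
    }
}
```

For the flat example in the commit message, this produces `named_struct('field_a', 1, 'field_b', 'hello')`. A nested struct becomes a nested `named_struct(...)` call in the argument list.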