Compare commits

7 Commits

Author SHA1 Message Date
Will Jones
c0a9a4d48a ci: fix pypi publish on mac/windows/arm (#3449)
The python-v0.32.0 publish run failed on every build matrix entry. Three
independent issues:

1. **Mac and Windows**: `pypa/gh-action-pypi-publish` only runs on
Linux, but was being called inline from each build job.
2. **Linux (all arches)**: `pypa/gh-action-pypi-publish` derives its
docker image name from `github.action_repository`, which is empty when
the action is invoked from inside a composite action
(actions/runner#2473 — pypa's own `action.yml` references this bug). It
falls back to `github.repository`, generating
`docker://ghcr.io/lancedb/lancedb:<tag>`, which doesn't exist →
`denied`. Only the ARM matrix entry surfaced this because it failed
first and cancel-cascaded the rest.
3. **Windows**: `upload-artifact` in `build_windows_wheel` pointed at
`python\target\wheels`, but maturin writes to the workspace-root
`target/wheels`. The artifact was always empty. Also, `pypi-publish.yml`
passed a `vcpkg_token` input that the composite doesn't declare.
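
The fallback in item 2 can be modelled roughly as below. This is a simplified sketch, not the action's actual code: the function `image_ref` and its parameters are hypothetical stand-ins for `github.action_repository`, `github.repository`, and the image tag.

```rust
/// Simplified, hypothetical model of how the container image reference is
/// chosen. `action_repository` stands in for `github.action_repository` and
/// `repository` for `github.repository`; this is not the action's real code.
fn image_ref(action_repository: &str, repository: &str, tag: &str) -> String {
    // Inside a composite action the first value arrives empty
    // (actions/runner#2473), so the caller's repository is used instead.
    let repo = if action_repository.is_empty() {
        repository
    } else {
        action_repository
    };
    format!("docker://ghcr.io/{repo}:{tag}")
}
```

With an empty first argument and `lancedb/lancedb` as the repository, the model produces `docker://ghcr.io/lancedb/lancedb:<tag>`, the reference that was denied.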

## Changes

- Build jobs (linux/mac/windows) now upload their wheels as
`actions/upload-artifact` artifacts.
- New Linux `publish` job downloads all wheel artifacts and runs the
Fury or PyPA publish step directly (not via a composite), so
`github.action_repository` resolves correctly.
- Delete the unused `upload_wheel` composite action.
- Drop the broken upload-artifact step inside `build_windows_wheel`.
- Remove the bogus `vcpkg_token` input.
- Fury upload now loops over all wheels instead of just the first.
- Bump `actions/checkout`, `actions/upload-artifact`,
`actions/download-artifact` to current major versions (Node 24) to clear
deprecation warnings.
- Bump Windows job timeout 60 → 90 minutes; the previous run was
cancelled after hitting the 60-minute cap.
- Use `rust-lld` as the Windows MSVC linker via
`CARGO_TARGET_X86_64_PC_WINDOWS_MSVC_LINKER`. `link.exe` is
single-threaded and the long pole on Windows builds.
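
The publish job's two decisions (which index to target, and which wheels to upload) can be sketched as follows. The real logic lives in bash inside the workflow; this is only a model of it, and the function names `choose_repo` and `wheels_to_upload` are hypothetical.

```rust
/// Mirrors the workflow's `[[ $ref == *beta* ]]` check: beta tags go to Fury,
/// everything else goes to PyPI.
fn choose_repo(git_ref: &str) -> &'static str {
    if git_ref.contains("beta") {
        "fury"
    } else {
        "pypi"
    }
}

/// Selects every `lancedb-*.whl` file instead of only the first one, and
/// fails when none are present rather than uploading nothing.
fn wheels_to_upload<'a>(files: &[&'a str]) -> Result<Vec<&'a str>, String> {
    let wheels: Vec<&'a str> = files
        .iter()
        .copied()
        .filter(|f| f.starts_with("lancedb-") && f.ends_with(".whl"))
        .collect();
    if wheels.is_empty() {
        Err("No wheels found in target/wheels/".to_string())
    } else {
        Ok(wheels)
    }
}
```

Failing on an empty match matters here because the earlier Windows artifact was always empty and the problem went unnoticed until publish time.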

Fixes #3445

## Test plan

- [x] Open this PR — `paths` filter triggers a dry-run build on all
three platforms.
- [x] Verify all three builds produce wheels.
- [x] Confirm the `pypa/gh-action-pypi-publish` container actually
starts (the actions/runner#2473 bug) via the `publish-dry-run` job
pointed at TestPyPI.
- [x] **REMOVE BEFORE MERGE**: drop the `publish-dry-run` job and the
now-redundant `actions/upload-artifact` runs on PRs (currently always-on
so the dry-run has wheels to publish).
- [ ] After merge, cherry-pick onto `python-v0.32.0` and force-push the
tag to re-trigger the publish.
2026-05-28 12:35:58 -07:00
Lance Release
32c3d39f2a Bump version: 0.30.0-beta.2 → 0.30.0 2026-05-28 19:03:15 +00:00
Lance Release
7a41f4a5eb Bump version: 0.30.0-beta.1 → 0.30.0-beta.2 2026-05-28 19:02:41 +00:00
Lance Release
28fc8f0f26 Bump version: 0.33.0-beta.2 → 0.33.0 2026-05-28 19:02:05 +00:00
Lance Release
7677a5279c Bump version: 0.33.0-beta.1 → 0.33.0-beta.2 2026-05-28 19:02:03 +00:00
Will Jones
b6c645592a chore: update lance dependency to v7.0.0
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 10:24:12 -07:00
Will Jones
a9f49c8150 fix: allow appending arrow.json data into lance.json tables (#3429)
When a table is created with `pa.json_()` (PyArrow's JSON extension type),
it is stored internally as `lance.json` (LargeBinary with `lance.json`
extension metadata). Calling `table.add()` with `pa.json_()` data failed
with:

```
RuntimeError: lance error: Append with different schema:
  `data` should have type json but type was large_binary
```

`build_field_exprs` in `rust/lancedb/src/table/datafusion/cast.rs` saw that
the input field (`Utf8` with `arrow.json` metadata) differed from the table
field (`LargeBinary` with `lance.json` metadata). Since
`can_cast_types(Utf8, LargeBinary)` is true, it inserted a DataFusion
`Utf8 → LargeBinary` cast. That cast preserved the input field's `arrow.json`
extension metadata instead of adopting the table's `lance.json` metadata, so
lance-core detected a schema mismatch and rejected the append.

This adds a special case in `build_field_exprs`: when the input is
`arrow.json` and the table field is `lance.json`, the expression is passed
through unchanged. Lance-core's write path already handles the
`arrow.json → lance.json` conversion (including JSONB encoding), so no
DataFusion cast is needed.
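
The decision can be illustrated with a std-only model. This is a sketch under assumptions, not the code in `cast.rs`: `Field`, `Storage`, `Plan`, and `plan_field` are hypothetical stand-ins for the Arrow field types and the DataFusion expressions, and `is_arrow_json` / `is_lance_json` only imitate what the `lance_arrow::json` helpers are described as checking.

```rust
use std::collections::HashMap;

// Metadata key that carries the Arrow extension name.
const EXT_NAME_KEY: &str = "ARROW:extension:name";

#[derive(Debug, Clone, Copy, PartialEq)]
enum Storage {
    Utf8,
    LargeUtf8,
    LargeBinary,
}

// Hypothetical stand-in for an Arrow field: storage type plus metadata.
struct Field {
    storage: Storage,
    metadata: HashMap<String, String>,
}

// What the cast planner decides to do with one column.
#[derive(Debug, PartialEq)]
enum Plan {
    PassThrough,
    Cast(Storage),
}

fn field(storage: Storage, ext: Option<&str>) -> Field {
    let mut metadata = HashMap::new();
    if let Some(name) = ext {
        metadata.insert(EXT_NAME_KEY.to_string(), name.to_string());
    }
    Field { storage, metadata }
}

fn ext_name(f: &Field) -> Option<&str> {
    f.metadata.get(EXT_NAME_KEY).map(|s| s.as_str())
}

fn is_arrow_json(f: &Field) -> bool {
    matches!(f.storage, Storage::Utf8 | Storage::LargeUtf8)
        && ext_name(f) == Some("arrow.json")
}

fn is_lance_json(f: &Field) -> bool {
    f.storage == Storage::LargeBinary && ext_name(f) == Some("lance.json")
}

fn plan_field(input: &Field, table: &Field) -> Plan {
    // Special case: leave arrow.json input alone so the writer can convert it.
    if is_arrow_json(input) && is_lance_json(table) {
        return Plan::PassThrough;
    }
    // Otherwise cast only when the storage types differ.
    if input.storage == table.storage {
        Plan::PassThrough
    } else {
        Plan::Cast(table.storage)
    }
}
```

In this model a plain `Utf8` column aimed at the same table field still gets a cast, so the special case is scoped to inputs that actually carry the `arrow.json` extension name.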

Fixes #3144

Continues #3291 from a fork (the original author's branch could not be
pushed to). The original commits are preserved; an additional commit fixes
the CI failures on that PR: formatting, a missing trait import, and
read-back assertions that assumed binary storage, whereas a lance.json
column is read back as `Utf8`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: yunju.lly <yunju.lly@antgroup.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 19:24:28 -07:00
24 changed files with 370 additions and 158 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.30.0-beta.1"
current_version = "0.30.0"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -29,7 +29,3 @@ runs:
args: ${{ inputs.args }}
docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
working-directory: python
- uses: actions/upload-artifact@v4
with:
name: windows-wheels
path: python\target\wheels

View File

@@ -8,6 +8,9 @@ on:
# This should trigger a dry run (we skip the final publish step)
paths:
- .github/workflows/pypi-publish.yml
- .github/workflows/build_linux_wheel/action.yml
- .github/workflows/build_mac_wheel/action.yml
- .github/workflows/build_windows_wheel/action.yml
- Cargo.toml # Change in dependency frequently breaks builds
- Cargo.lock
@@ -21,9 +24,6 @@ jobs:
linux:
name: Python ${{ matrix.config.platform }} manylinux${{ matrix.config.manylinux }}
timeout-minutes: 60
permissions:
id-token: write
contents: read
strategy:
matrix:
config:
@@ -46,7 +46,7 @@ jobs:
runner: ubuntu-2404-8x-arm64
runs-on: ${{ matrix.config.runner }}
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
with:
fetch-depth: 0
lfs: true
@@ -60,15 +60,14 @@ jobs:
args: "--release --strip ${{ matrix.config.extra_args }}"
arm-build: ${{ matrix.config.platform == 'aarch64' }}
manylinux: ${{ matrix.config.manylinux }}
- uses: ./.github/workflows/upload_wheel
- uses: actions/upload-artifact@v7
if: startsWith(github.ref, 'refs/tags/python-v')
with:
fury_token: ${{ secrets.FURY_TOKEN }}
name: wheels-linux-${{ matrix.config.platform }}-${{ matrix.config.manylinux }}
path: target/wheels/lancedb-*.whl
if-no-files-found: error
mac:
timeout-minutes: 90
permissions:
id-token: write
contents: read
runs-on: ${{ matrix.config.runner }}
strategy:
matrix:
@@ -78,7 +77,7 @@ jobs:
env:
MACOSX_DEPLOYMENT_TARGET: 10.15
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
with:
fetch-depth: 0
lfs: true
@@ -90,18 +89,21 @@ jobs:
with:
python-minor-version: 10
args: "--release --strip --target ${{ matrix.config.target }} --features fp16kernels"
- uses: ./.github/workflows/upload_wheel
- uses: actions/upload-artifact@v7
if: startsWith(github.ref, 'refs/tags/python-v')
with:
fury_token: ${{ secrets.FURY_TOKEN }}
name: wheels-mac-${{ matrix.config.target }}
path: target/wheels/lancedb-*.whl
if-no-files-found: error
windows:
timeout-minutes: 60
permissions:
id-token: write
contents: read
timeout-minutes: 90
runs-on: windows-latest
env:
# link.exe is single-threaded and the long pole on Windows builds. Use
# rustc's bundled lld-link instead.
CARGO_TARGET_X86_64_PC_WINDOWS_MSVC_LINKER: rust-lld
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
with:
fetch-depth: 0
lfs: true
@@ -113,18 +115,70 @@ jobs:
with:
python-minor-version: 10
args: "--release --strip"
vcpkg_token: ${{ secrets.VCPKG_GITHUB_PACKAGES }}
- uses: ./.github/workflows/upload_wheel
- uses: actions/upload-artifact@v7
if: startsWith(github.ref, 'refs/tags/python-v')
with:
fury_token: ${{ secrets.FURY_TOKEN }}
name: wheels-windows
path: target/wheels/lancedb-*.whl
if-no-files-found: error
publish:
name: Publish wheels
if: startsWith(github.ref, 'refs/tags/python-v')
needs: [linux, mac, windows]
runs-on: ubuntu-latest
permissions:
id-token: write
contents: read
steps:
- uses: actions/checkout@v6
- name: Download wheel artifacts
uses: actions/download-artifact@v8
with:
pattern: wheels-*
path: target/wheels
merge-multiple: true
- name: List wheels
run: ls -la target/wheels
- name: Choose repo
id: choose_repo
run: |
if [[ ${{ github.ref }} == *beta* ]]; then
echo "repo=fury" >> $GITHUB_OUTPUT
else
echo "repo=pypi" >> $GITHUB_OUTPUT
fi
- name: Publish to Fury
if: steps.choose_repo.outputs.repo == 'fury'
env:
FURY_TOKEN: ${{ secrets.FURY_TOKEN }}
run: |
shopt -s nullglob
WHEELS=(target/wheels/lancedb-*.whl)
if [[ ${#WHEELS[@]} -eq 0 ]]; then
echo "No wheels found in target/wheels/" >&2
exit 1
fi
for WHEEL in "${WHEELS[@]}"; do
echo "Uploading $WHEEL to Fury"
curl -f -F package=@"$WHEEL" "https://$FURY_TOKEN@push.fury.io/lancedb/"
done
# NOTE: pypa/gh-action-pypi-publish must be invoked directly from a
# workflow file, not from inside a composite action. When called from a
# composite, `github.action_repository` is empty (actions/runner#2473)
# and the action falls back to `github.repository`, producing a bogus
# `docker://ghcr.io/<repo>:<ref>` image reference that GHA tries to pull.
- name: Publish to PyPI
if: steps.choose_repo.outputs.repo == 'pypi'
uses: pypa/gh-action-pypi-publish@release/v1
with:
packages-dir: target/wheels/
gh-release:
if: startsWith(github.ref, 'refs/tags/python-v')
runs-on: ubuntu-latest
permissions:
contents: write
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
with:
fetch-depth: 0
lfs: true
@@ -187,13 +241,13 @@ jobs:
report-failure:
name: Report Workflow Failure
runs-on: ubuntu-latest
needs: [linux, mac, windows]
needs: [linux, mac, windows, publish]
permissions:
contents: read
issues: write
if: always() && failure() && startsWith(github.ref, 'refs/tags/python-v')
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- uses: ./.github/actions/create-failure-issue
with:
job-results: ${{ toJSON(needs) }}

View File

@@ -1,34 +0,0 @@
name: upload-wheel
description: "Upload wheels to Pypi"
inputs:
fury_token:
required: true
description: "release token for the fury repo"
runs:
using: "composite"
steps:
- name: Choose repo
shell: bash
id: choose_repo
run: |
if [[ ${{ github.ref }} == *beta* ]]; then
echo "repo=fury" >> $GITHUB_OUTPUT
else
echo "repo=pypi" >> $GITHUB_OUTPUT
fi
- name: Publish to Fury
if: steps.choose_repo.outputs.repo == 'fury'
shell: bash
env:
FURY_TOKEN: ${{ inputs.fury_token }}
run: |
WHEEL=$(ls target/wheels/lancedb-*.whl 2> /dev/null | head -n 1)
echo "Uploading $WHEEL to Fury"
curl -f -F package=@$WHEEL https://$FURY_TOKEN@push.fury.io/lancedb/
- name: Publish to PyPI
if: steps.choose_repo.outputs.repo == 'pypi'
uses: pypa/gh-action-pypi-publish@release/v1
with:
packages-dir: target/wheels/

139
Cargo.lock generated
View File

@@ -976,15 +976,16 @@ dependencies = [
[[package]]
name = "aws-smithy-runtime"
version = "1.11.1"
version = "1.11.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0504b1ab12debb5959e5165ee5fe97dd387e7aa7ea6a477bfd7635dfe769a4f5"
checksum = "b8e6f5caf6fea86f8c2206541ab5857cfcda9013426cdbe8fa0098b9e2d32182"
dependencies = [
"aws-smithy-async",
"aws-smithy-http",
"aws-smithy-http-client",
"aws-smithy-observability",
"aws-smithy-runtime-api",
"aws-smithy-schema",
"aws-smithy-types",
"bytes",
"fastrand",
@@ -1001,9 +1002,9 @@ dependencies = [
[[package]]
name = "aws-smithy-runtime-api"
version = "1.12.0"
version = "1.12.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b71a13df6ada0aafbf21a73bdfcdf9324cfa9df77d96b8446045be3cde61b42e"
checksum = "dc117c179ecf39a62a0a3f49f600e9ac26a7ad7dd172177999f83933af776c32"
dependencies = [
"aws-smithy-async",
"aws-smithy-runtime-api-macros",
@@ -1029,10 +1030,21 @@ dependencies = [
]
[[package]]
name = "aws-smithy-types"
version = "1.4.7"
name = "aws-smithy-schema"
version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9d73dbfbaa8e4bc57b9045137680b958d274823509a360abfd8e1d514d40c95c"
checksum = "7442cb268338f0eb8278140a107c046756aa01093d8ef5e99628d34ae09c94f5"
dependencies = [
"aws-smithy-runtime-api",
"aws-smithy-types",
"http 1.4.0",
]
[[package]]
name = "aws-smithy-types"
version = "1.4.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "056b66dbce2f81cc0c1e2b05bb402eb58f8a3530479d650efadd5bbae9a4050b"
dependencies = [
"base64-simd",
"bytes",
@@ -3284,8 +3296,9 @@ checksum = "42703706b716c37f96a77aea830392ad231f44c9e9a67872fa5548707e11b11c"
[[package]]
name = "fsst"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bcd0ce0249ac12fd44fcde62d435c36d881952c2f0df4d1de24b45e1dbba5ddb"
dependencies = [
"arrow-array",
"rand 0.9.4",
@@ -4506,8 +4519,9 @@ checksum = "e037a2e1d8d5fdbd49b16a4ea09d5d6401c1f29eca5ff29d03d3824dba16256a"
[[package]]
name = "lance"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3944aca86f4c78f4da04af1c2bf33e664a2826b7af72972ad200d6b9de59019f"
dependencies = [
"arc-swap",
"arrow",
@@ -4552,7 +4566,6 @@ dependencies = [
"lance-io",
"lance-linalg",
"lance-namespace",
"lance-select",
"lance-table",
"lance-tokenizer",
"log",
@@ -4580,8 +4593,9 @@ dependencies = [
[[package]]
name = "lance-arrow"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "253f4a0a70580c985b91e65e9ca6cad644825a4078de28d8efbacf3ffbd7ecdc"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -4601,8 +4615,9 @@ dependencies = [
[[package]]
name = "lance-bitpacking"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "80c4d12521b1945041dd515a56aa0854973138e7ac12111c92572e33e4ecb593"
dependencies = [
"arrayref",
"paste",
@@ -4611,8 +4626,9 @@ dependencies = [
[[package]]
name = "lance-core"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "13f84020da5a484e2f07dd1796e09785ed7cd889857ebc4cb77e32ef214ee594"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -4647,8 +4663,9 @@ dependencies = [
[[package]]
name = "lance-datafusion"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7460597a66534a75987993d4dac5bc330586d99c5b79ae73367dbcbd4e29e576"
dependencies = [
"arrow",
"arrow-array",
@@ -4678,8 +4695,9 @@ dependencies = [
[[package]]
name = "lance-datagen"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "046f5506ed2271cd941a050de7bf535dd3aedc291aadec836a63fa56c5926e3b"
dependencies = [
"arrow",
"arrow-array",
@@ -4697,8 +4715,9 @@ dependencies = [
[[package]]
name = "lance-encoding"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7af54edf43dcf9d6a56cc636eb35d457e68373c6448dca3f0891b3325b4a24e6"
dependencies = [
"arrow-arith",
"arrow-array",
@@ -4733,8 +4752,9 @@ dependencies = [
[[package]]
name = "lance-file"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0772ae2d6207995dc1eb28aff9507f78e90b3362b58f311da001e9dc25f3d736"
dependencies = [
"arrow-arith",
"arrow-array",
@@ -4765,8 +4785,9 @@ dependencies = [
[[package]]
name = "lance-index"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e71fbfb51096a903cb524fe0da716f5f15fbc4a6b6f84cd6dec21abf319c5e84"
dependencies = [
"arc-swap",
"arrow",
@@ -4803,7 +4824,6 @@ dependencies = [
"lance-file",
"lance-io",
"lance-linalg",
"lance-select",
"lance-table",
"lance-tokenizer",
"libm",
@@ -4831,8 +4851,9 @@ dependencies = [
[[package]]
name = "lance-io"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bab8c98ef1b870b20541d27f3ca4efdf7c9f5c25214233be07d231ba88900219"
dependencies = [
"arrow",
"arrow-arith",
@@ -4874,8 +4895,9 @@ dependencies = [
[[package]]
name = "lance-linalg"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6b4c51cad0ac780b02dc4da48528479e7693c03e8d05390510bbc69ca2a9a1f1"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -4891,8 +4913,9 @@ dependencies = [
[[package]]
name = "lance-namespace"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "014e8332ca0615506342e0d3af608639864b68396973be14239f09c9f21f1fc2"
dependencies = [
"arrow",
"async-trait",
@@ -4904,8 +4927,9 @@ dependencies = [
[[package]]
name = "lance-namespace-impls"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e8d1231906a3cf92dd3dcda7d14a09c4835af6cd2bcd76dfd2481e87f20a282d"
dependencies = [
"arrow",
"arrow-ipc",
@@ -4952,25 +4976,11 @@ dependencies = [
"url",
]
[[package]]
name = "lance-select"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
dependencies = [
"arrow-array",
"arrow-buffer",
"byteorder",
"bytes",
"deepsize",
"itertools 0.13.0",
"lance-core",
"roaring",
]
[[package]]
name = "lance-table"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b16f1355904aea4ebb04ffc70c58c97901e10bde44452b4b021de4a1f329250d"
dependencies = [
"arrow",
"arrow-array",
@@ -4989,7 +4999,6 @@ dependencies = [
"lance-core",
"lance-file",
"lance-io",
"lance-select",
"log",
"object_store",
"prost",
@@ -5010,8 +5019,9 @@ dependencies = [
[[package]]
name = "lance-testing"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3094c2aacbd1fa093d809fc54fa911e3498671ba451041341d7caaa18460e6b2"
dependencies = [
"arrow-array",
"arrow-schema",
@@ -5022,8 +5032,9 @@ dependencies = [
[[package]]
name = "lance-tokenizer"
version = "7.1.0-beta.4"
source = "git+https://github.com/lance-format/lance.git?tag=v7.1.0-beta.4#0c0b3e18c0a4c75bda1dd6ec9d6247ef75bd29d9"
version = "7.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b39b7f5ed9d0c0b716bf599b559d888267ed1dfe4c4e29d3648b51d2a28940cf"
dependencies = [
"jieba-rs",
"lindera",
@@ -5034,7 +5045,7 @@ dependencies = [
[[package]]
name = "lancedb"
version = "0.30.0-beta.1"
version = "0.30.0"
dependencies = [
"ahash",
"anyhow",
@@ -5117,7 +5128,7 @@ dependencies = [
[[package]]
name = "lancedb-nodejs"
version = "0.30.0-beta.1"
version = "0.30.0"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -5140,7 +5151,7 @@ dependencies = [
[[package]]
name = "lancedb-python"
version = "0.33.0-beta.1"
version = "0.33.0"
dependencies = [
"arrow",
"async-trait",
@@ -8321,9 +8332,9 @@ dependencies = [
[[package]]
name = "serde_json"
version = "1.0.149"
version = "1.0.150"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "83fc039473c5595ace860d8c4fafa220ff474b3fc6bfdb4293327f1a37e94d86"
checksum = "e8014e44b4736ed0538adeecded0fce2a272f22dc9578a7eb6b2d9993c74cfb9"
dependencies = [
"itoa",
"memchr",

View File

@@ -13,20 +13,20 @@ categories = ["database-implementations"]
rust-version = "1.91.0"
[workspace.dependencies]
lance = { "version" = "=7.1.0-beta.4", default-features = false, "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
lance-core = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
lance-datagen = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
lance-file = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
lance-io = { "version" = "=7.1.0-beta.4", default-features = false, "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
lance-index = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
lance-linalg = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace-impls = { "version" = "=7.1.0-beta.4", default-features = false, "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
lance-table = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
lance-testing = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
lance-datafusion = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
lance-encoding = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
lance-arrow = { "version" = "=7.1.0-beta.4", "tag" = "v7.1.0-beta.4", "git" = "https://github.com/lance-format/lance.git" }
lance = { "version" = "=7.0.0", default-features = false }
lance-core = "=7.0.0"
lance-datagen = "=7.0.0"
lance-file = "=7.0.0"
lance-io = { "version" = "=7.0.0", default-features = false }
lance-index = "=7.0.0"
lance-linalg = "=7.0.0"
lance-namespace = "=7.0.0"
lance-namespace-impls = { "version" = "=7.0.0", default-features = false }
lance-table = "=7.0.0"
lance-testing = "=7.0.0"
lance-datafusion = "=7.0.0"
lance-encoding = "=7.0.0"
lance-arrow = "=7.0.0"
ahash = "0.8"
# Note that this one does not include pyarrow
arrow = { version = "58.0.0", optional = false }

View File

@@ -14,7 +14,7 @@ Add the following dependency to your `pom.xml`:
<dependency>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-core</artifactId>
<version>0.30.0-beta.1</version>
<version>0.30.0</version>
</dependency>
```

View File

@@ -8,7 +8,7 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.30.0-beta.1</version>
<version>0.30.0-final.0</version>
<relativePath>../pom.xml</relativePath>
</parent>

View File

@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.30.0-beta.1</version>
<version>0.30.0-final.0</version>
<packaging>pom</packaging>
<name>${project.artifactId}</name>
<description>LanceDB Java SDK Parent POM</description>
@@ -28,7 +28,7 @@
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<arrow.version>15.0.0</arrow.version>
<lance-core.version>7.1.0-beta.4</lance-core.version>
<lance-core.version>7.0.0</lance-core.version>
<spotless.skip>false</spotless.skip>
<spotless.version>2.30.0</spotless.version>
<spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version>

View File

@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.30.0-beta.1"
version = "0.30.0"
publish = false
license.workspace = true
description.workspace = true

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.30.0-beta.1",
"version": "0.30.0",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.30.0-beta.1",
"version": "0.30.0",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.30.0-beta.1",
"version": "0.30.0",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.30.0-beta.1",
"version": "0.30.0",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.30.0-beta.1",
"version": "0.30.0",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.30.0-beta.1",
"version": "0.30.0",
"os": [
"win32"
],

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.30.0-beta.1",
"version": "0.30.0",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",

View File

@@ -1,12 +1,12 @@
{
"name": "@lancedb/lancedb",
"version": "0.30.0-beta.1",
"version": "0.30.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "@lancedb/lancedb",
"version": "0.30.0-beta.1",
"version": "0.30.0",
"cpu": [
"x64",
"arm64"

View File

@@ -11,7 +11,7 @@
"ann"
],
"private": false,
"version": "0.30.0-beta.1",
"version": "0.30.0",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.33.0-beta.1"
current_version = "0.33.0"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb-python"
version = "0.33.0-beta.1"
version = "0.33.0"
publish = false
edition.workspace = true
description = "Python bindings for LanceDB"

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb"
version = "0.30.0-beta.1"
version = "0.30.0"
edition.workspace = true
description = "LanceDB: A serverless, low-latency vector database for AI applications"
license.workspace = true

View File

@@ -982,4 +982,105 @@ mod tests {
table2.add(struct_batch).execute().await.unwrap();
assert_eq!(table2.count_rows(None).await.unwrap(), 2);
}
/// Regression test: appending `arrow.json` (PyArrow `pa.json_()`) data into a table
/// whose schema was created with `pa.json_()` (internally stored as `lance.json`, backed
/// by `LargeBinary`) must succeed without a schema-mismatch error.
///
/// Previously `build_field_exprs` would attempt a `Utf8 → LargeBinary` DataFusion cast,
/// which produced a field whose Arrow extension metadata still read `arrow.json` instead
/// of `lance.json`. Lance-core then rejected the append with
/// `"json vs large_binary" schema mismatch`.
///
/// PyArrow's `pa.json_()` may be backed by either `Utf8` or `LargeUtf8` depending on the
/// constructor used, so the test is parameterized over the input backing type.
#[rstest::rstest]
#[case::utf8(DataType::Utf8)]
#[case::large_utf8(DataType::LargeUtf8)]
#[tokio::test]
async fn test_add_arrow_json_into_lance_json_table(#[case] input_type: DataType) {
use arrow_array::{Array, cast::AsArray};
use lance_arrow::ARROW_EXT_NAME_KEY;
use lance_arrow::json::{ARROW_JSON_EXT_NAME, JSON_EXT_NAME};
// Build a table whose "data" column is lance.json (LargeBinary +
// ARROW:extension:name = "lance.json").
let lance_json_field = lance_arrow::json::json_field("data", true);
let table_schema = Arc::new(Schema::new(vec![lance_json_field]));
let db = connect("memory://").execute().await.unwrap();
let table = db
.create_empty_table("json_test", table_schema)
.execute()
.await
.unwrap();
// Sanity-check the stored schema.
let stored_field = table.schema().await.unwrap();
let data_field = stored_field.field_with_name("data").unwrap();
assert_eq!(data_field.data_type(), &DataType::LargeBinary);
assert_eq!(
data_field
.metadata()
.get(ARROW_EXT_NAME_KEY)
.map(|s| s.as_str()),
Some(JSON_EXT_NAME),
);
// Build an arrow.json input field (Utf8/LargeUtf8 + arrow.json extension).
// This is what PyArrow produces for pa.json_() arrays.
let arrow_json_metadata = std::collections::HashMap::from([(
ARROW_EXT_NAME_KEY.to_string(),
ARROW_JSON_EXT_NAME.to_string(),
)]);
let arrow_json_field =
Field::new("data", input_type.clone(), true).with_metadata(arrow_json_metadata);
let arrow_json_schema = Arc::new(Schema::new(vec![arrow_json_field]));
let rows: Vec<Option<&str>> = vec![None, Some(r#"{"a": 1}"#), Some(r#"{"b": 2}"#)];
let string_array: Arc<dyn arrow_array::Array> = match input_type {
DataType::Utf8 => Arc::new(arrow_array::StringArray::from(rows.clone())),
DataType::LargeUtf8 => Arc::new(arrow_array::LargeStringArray::from(rows.clone())),
other => panic!("unsupported arrow.json backing type for this test: {other:?}"),
};
let batch = RecordBatch::try_new(arrow_json_schema, vec![string_array]).unwrap();
// This must not fail with a schema-mismatch error.
table.add(batch).execute().await.unwrap();
assert_eq!(table.count_rows(None).await.unwrap(), rows.len());
// A lance.json column is read back as Utf8 carrying arrow.json extension metadata.
let results: Vec<RecordBatch> = table
.query()
.select(Select::columns(&["data"]))
.execute()
.await
.unwrap()
.try_collect()
.await
.unwrap();
assert_eq!(results.len(), 1);
let batch = &results[0];
assert_eq!(batch.num_rows(), rows.len());
let json_col = batch.column(0);
assert_eq!(json_col.data_type(), &DataType::Utf8);
let json_strs = json_col.as_string::<i32>();
for (i, expected) in rows.iter().enumerate() {
match expected {
None => assert!(json_strs.is_null(i), "row {i} expected null"),
Some(raw) => {
assert!(!json_strs.is_null(i), "row {i} expected non-null");
let actual: serde_json::Value = serde_json::from_str(json_strs.value(i))
.expect("read-back JSON should be valid");
let expected: serde_json::Value =
serde_json::from_str(raw).expect("expected JSON should be valid");
assert_eq!(actual, expected, "row {i} JSON mismatch");
}
}
}
}
}

View File

@@ -13,6 +13,7 @@ use datafusion_physical_expr::expressions::{CastExpr, Literal};
use datafusion_physical_plan::expressions::Column;
use datafusion_physical_plan::projection::ProjectionExec;
use datafusion_physical_plan::{ExecutionPlan, PhysicalExpr};
use lance_arrow::json::{is_arrow_json_field, is_json_field};
use crate::{Error, Result};
@@ -64,6 +65,18 @@ fn build_field_exprs(
let input_field = &input_fields[input_idx];
let input_expr = get_input_expr(input_idx);
// Special case: input is arrow.json (PyArrow pa.json_() extension type backed by
// Utf8/LargeUtf8) and the table field is lance.json (backed by LargeBinary).
// Lance-core's write path already handles the arrow.json → lance.json conversion
// (including JSONB encoding), so we pass the expression through unchanged and let
// lance-core deal with it. Attempting to cast Utf8 → LargeBinary here would
// produce a field whose metadata still identifies it as arrow.json, which then
// causes a schema-mismatch error inside lance-core.
if is_arrow_json_field(input_field) && is_json_field(table_field) {
result.push((input_expr, Arc::clone(input_field) as FieldRef));
continue;
}
let expr = match (input_field.data_type(), table_field.data_type()) {
// Both are structs: recurse into sub-fields to handle subschemas and casts.
(DataType::Struct(in_children), DataType::Struct(tbl_children))
@@ -618,4 +631,75 @@ mod tests {
.unwrap();
assert_eq!(a.values(), &[1, 3]);
}
/// `arrow.json` input (PyArrow `pa.json_()`, Utf8/LargeUtf8 + extension metadata) against a
/// `lance.json` table field (LargeBinary + extension metadata) must be passed through
/// without a cast so that lance-core can perform its own arrow.json → JSONB conversion.
///
/// Before the fix, `cast_to_table_schema` attempted a `Utf8 → LargeBinary` DataFusion
/// cast that preserved the wrong extension metadata, causing lance-core to reject the
/// batch with a "json vs large_binary" schema-mismatch error.
#[rstest::rstest]
#[case::utf8(DataType::Utf8)]
#[case::large_utf8(DataType::LargeUtf8)]
#[tokio::test]
async fn test_arrow_json_passthrough_to_lance_json(#[case] input_type: DataType) {
use lance_arrow::ARROW_EXT_NAME_KEY;
use lance_arrow::json::{ARROW_JSON_EXT_NAME, json_field};
// Build a table schema with a lance.json field (LargeBinary + lance.json metadata).
let lance_field = json_field("data", true);
let table_schema = Schema::new(vec![lance_field]);
// Build an input batch with an arrow.json field (Utf8/LargeUtf8 + arrow.json metadata).
let arrow_meta = std::collections::HashMap::from([(
ARROW_EXT_NAME_KEY.to_string(),
ARROW_JSON_EXT_NAME.to_string(),
)]);
let arrow_field = Field::new("data", input_type.clone(), true).with_metadata(arrow_meta);
let input_schema = Arc::new(Schema::new(vec![arrow_field]));
let values = vec![Some(r#"{"x": 1}"#), None, Some(r#"{"y": 2}"#)];
let input_array: Arc<dyn arrow_array::Array> = match input_type {
DataType::Utf8 => Arc::new(StringArray::from(values)),
DataType::LargeUtf8 => Arc::new(arrow_array::LargeStringArray::from(values)),
other => panic!("unsupported arrow.json backing type for this test: {other:?}"),
};
let input_batch = RecordBatch::try_new(input_schema, vec![input_array]).unwrap();
let plan = plan_from_batch(input_batch).await;
let projected = cast_to_table_schema(plan, &table_schema).unwrap();
// The projected schema's "data" field must be the input field unchanged:
// same backing type and arrow.json metadata, neither dropped nor miscast.
let out_field = projected.schema().field_with_name("data").unwrap().clone();
assert_eq!(out_field.data_type(), &input_type);
assert_eq!(
out_field
.metadata()
.get(ARROW_EXT_NAME_KEY)
.map(|s| s.as_str()),
Some(ARROW_JSON_EXT_NAME),
"output field must still carry arrow.json metadata so lance-core can handle it"
);
// The data must flow through correctly (3 rows, no panic).
let result = collect(projected).await;
assert_eq!(result.num_rows(), 3);
let (v0, v2) = match input_type {
DataType::Utf8 => {
let col: &StringArray = result.column(0).as_any().downcast_ref().unwrap();
(col.value(0).to_string(), col.value(2).to_string())
}
DataType::LargeUtf8 => {
let col: &arrow_array::LargeStringArray =
result.column(0).as_any().downcast_ref().unwrap();
(col.value(0).to_string(), col.value(2).to_string())
}
_ => unreachable!(),
};
assert_eq!(v0, r#"{"x": 1}"#);
assert!(result.column(0).is_null(1));
assert_eq!(v2, r#"{"y": 2}"#);
}
}
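
The pass-through rule added to `build_field_exprs` can be illustrated in isolation. The following is a minimal std-only sketch, not the actual lancedb or lance-arrow code: `SketchField`, `BackingType`, `Decision`, `plan_field`, `is_arrow_json`, and `is_lance_json` are hypothetical stand-ins for `Field`, `DataType`, the projection logic, and the `is_arrow_json_field` / `is_json_field` helpers. It assumes the extension name is stored under Arrow's standard `ARROW:extension:name` metadata key.

```rust
use std::collections::HashMap;

/// Arrow's standard metadata key for an extension type name.
const EXT_NAME_KEY: &str = "ARROW:extension:name";

/// Hypothetical, simplified stand-in for an Arrow `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum BackingType {
    Utf8,
    LargeUtf8,
    LargeBinary,
}

/// Hypothetical, simplified stand-in for an Arrow `Field`.
#[derive(Debug, Clone, PartialEq)]
struct SketchField {
    name: String,
    data_type: BackingType,
    metadata: HashMap<String, String>,
}

impl SketchField {
    fn new(name: &str, data_type: BackingType, ext_name: Option<&str>) -> Self {
        let mut metadata = HashMap::new();
        if let Some(ext) = ext_name {
            metadata.insert(EXT_NAME_KEY.to_string(), ext.to_string());
        }
        Self { name: name.to_string(), data_type, metadata }
    }

    fn ext_name(&self) -> Option<&str> {
        self.metadata.get(EXT_NAME_KEY).map(|s| s.as_str())
    }
}

/// Assumed check: arrow.json is a string-backed extension type.
fn is_arrow_json(field: &SketchField) -> bool {
    field.ext_name() == Some("arrow.json")
        && matches!(field.data_type, BackingType::Utf8 | BackingType::LargeUtf8)
}

/// Assumed check: lance.json is a LargeBinary-backed extension type.
fn is_lance_json(field: &SketchField) -> bool {
    field.ext_name() == Some("lance.json") && field.data_type == BackingType::LargeBinary
}

/// What the projection should do with one input column.
#[derive(Debug, PartialEq)]
enum Decision {
    /// Forward the input column and its field (metadata included) untouched.
    PassThrough,
    /// Cast the input column to the table's backing type.
    Cast { to: BackingType },
}

/// Decide per field: arrow.json into lance.json is forwarded unchanged so the
/// storage layer can do its own conversion; everything else takes the cast
/// path in this sketch.
fn plan_field(input: &SketchField, table: &SketchField) -> Decision {
    if is_arrow_json(input) && is_lance_json(table) {
        Decision::PassThrough
    } else {
        Decision::Cast { to: table.data_type.clone() }
    }
}
```

The check runs before any type-based matching because the backing types alone (`Utf8` against `LargeBinary`) would otherwise select an ordinary cast, which is the behaviour the fix avoids.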