diff --git a/.agents/skills/README.md b/.agents/skills/README.md
new file mode 100644
index 000000000..296ae3f86
--- /dev/null
+++ b/.agents/skills/README.md
@@ -0,0 +1,7 @@
+# Agent Skills
+
+This directory contains repo-scoped code agent skills for the LanceDB project.
+
+Each skill is a folder that contains a required `SKILL.md` and optional bundled resources.
+
+Codex discovers skills from `.agents/skills` in the current working directory and parent directories.
diff --git a/.agents/skills/lancedb-update-lance-dependency/SKILL.md b/.agents/skills/lancedb-update-lance-dependency/SKILL.md
new file mode 100644
index 000000000..e97fae7ac
--- /dev/null
+++ b/.agents/skills/lancedb-update-lance-dependency/SKILL.md
@@ -0,0 +1,98 @@
+---
+name: lancedb-update-lance-dependency
+description: Update LanceDB to a specific Lance release or tag. Use when bumping Lance dependencies in the lancedb repository, including Rust workspace Lance crates, Java lance-core, validation, branch creation, commit, push, and PR creation when requested.
+---
+
+# LanceDB Update Lance Dependency
+
+## Scope
+
+Use this skill in the `lancedb/lancedb` repository when updating the Lance dependency to a specific Lance version or tag.
+
+Inputs can be a version (`7.2.0-beta.1`), a tag (`v7.2.0-beta.1`), a tag ref (`refs/tags/v7.2.0-beta.1`), or `latest`.
+
+## Workflow
+
+1. Confirm the worktree status with `git status --short`.
+2. Resolve the target Lance version:
+
+   - If the input is `latest`, empty, or omitted, run:
+
+     ```bash
+     python3 ci/check_lance_release.py
+     ```
+
+     Parse the JSON output. If `needs_update` is not `true`, stop without creating a PR. Otherwise use `latest_tag`.
+
+   - If the input is explicit, use it directly.
+
+3. Compute update metadata without changing files:
+
+   ```bash
+   python3 ci/update_lance_dependency.py "$TAG_OR_VERSION" --metadata-only
+   ```
+
+   Before making changes, check for an existing open PR with the emitted `pr_title`:
+
+   ```bash
+   gh pr list --search "\"$PR_TITLE\" in:title" --state open --limit 1 --json number,url,title
+   ```
+
+   If a matching open PR exists, stop and report it instead of creating a duplicate.
+
+4. Run the deterministic update entrypoint:
+
+   ```bash
+   python3 ci/update_lance_dependency.py "$TAG_OR_VERSION"
+   ```
+
+   This updates the Rust workspace Lance dependencies through `ci/set_lance_version.py`, updates `java/pom.xml`, refreshes Cargo metadata, and prints JSON metadata containing `branch_name`, `commit_message`, and `pr_title`.
+
+5. Run validation:
+
+   ```bash
+   cargo clippy --quiet --workspace --tests --all-features -- -D warnings
+   cargo fmt --all --quiet
+   ```
+
+   Fix real diagnostics and rerun clippy until it succeeds. Do not skip warnings.
+
+6. Inspect `git status --short` and `git diff` to ensure only the Lance dependency update and required compatibility fixes are present.
+
+7. If the task only asks to prepare local changes, stop here and report the changed files and validation result.
+
+8. If the task asks to publish the update, create a branch using the printed `branch_name`, stage all relevant files, and commit using the printed `commit_message`. Do not amend or rewrite existing commits.
+
+9. Push to `origin`. Before creating the PR, check that the current token has push permission:
+
+   ```bash
+   gh api repos/lancedb/lancedb --jq .permissions.push
+   ```
+
+   If the remote branch already exists for the same generated branch name, delete the remote ref with `gh api -X DELETE repos/lancedb/lancedb/git/refs/heads/$BRANCH_NAME`, then push. Do not force-push.
+
+10. Create a PR targeting `main` with the printed `pr_title`. If there is no PR template, keep the body to two or three concise sentences: state the Lance dependency bump, note any required compatibility fixes, and link the triggering Lance tag or release.
+
+11. Read back the remote PR title after creation. If it is not a Conventional Commit title, fix it immediately.
+
+12. When running in GitHub Actions after creating the LanceDB PR, trigger the Sophon dependency update:
+
+    ```bash
+    gh workflow run codex-bump-lancedb-lance.yml \
+      --repo lancedb/sophon \
+      -f lance_ref="$LANCE_TAG" \
+      -f lancedb_ref="$BRANCH_NAME"
+    gh run list --repo lancedb/sophon --workflow codex-bump-lancedb-lance.yml --limit 1 --json databaseId,url,displayTitle
+    ```
+
+    Use the emitted metadata `tag` value as `LANCE_TAG`. Do this only after a new LanceDB PR has been created. If the update was skipped because no update is needed or an open PR already exists, do not trigger Sophon.
+
+## GitHub Actions
+
+When this skill is used from GitHub Actions, `TAG`, `GH_TOKEN`, and `GITHUB_TOKEN` may already be set. Resolve `latest` first when `TAG` is empty. Once an explicit tag or version is known, use:
+
+```bash
+python3 ci/update_lance_dependency.py "$TAG" --github-output "$GITHUB_OUTPUT"
+```
+
+Then use the emitted `branch_name`, `commit_message`, and `pr_title` values for branch, commit, and PR creation.
diff --git a/.github/workflows/codex-update-lance-dependency.yml b/.github/workflows/codex-update-lance-dependency.yml
index 383998bb5..b164535fb 100644
--- a/.github/workflows/codex-update-lance-dependency.yml
+++ b/.github/workflows/codex-update-lance-dependency.yml
@@ -4,14 +4,16 @@ on:
   workflow_call:
     inputs:
       tag:
-        description: "Tag name from Lance"
-        required: true
+        description: "Tag name from Lance. If omitted, the skill will use the latest Lance release that needs an update."
+        required: false
+        default: ""
         type: string
   workflow_dispatch:
     inputs:
       tag:
-        description: "Tag name from Lance"
-        required: true
+        description: "Tag name from Lance. Leave empty to use the latest Lance release that needs an update."
+        required: false
+        default: ""
         type: string
 
 permissions:
@@ -25,7 +27,7 @@ jobs:
     steps:
       - name: Show inputs
         run: |
-          echo "tag = ${{ inputs.tag }}"
+          echo "tag = ${{ inputs.tag || 'latest' }}"
 
       - name: Checkout Repo LanceDB
         uses: actions/checkout@v4
@@ -71,65 +73,21 @@ jobs:
           OPENAI_API_KEY: ${{ secrets.CODEX_TOKEN }}
         run: |
           set -euo pipefail
-          VERSION="${TAG#refs/tags/}"
-          VERSION="${VERSION#v}"
-          BRANCH_NAME="codex/update-lance-${VERSION//[^a-zA-Z0-9]/-}"
-
-          # Use "chore" for beta/rc versions, "feat" for stable releases
-          if [[ "${VERSION}" == *beta* ]] || [[ "${VERSION}" == *rc* ]]; then
-            COMMIT_TYPE="chore"
-          else
-            COMMIT_TYPE="feat"
-          fi
+          TARGET_TAG="${TAG:-latest}"
 
           cat <<EOF >/tmp/codex-prompt.txt
-          You are running inside the lancedb repository on a GitHub Actions runner. Update the Lance dependency to version ${VERSION} and prepare a pull request for maintainers to review.
+          You are running inside the lancedb repository on a GitHub Actions runner.
 
-          Follow these steps exactly:
-          1. Use script "ci/set_lance_version.py" to update Lance Rust dependencies. The script already refreshes Cargo metadata, so allow it to finish even if it takes time.
-          2. Update the Java lance-core dependency version in "java/pom.xml": change the "..." property to "${VERSION}".
-          3. Run "cargo clippy --workspace --tests --all-features -- -D warnings". If diagnostics appear, fix them yourself and rerun clippy until it exits cleanly. Do not skip any warnings.
-          4. After clippy succeeds, run "cargo fmt --all" to format the workspace.
-          5. Ensure the repository is clean except for intentional changes. Inspect "git status --short" and "git diff" to confirm the dependency update and any required fixes.
-          6. Create and switch to a new branch named "${BRANCH_NAME}" (replace any duplicated hyphens if necessary).
-          7. Stage all relevant files with "git add -A". Commit using the message "${COMMIT_TYPE}: update lance dependency to v${VERSION}".
-          8. Push the branch to origin. If the remote branch already exists, delete it first with "gh api -X DELETE repos/lancedb/lancedb/git/refs/heads/${BRANCH_NAME}" then push with "git push origin ${BRANCH_NAME}". Do NOT use "git push --force" or "git push -f".
-          9. env "GH_TOKEN" is available, use "gh" tools for github related operations like creating pull request.
-          10. Create a pull request targeting "main" with title "${COMMIT_TYPE}: update lance dependency to v${VERSION}". First, write the PR body to /tmp/pr-body.md using a heredoc (cat <<'EOF' > /tmp/pr-body.md). The body should summarize the dependency bump, clippy/fmt verification, and link the triggering tag (${TAG}). Then run "gh pr create --body-file /tmp/pr-body.md".
-          11. After creating the PR, display the PR URL, "git status --short", and a concise summary of the commands run and their results.
+          Use \$lancedb-update-lance-dependency with target "${TARGET_TAG}".
 
           Constraints:
-          - Use bash commands; avoid modifying GitHub workflow files other than through the scripted task above.
-          - Do not merge the PR.
-          - If any command fails, diagnose and fix the issue instead of aborting.
+          - Use env "GH_TOKEN" for GitHub operations.
+          - Do not merge the pull request.
+          - Do not force-push.
+          - Do not create a duplicate pull request if an open PR already exists for the target Lance version.
+          - If any command fails, diagnose and fix the root cause instead of aborting.
+          - After creating the PR, display the PR URL, "git status --short", and a concise summary of the commands run and their results.
           EOF
 
           printenv OPENAI_API_KEY | codex login --with-api-key
           codex --config shell_environment_policy.ignore_default_excludes=true exec --dangerously-bypass-approvals-and-sandbox "$(cat /tmp/codex-prompt.txt)"
-
-      - name: Trigger sophon dependency update
-        env:
-          TAG: ${{ inputs.tag }}
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          set -euo pipefail
-          VERSION="${TAG#refs/tags/}"
-          VERSION="${VERSION#v}"
-          LANCEDB_BRANCH="codex/update-lance-${VERSION//[^a-zA-Z0-9]/-}"
-
-          echo "Triggering sophon workflow with:"
-          echo " lance_ref: ${TAG#refs/tags/}"
-          echo " lancedb_ref: ${LANCEDB_BRANCH}"
-
-          gh workflow run codex-bump-lancedb-lance.yml \
-            --repo lancedb/sophon \
-            -f lance_ref="${TAG#refs/tags/}" \
-            -f lancedb_ref="${LANCEDB_BRANCH}"
-
-      - name: Show latest sophon workflow run
-        env:
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          set -euo pipefail
-          echo "Latest sophon workflow run:"
-          gh run list --repo lancedb/sophon --workflow codex-bump-lancedb-lance.yml --limit 1 --json databaseId,url,displayTitle
diff --git a/.github/workflows/lance-release-timer.yml b/.github/workflows/lance-release-timer.yml
deleted file mode 100644
index 4e801b767..000000000
--- a/.github/workflows/lance-release-timer.yml
+++ /dev/null
@@ -1,62 +0,0 @@
-name: Lance Release Timer
-
-on:
-  schedule:
-    - cron: "*/10 * * * *"
-  workflow_dispatch:
-
-permissions:
-  contents: read
-  actions: write
-
-concurrency:
-  group: lance-release-timer
-  cancel-in-progress: false
-
-jobs:
-  trigger-update:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v4
-
-      - name: Check for new Lance tag
-        id: check
-        env:
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          python3 ci/check_lance_release.py --github-output "$GITHUB_OUTPUT"
-
-      - name: Look for existing PR
-        if: steps.check.outputs.needs_update == 'true'
-        id: pr
-        env:
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          set -euo pipefail
-          TITLE="chore: update lance dependency to v${{ steps.check.outputs.latest_version }}"
-          COUNT=$(gh pr list --search "\"$TITLE\" in:title" --state open --limit 1 --json number --jq 'length')
-          if [ "$COUNT" -gt 0 ]; then
-            echo "Open PR already exists for $TITLE"
-            echo "pr_exists=true" >> "$GITHUB_OUTPUT"
-          else
-            echo "No existing PR for $TITLE"
-            echo "pr_exists=false" >> "$GITHUB_OUTPUT"
-          fi
-
-      - name: Trigger codex update workflow
-        if: steps.check.outputs.needs_update == 'true' && steps.pr.outputs.pr_exists != 'true'
-        env:
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          set -euo pipefail
-          TAG=${{ steps.check.outputs.latest_tag }}
-          gh workflow run codex-update-lance-dependency.yml -f tag=refs/tags/$TAG
-
-      - name: Show latest codex workflow run
-        if: steps.check.outputs.needs_update == 'true' && steps.pr.outputs.pr_exists != 'true'
-        env:
-          GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
-        run: |
-          set -euo pipefail
-          gh run list --workflow codex-update-lance-dependency.yml --limit 1 --json databaseId,url,displayTitle
diff --git a/ci/update_lance_dependency.py b/ci/update_lance_dependency.py
new file mode 100644
index 000000000..3277870c0
--- /dev/null
+++ b/ci/update_lance_dependency.py
@@ -0,0 +1,126 @@
+#!/usr/bin/env python3
+"""Prepare a Lance dependency update for LanceDB."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import re
+import subprocess
+import sys
+from pathlib import Path
+from typing import Sequence
+
+try:
+    from check_lance_release import parse_semver
+except ModuleNotFoundError:
+    # Supports importing as ci.update_lance_dependency from tests or ad hoc checks.
+    from ci.check_lance_release import parse_semver  # type: ignore
+
+
+def normalize_version(raw: str) -> str:
+    value = raw.strip()
+    value = value.removeprefix("refs/tags/")
+    value = value.removeprefix("v")
+    try:
+        parse_semver(value)
+    except ValueError:
+        raise ValueError(f"Unsupported Lance version or tag: {raw}")
+    return value
+
+
+def normalized_tag(version: str) -> str:
+    return f"v{version}"
+
+
+def branch_name(version: str) -> str:
+    suffix = re.sub(r"[^a-zA-Z0-9]+", "-", version).strip("-")
+    suffix = re.sub(r"-+", "-", suffix)
+    return f"codex/update-lance-{suffix}"
+
+
+def commit_type(version: str) -> str:
+    prerelease = version.split("-", maxsplit=1)[1] if "-" in version else ""
+    return "chore" if "beta" in prerelease or "rc" in prerelease else "feat"
+
+
+def metadata_for(version: str) -> dict[str, str]:
+    kind = commit_type(version)
+    message = f"{kind}: update lance dependency to v{version}"
+    return {
+        "version": version,
+        "tag": normalized_tag(version),
+        "branch_name": branch_name(version),
+        "commit_type": kind,
+        "commit_message": message,
+        "pr_title": message,
+    }
+
+
+def run_command(cmd: Sequence[str], *, cwd: Path) -> None:
+    subprocess.run(cmd, cwd=cwd, check=True)
+
+
+def update_java_lance_core_version(repo_root: Path, version: str) -> None:
+    pom_path = repo_root / "java" / "pom.xml"
+    contents = pom_path.read_text(encoding="utf-8")
+    updated, count = re.subn(
+        r"()[^<]+()",
+        rf"\g<1>{version}\g<2>",
+        contents,
+        count=1,
+    )
+    if count != 1:
+        raise RuntimeError(
+            "Expected exactly one entry in java/pom.xml"
+        )
+    pom_path.write_text(updated, encoding="utf-8")
+
+
+def write_github_outputs(path: str | None, payload: dict[str, str]) -> None:
+    if not path:
+        return
+    with open(path, "a", encoding="utf-8") as output:
+        for key, value in payload.items():
+            output.write(f"{key}={value}\n")
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "tag_or_version",
+        help="Lance tag or version, for example refs/tags/v7.2.0-beta.1 or 7.2.0",
+    )
+    parser.add_argument(
+        "--repo-root",
+        type=Path,
+        default=Path(__file__).resolve().parents[1],
+        help="Path to the lancedb repository root",
+    )
+    parser.add_argument(
+        "--github-output",
+        default=None,
+        help="Optional GitHub Actions output file to receive metadata fields",
+    )
+    parser.add_argument(
+        "--metadata-only",
+        action="store_true",
+        help="Only print derived metadata; do not modify dependency files",
+    )
+    args = parser.parse_args(argv)
+
+    repo_root = args.repo_root.resolve()
+    version = normalize_version(args.tag_or_version)
+    payload = metadata_for(version)
+
+    if not args.metadata_only:
+        run_command([sys.executable, "ci/set_lance_version.py", version], cwd=repo_root)
+        update_java_lance_core_version(repo_root, version)
+
+    write_github_outputs(args.github_output, payload)
+    print(json.dumps(payload, sort_keys=True))
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())