Feature request: improve runs-on/snapshot ergonomics for build-state snapshots in matrix CI jobs #21

Description

@garysassano

Summary

I have been testing runs-on/snapshot@v1 for Rust/Cargo CI build-state reuse on RunsOn runners.

The short version: it works, and it is the best result I have found so far for making Cargo behave close to a local no-op rebuild in CI. However, the current action interface appears to be shaped mostly around simple single-path use cases such as Docker layer state. For a Rust workspace with many deployable binaries built in parallel, I had to encode cache identity into version, manually mount into an isolated path, manually implement a checkout that does not destroy the restored state, and choose blunt save policies.

This issue is a request for API additions and documentation examples that would make runs-on/snapshot a cleaner fit for build-state snapshots in matrix CI jobs.

Context: Rust/Cargo CI caching problem

Cargo no-op rebuilds are fast locally because Cargo reuses more than compiled artifacts. It reuses the whole build state and filesystem context:

target/debug/.fingerprint/
target/debug/deps/*.d
target/debug/incremental/
target/debug/build/
target/release/.fingerprint/
target/<target-triple>/release/.fingerprint/
Cargo registry source dirs
build script outputs
source tree mtimes
workspace absolute path
toolchain/build flags/env

Traditional actions/cache-style archive caches help with dependency downloads and sometimes compiled artifacts, but they do not reliably reproduce the same filesystem state Cargo uses to decide whether units are fresh.

For Rust workloads, a block-level snapshot is a much better primitive than archive caching because it restores the actual filesystem state instead of reconstructing it from a tar/zstd archive.
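
As a rough illustration of why mtimes matter, the sketch below lists source files that are newer than a restored piece of build metadata. It is not Cargo's algorithm (Cargo also tracks flags, environment, and dep-info contents). `newer_than_reference` is a hypothetical helper name, and the reference file is whichever restored artifact you choose to compare against.

```shell
# Hypothetical helper: list files under a source directory whose mtime is newer
# than a reference file (for example a restored dep-info or fingerprint file).
# This only approximates the mtime part of a freshness check.
newer_than_reference() {
  local reference="$1" src_dir="$2"
  # -newer compares modification times; sort keeps the output stable.
  find "$src_dir" -type f -newer "$reference" | sort
}
```

An empty result means no source file is newer than the reference. Any path printed is a candidate for "Cargo will consider this unit dirty".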

flowchart TD
    local[Local no-op Cargo rebuild]
    state[Same filesystem state]
    cargo[Cargo freshness check]
    fresh[Fresh units, no rebuild]

    state --> cargo --> fresh
    local --> state

    state --> source[Source tree mtimes]
    state --> target[Target fingerprints and dep-info]
    state --> scripts[Build script outputs]
    state --> registry[Cargo registry source paths]
    state --> tools[Toolchain and build flags]

Repository shape

The workload is a Rust workspace with many workspace crates and many deployable binaries. CI builds each deployable binary in a matrix job using a command shaped like:

cargo lambda build --release --arm64 --bin <binary-name>

Conceptually:

strategy:
  matrix:
    function:
      - { name: function_a, binary: crate_binary_a }
      - { name: function_b, binary: crate_binary_b }
      - { name: function_c, binary: crate_binary_c }
      - { name: function_d, binary: crate_binary_d }

Each matrix entry runs on a separate runner and should have its own build-state snapshot so matrix jobs do not overwrite or race with each other.

flowchart LR
    workflow[Matrix workflow]
    fna[function_a job]
    fnb[function_b job]
    fnc[function_c job]

    sna[(snapshot stream A)]
    snb[(snapshot stream B)]
    snc[(snapshot stream C)]

    workflow --> fna --> sna
    workflow --> fnb --> snb
    workflow --> fnc --> snc

    sna -. default branch fallback .-> main[(default branch snapshots)]
    snb -. default branch fallback .-> main
    snc -. default branch fallback .-> main

What worked well

Using runs-on/snapshot@v1, I was able to snapshot an isolated filesystem root containing:

/mnt/rust-cargo-snapshot/workspace       # source checkout
/mnt/rust-cargo-snapshot/workspace/app/target
/mnt/rust-cargo-snapshot/cargo-home      # CARGO_HOME
/mnt/rust-cargo-snapshot/tools           # optional tool installs such as Zig/cargo-lambda
/mnt/rust-cargo-snapshot/zig-cache       # optional Zig caches

After the first snapshot was seeded, subsequent matrix jobs restored build state correctly and cargo lambda build became effectively a no-op for unchanged builds.

I measured a large improvement over the previous setup: a lean version of the workflow with snapshot-local tools took roughly 36-41s per matrix job.

The actual Cargo build step itself was only a few seconds, with the remaining time mostly runner setup, snapshot restore/save initiation, and tool setup.

This is an excellent result and clearly better than the S3/archive cache approaches I considered for this specific “preserve Cargo local build state” goal.

Current working workflow shape

This is a simplified version of the working shape:

jobs:
  build-functions:
    runs-on: runs-on=${{ github.run_id }}/cpu=12/family=m8azn/image=ubuntu24-full-x64

    strategy:
      fail-fast: false
      matrix:
        function:
          - { name: function_a, binary: crate_binary_a }
          - { name: function_b, binary: crate_binary_b }

    env:
      SNAPSHOT_ROOT: /mnt/rust-cargo-snapshot
      SNAPSHOT_WORKSPACE: /mnt/rust-cargo-snapshot/workspace
      CARGO_HOME: /mnt/rust-cargo-snapshot/cargo-home
      CARGO_TARGET_DIR: /mnt/rust-cargo-snapshot/workspace/app/target
      FUNCTION_OUTPUT_PATH: /mnt/rust-cargo-snapshot/workspace/path/to/output/${{ matrix.function.name }}

      # I had to use `version` as a cache identity/key.
      SNAPSHOT_VERSION: cargo-function-${{ matrix.function.name }}-release-arm64-v3

    steps:
      - name: Restore function workspace snapshot
        uses: runs-on/snapshot@v1
        with:
          path: ${{ env.SNAPSHOT_ROOT }}
          version: ${{ env.SNAPSHOT_VERSION }}
          volume_size: 10

      - name: Configure snapshot paths
        run: |
          set -euo pipefail
          sudo chown "$USER:$USER" "$SNAPSHOT_ROOT"
          mkdir -p "$SNAPSHOT_WORKSPACE" "$CARGO_HOME"
          echo "$CARGO_HOME/bin" >> "$GITHUB_PATH"

      - name: Checkout Repo
        env:
          GITHUB_TOKEN: ${{ github.token }}
        run: |
          set -euo pipefail
          git config --global --add safe.directory "$SNAPSHOT_WORKSPACE"
          cd "$SNAPSHOT_WORKSPACE"

          if [ ! -d .git ]; then
            git init .
            git remote add origin "https://github.com/${GITHUB_REPOSITORY}.git"
          else
            git remote set-url origin "https://github.com/${GITHUB_REPOSITORY}.git"
          fi

          auth_header="AUTHORIZATION: basic $(printf 'x-access-token:%s' "$GITHUB_TOKEN" | base64 -w0)"
          if [ "${GITHUB_REF#refs/heads/}" != "$GITHUB_REF" ]; then
            fetch_ref="+${GITHUB_REF}:refs/remotes/origin/${GITHUB_REF_NAME}"
          else
            fetch_ref="$GITHUB_REF"
          fi

          git -c "http.https://github.com/.extraheader=$auth_header" fetch --force --prune --no-tags origin "$fetch_ref"
          if [ "$(git rev-parse --verify HEAD 2>/dev/null || true)" != "$GITHUB_SHA" ]; then
            git -c advice.detachedHead=false checkout --detach --force "$GITHUB_SHA"
          fi

      - name: Configure private Cargo registry credentials
        run: |
          # Project-specific secret setup happens here.
          # The workflow removes credentials again before snapshot save.
          true

      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
        with:
          components: rustfmt
          targets: aarch64-unknown-linux-gnu

      - name: Build binary
        working-directory: ${{ env.SNAPSHOT_WORKSPACE }}/app
        run: |
          cargo lambda build \
            --lambda-dir "$FUNCTION_OUTPUT_PATH" \
            --bin "${{ matrix.function.binary }}" \
            --flatten "${{ matrix.function.binary }}" \
            --release \
            --arm64

      - name: Scrub credentials before snapshot
        if: always()
        run: |
          rm -f "$CARGO_HOME/credentials" "$CARGO_HOME/credentials.toml" "$CARGO_HOME/config.toml"

This works, but it took a lot of workflow plumbing to get there.

sequenceDiagram
    participant Job as Matrix job
    participant Snapshot as runs-on/snapshot
    participant FS as /mnt/rust-cargo-snapshot
    participant Git as Manual checkout
    participant Cargo as Cargo build

    Job->>Snapshot: restore path with per-function identity
    Snapshot->>FS: mount restored EBS volume
    Job->>FS: create workspace, cargo-home, tool dirs
    Job->>Git: fetch requested ref
    Git->>FS: checkout only if HEAD differs
    Job->>Cargo: build function binary
    Cargo->>FS: reuse target fingerprints/artifacts
    Job->>FS: remove credentials before save
    Snapshot->>FS: unmount and start new snapshot in post step

Friction points

1. Snapshot identity is overloaded into version

The current inputs are:

path
version
volume_type
volume_iops
volume_throughput
volume_size
volume_initialization_rate
wait_for_completion
save

version appears to be both a schema/version bump and part of the snapshot lookup identity. For a matrix workflow, each matrix entry needs a distinct snapshot stream. Since snapshot lookup does not include path, I had to do this:

version: cargo-function-${{ matrix.function.name }}-release-arm64-v3

That works, but semantically it is a cache key, not a version.

It would be cleaner to have a first-class key input:

with:
  path: /mnt/rust-cargo-snapshot
  key: cargo-function-${{ matrix.function.name }}-release-arm64
  version: v3

Then version could mean only “break compatibility / force fresh snapshot” and key could mean “which snapshot stream is this?”
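
Until a separate key exists, the two concepts can at least be kept apart in the workflow and folded together at the last moment. The helper below is a minimal sketch of that idea. `snapshot_version` is a hypothetical name, and the character whitelist is a conservative assumption, not a documented constraint of the action.

```shell
# Hypothetical helper: fold a stream identity ("key") and a manual invalidation
# value ("version") into the single `version` input the action accepts today.
snapshot_version() {
  local key="$1" version="$2"
  local safe_key
  # Assumption: keep only [A-Za-z0-9._-] and replace everything else with '-'.
  safe_key="$(printf '%s' "$key" | tr -c 'A-Za-z0-9._-' '-')"
  printf '%s-%s\n' "$safe_key" "$version"
}
```

In a workflow, an earlier step could write the result with `echo "version=$(snapshot_version "$KEY" v3)" >> "$GITHUB_OUTPUT"` and pass it to the snapshot step, where `KEY` is whatever variable holds the stream identity.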

2. No restore-key semantics

The current built-in branch fallback is useful:

current branch snapshot -> default branch snapshot -> blank volume

For build state, it would be useful to expose something closer to actions/cache restore keys:

with:
  key: cargo-function-${{ matrix.function.name }}-${{ github.ref_name }}
  restore-keys: |
    cargo-function-${{ matrix.function.name }}-${{ github.ref_name }}
    cargo-function-${{ matrix.function.name }}-main

Or a simpler RunsOn-native branch fallback:

with:
  key: cargo-function-${{ matrix.function.name }}
  branch-fallback: true
  default-branch-fallback: true
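
The ordered fallback described here can be sketched as a small lookup: try each candidate key in order and take the first one that exists. This is only an illustration of the requested semantics, not how the action resolves snapshots. `resolve_restore_key` is a hypothetical name, and it uses exact matching, which is simpler than the prefix matching actions/cache applies to restore keys.

```shell
# Hypothetical sketch of ordered fallback: print the first candidate key that
# appears in a newline-separated list of existing snapshot keys.
resolve_restore_key() {
  local existing="$1"
  shift
  local candidate
  for candidate in "$@"; do
    # -x: whole-line match, -F: fixed string, -q: quiet
    if printf '%s\n' "$existing" | grep -qxF -- "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # no match: the caller would start from a blank volume
}
```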

3. Restore/save lifecycle is coupled

Today restore happens in the action's main step and save happens in its post step. That is simple, but it limits workflow control.

For PRs, a useful policy is:

main builds: restore + save
PR builds: restore main snapshot + do not save

This is possible with save expressions, for example:

save: ${{ github.event_name == 'push' && github.ref_name == 'main' }}

However, a more advanced case would be “restore, build, then decide at runtime whether saving is worth it.” For example, skip saving if the restored snapshot already corresponds to this commit, or skip saving if no relevant files changed.

There is no way for a shell step after restore/build to tell the post step “do not save after all.”

This would be useful:

with:
  save: auto
  save-marker-file: /mnt/rust-cargo-snapshot/.snapshot-source-sha
  save-marker-value: ${{ github.sha }}

Or a file-command API:

echo "save=false" >> "$RUNS_ON_SNAPSHOT_STATE"

The action could then skip the post-step snapshot if workflow logic determined it was unnecessary.
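
The marker-file idea can be sketched in shell as follows. Both function names are hypothetical, and nothing in the action consumes this decision today; the sketch only shows the comparison a `save: auto` mode might perform.

```shell
# Hypothetical helper: succeed (exit 0) when a save looks worthwhile, i.e. the
# marker file is missing or records a different commit than the one being built.
snapshot_needs_save() {
  local marker_file="$1" current_sha="$2"
  if [ -f "$marker_file" ] && [ "$(cat "$marker_file")" = "$current_sha" ]; then
    return 1   # snapshot already corresponds to this commit
  fi
  return 0
}

# Hypothetical helper: record the commit after a successful build so the next
# run can compare against it.
record_snapshot_sha() {
  local marker_file="$1" current_sha="$2"
  printf '%s\n' "$current_sha" > "$marker_file"
}
```

A build step would call `record_snapshot_sha "$SNAPSHOT_ROOT/.snapshot-source-sha" "$GITHUB_SHA"` after a successful build. The missing piece is a channel, such as the proposed `RUNS_ON_SNAPSHOT_STATE` file, for passing the result of `snapshot_needs_save` to the post step.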

4. No outputs describing what was restored

For observability and debugging, it would be very helpful if the restore step exposed outputs such as:

outputs:
  restored: true
  restored-from: branch | default-branch | empty
  restored-branch: main
  restored-snapshot-id: snap-...
  volume-id: vol-...
  new-volume: false

I had to inspect action logs manually to answer questions like:

Did this run restore a real snapshot?
Did it fall back to the default branch?
Did it create a blank volume?
Which snapshot id was used?
Did it save successfully?

Outputs would make it easy to add workflow summaries and conditional behavior.
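
For example, with outputs like the ones proposed above, a follow-up step could append a short table to the job summary. This is a sketch only: the arguments mirror the proposed outputs, none of which exist in the action today, and `write_restore_summary` is a hypothetical name. `GITHUB_STEP_SUMMARY` is the standard GitHub Actions summary file.

```shell
# Hypothetical helper: append a one-row Markdown table describing the restore
# to the job summary. Arguments correspond to the *proposed* outputs.
write_restore_summary() {
  local restored_from="$1" snapshot_id="$2"
  # An optional third argument overrides the summary file (useful for local testing).
  local summary_file="${3:-$GITHUB_STEP_SUMMARY}"
  {
    printf '| restored-from | snapshot |\n'
    printf '| --- | --- |\n'
    printf '| %s | %s |\n' "$restored_from" "${snapshot_id:-none}"
  } >> "$summary_file"
}
```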

5. Directly mounting over ${{ github.workspace }} is dangerous

The first attempt mounted the snapshot directly at GitHub’s workspace path:

path: ${{ github.workspace }}

This caused runs-on/snapshot post-step save to fail:

umount: /home/runner/_work/.../<repo>: target is busy

The job still completed successfully, but no snapshot was saved. The next run restored a blank volume again.

The working pattern was to mount somewhere else:

/mnt/rust-cargo-snapshot

Then use:

/mnt/rust-cargo-snapshot/workspace

as the actual repository checkout/build directory.

This pattern is not obvious from the docs. It may be worth documenting explicitly:

For source/workspace snapshots, avoid mounting directly on GITHUB_WORKSPACE.
Mount under /mnt/... and perform checkout/build inside that mount.
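
A workflow can also guard against this mistake before the snapshot step runs. The check below is a conservative sketch: it rejects a mount path that equals the workspace, sits inside it, or contains it. Only the "equals" case was observed to fail in the run described above; the other two are rejected as an assumption. `assert_mount_outside_workspace` is a hypothetical name.

```shell
# Hypothetical guard: fail when the snapshot mount path equals the workspace,
# sits inside it, or contains it.
assert_mount_outside_workspace() {
  # Strip one trailing slash so "/a/b/" and "/a/b" compare equal.
  local mount_path="${1%/}" workspace="${2%/}"
  case "$mount_path/" in
    "$workspace/"*)
      echo "mount path is at or inside the workspace: $1" >&2
      return 1
      ;;
  esac
  case "$workspace/" in
    "$mount_path/"*)
      echo "mount path contains the workspace: $1" >&2
      return 1
      ;;
  esac
  return 0
}
```

Typical use would be `assert_mount_outside_workspace "$SNAPSHOT_ROOT" "$GITHUB_WORKSPACE"` as the first line of a setup step.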

6. actions/checkout was not ideal for restored workspaces

I also found that actions/checkout can be too aggressive for this use case. The goal is not just to get the right source contents into the workspace. The goal is to update a previously restored workspace in a way that resembles a local git fetch / git checkout without destroying the restored target/ tree or touching unchanged source files unnecessarily.

In the first blank-volume run, actions/checkout printed:

Deleting the contents of '<workspace>'

and then failed during cleanup because the restored mount point was not yet a Git repository:

fatal: --local can only be used inside a git repository
fatal: not a git repository (or any parent up to mount point ...)

Even when clean: false is set, actions/checkout still has bootstrap behavior for an empty/non-repo target directory. That behavior is reasonable for normal ephemeral CI, but it is not ideal when the target directory is a restored build-state volume that should be preserved.

The other important concern is file mtimes. Cargo freshness checks rely heavily on relationships between source mtimes, dep-info files, fingerprint files, build script outputs, and compiled artifacts. If checkout rewrites unchanged source files, Cargo can see source files as newer than restored target metadata and mark units dirty even when file contents are identical.

So the checkout operation for this pattern needs slightly different semantics:

if the restored path is not a repo:
  initialize it

fetch the requested ref

if HEAD is already the requested SHA:
  do not checkout again
  avoid touching files

if HEAD differs:
  checkout the requested SHA

do not run git clean
do not delete target/

I ended up writing a manual checkout step that:

  1. Initializes the repo only if .git is missing.
  2. Fetches the requested ref.
  3. Checks out the requested SHA only if HEAD is not already that SHA.
  4. Avoids git clean.

This is important because Cargo freshness depends on unchanged source file mtimes relative to target metadata.
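
To verify that a checkout step really leaves unchanged files alone, one option is to record mtimes before and after it and compare. The helpers below are a debugging sketch with hypothetical names. They assume GNU find (for `-printf`), which Ubuntu runner images provide, and they skip `.git` and `target` so that only source files are listed.

```shell
# Hypothetical helper: print "<mtime> <path>" for every source file under a
# directory, skipping .git and target. Requires GNU find for -printf.
mtime_manifest() {
  local dir="$1"
  (
    cd "$dir" &&
      find . \( -name .git -o -name target \) -prune -o -type f -printf '%T@ %p\n' |
      sort -k2
  )
}

# Hypothetical helper: print paths that are new or whose mtime changed between
# two manifests produced by mtime_manifest.
touched_files() {
  local before="$1" after="$2"
  # Keep lines of "after" that have no exact match in "before".
  { grep -vxFf "$before" "$after" || true; } | cut -d' ' -f2-
}
```

Running `mtime_manifest "$SNAPSHOT_WORKSPACE" > before.txt` ahead of the checkout and again into `after.txt` afterwards, then `touched_files before.txt after.txt`, lists the files whose mtimes the checkout changed.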

A helper action or documented recipe would be very useful:

- uses: runs-on/snapshot-checkout@v1
  with:
    path: /mnt/rust-cargo-snapshot/workspace
    ref: ${{ github.ref }}
    sha: ${{ github.sha }}
    clean: false
    skip-if-head-matches: true

Or the snapshot action docs could include a recommended checkout snippet for this pattern.

7. Matrix/workspace use case could use first-class examples

The current examples are mostly simple single-path examples. For a Rust workspace with many matrix builds, the important details are:

one snapshot stream per matrix item
same absolute build path each time
CARGO_HOME inside snapshot if target dep-info references registry source paths
tools inside snapshot if setup time matters
manual checkout that does not clobber target or unchanged files
save policy: main saves, PRs may restore-only

A Rust/Cargo example in the docs would help users avoid a lot of trial and error.

Suggested API improvements

Add key

with:
  path: /mnt/rust-cargo-snapshot
  key: cargo-function-${{ matrix.function.name }}-release-arm64
  version: v3

Recommended semantics:

key = snapshot stream identity
version = format/schema/manual invalidation value

Add restore-key or fallback controls

Option A, close to actions/cache:

with:
  key: cargo-function-${{ matrix.function.name }}-${{ github.ref_name }}
  restore-keys: |
    cargo-function-${{ matrix.function.name }}-${{ github.ref_name }}
    cargo-function-${{ matrix.function.name }}-main

Option B, simpler RunsOn-native branch fallback:

with:
  key: cargo-function-${{ matrix.function.name }}
  branch-fallback: true
  default-branch-fallback: true

Add restore outputs

outputs:
  restored: true
  restored-from: branch | default-branch | empty
  restored-snapshot-id: snap-...
  restored-branch: main
  volume-id: vol-...
  new-volume: false

Add save outputs

outputs:
  saved: true
  saved-snapshot-id: snap-...
  save-started: true
  save-waited-for-completion: false

Add runtime save control

For example:

with:
  save: auto
  save-marker-file: /mnt/rust-cargo-snapshot/.snapshot-source-sha
  save-marker-value: ${{ github.sha }}

Or expose a state file/env file that later steps can write to:

echo "save=false" >> "$RUNS_ON_SNAPSHOT_STATE"

Document source/workspace snapshot pattern

Specifically document that users should prefer:

/mnt/my-snapshot-root/workspace

over mounting directly onto:

${{ github.workspace }}

for source/build workspace snapshots.

Provide checkout guidance or helper

A helper or example that preserves restored state:

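- name: Checkout Repo without destroying snapshot state
  env:
    GITHUB_TOKEN: ${{ github.token }}
  run: |
    set -euo pipefail
    git config --global --add safe.directory "$SNAPSHOT_WORKSPACE"
    cd "$SNAPSHOT_WORKSPACE"

    # Initialize the repo only on a blank volume; otherwise reuse the restored one.
    if [ ! -d .git ]; then
      git init .
      git remote add origin "https://github.com/${GITHUB_REPOSITORY}.git"
    else
      git remote set-url origin "https://github.com/${GITHUB_REPOSITORY}.git"
    fi

    auth_header="AUTHORIZATION: basic $(printf 'x-access-token:%s' "$GITHUB_TOKEN" | base64 -w0)"
    # Branch refs map to a remote-tracking ref; other refs are fetched as-is.
    if [ "${GITHUB_REF#refs/heads/}" != "$GITHUB_REF" ]; then
      fetch_ref="+${GITHUB_REF}:refs/remotes/origin/${GITHUB_REF_NAME}"
    else
      fetch_ref="$GITHUB_REF"
    fi

    git -c "http.https://github.com/.extraheader=$auth_header" fetch --force --prune --no-tags origin "$fetch_ref"
    # Skip checkout when HEAD already matches, so unchanged files keep their mtimes.
    if [ "$(git rev-parse --verify HEAD 2>/dev/null || true)" != "$GITHUB_SHA" ]; then
      git -c advice.detachedHead=false checkout --detach --force "$GITHUB_SHA"
    fi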

Desired final workflow shape

Ideally, a workflow could look more like this:

jobs:
  build-functions:
    runs-on: runs-on=${{ github.run_id }}/cpu=12/family=m8azn/image=ubuntu24-full-x64

    strategy:
      matrix:
        function:
          - { name: function_a, binary: crate_binary_a }
          - { name: function_b, binary: crate_binary_b }

    env:
      SNAPSHOT_ROOT: /mnt/rust-cargo-snapshot
      SNAPSHOT_WORKSPACE: /mnt/rust-cargo-snapshot/workspace
      CARGO_HOME: /mnt/rust-cargo-snapshot/cargo-home
      CARGO_TARGET_DIR: /mnt/rust-cargo-snapshot/workspace/app/target
      FUNCTION_OUTPUT_PATH: /mnt/rust-cargo-snapshot/workspace/path/to/output/${{ matrix.function.name }}

    steps:
      - uses: runs-on/snapshot@v1
        id: snapshot
        with:
          path: ${{ env.SNAPSHOT_ROOT }}
          key: cargo-function-${{ matrix.function.name }}-release-arm64
          version: v3
          volume_size: 10
          save: ${{ github.event_name == 'push' && github.ref_name == 'main' }}

      - uses: runs-on/snapshot-checkout@v1
        with:
          path: ${{ env.SNAPSHOT_WORKSPACE }}
          clean: false
          skip-if-head-matches: true

      - uses: dtolnay/rust-toolchain@stable
        with:
          components: rustfmt
          targets: aarch64-unknown-linux-gnu

      - working-directory: ${{ env.SNAPSHOT_WORKSPACE }}/app
        run: |
          cargo lambda build \
            --lambda-dir "$FUNCTION_OUTPUT_PATH" \
            --bin "${{ matrix.function.binary }}" \
            --flatten "${{ matrix.function.binary }}" \
            --release \
            --arm64

Why this matters

Rust CI performance is often bottlenecked by Cargo rebuilds. Generic archive caches are not enough to reproduce local no-op rebuild behavior. runs-on/snapshot is the first mechanism I tested that actually got close to local behavior for cargo lambda build in CI.

The current action is already powerful enough to make this work, but the YAML is more complex than it needs to be for matrix/workspace build-state snapshots. A few small API additions would make this pattern much easier to adopt and less error-prone.
