Feature request: improve `runs-on/snapshot` ergonomics for build-state snapshots in matrix CI jobs #21

# Summary

I have been testing `runs-on/snapshot@v1` for Rust/Cargo CI build-state reuse on RunsOn runners.

The short version: it works, and it is the best result I have found so far for making Cargo behave close to a local no-op rebuild in CI. However, the current action interface appears to be shaped mostly around simple single-path use cases such as Docker layer state. For a Rust workspace with many deployable binaries built in parallel, I had to encode cache identity into `version`, manually mount into an isolated path, manually implement a checkout that does not destroy the restored state, and choose blunt save policies.

This issue is a request for API additions and documentation examples that would make `runs-on/snapshot` a cleaner fit for build-state snapshots in matrix CI jobs.

# Context: Rust/Cargo CI caching problem

Cargo no-op rebuilds are fast locally because Cargo reuses more than compiled artifacts. It reuses the whole build state and filesystem context:

```text
target/debug/.fingerprint/
target/debug/deps/*.d
target/debug/incremental/
target/debug/build/
target/release/.fingerprint/
target/<target-triple>/release/.fingerprint/
Cargo registry source dirs
build script outputs
source tree mtimes
workspace absolute path
toolchain/build flags/env
```

Traditional `actions/cache`-style archive caches help with dependency downloads and sometimes compiled artifacts, but they do not reliably reproduce the same filesystem state Cargo uses to decide whether units are fresh.

For Rust workloads, a block-level snapshot is a much better primitive than archive caching because it restores the actual filesystem state instead of reconstructing it from a tar/zstd archive.

```mermaid
flowchart TD
    local[Local no-op Cargo rebuild]
    state[Same filesystem state]
    cargo[Cargo freshness check]
    fresh[Fresh units, no rebuild]

    state --> cargo --> fresh
    local --> state

    state --> source[Source tree mtimes]
    state --> target[Target fingerprints and dep-info]
    state --> scripts[Build script outputs]
    state --> registry[Cargo registry source paths]
    state --> tools[Toolchain and build flags]
```

# Repository shape

The workload is a Rust workspace with many workspace crates and many deployable binaries. CI builds each deployable binary in a matrix job using a command shaped like:

```text
cargo lambda build --release --arm64 --bin <binary-name>
```

Conceptually:

```yaml
strategy:
  matrix:
    function:
      - { name: function_a, binary: crate_binary_a }
      - { name: function_b, binary: crate_binary_b }
      - { name: function_c, binary: crate_binary_c }
      - { name: function_d, binary: crate_binary_d }
```

Each matrix entry runs on a separate runner and should have its own build-state snapshot so matrix jobs do not overwrite or race with each other.

```mermaid
flowchart LR
    workflow[Matrix workflow]
    fna[function_a job]
    fnb[function_b job]
    fnc[function_c job]

    sna[(snapshot stream A)]
    snb[(snapshot stream B)]
    snc[(snapshot stream C)]

    workflow --> fna --> sna
    workflow --> fnb --> snb
    workflow --> fnc --> snc

    sna -. default branch fallback .-> main[(default branch snapshots)]
    snb -. default branch fallback .-> main
    snc -. default branch fallback .-> main
```

# What worked well

Using `runs-on/snapshot@v1`, I was able to snapshot an isolated filesystem root containing:

```text
/mnt/rust-cargo-snapshot/workspace       # source checkout
/mnt/rust-cargo-snapshot/workspace/app/target
/mnt/rust-cargo-snapshot/cargo-home      # CARGO_HOME
/mnt/rust-cargo-snapshot/tools           # optional tool installs such as Zig/cargo-lambda
/mnt/rust-cargo-snapshot/zig-cache       # optional Zig caches
```

After the first snapshot was seeded, subsequent matrix jobs restored build state correctly and `cargo lambda build` became effectively a no-op for unchanged builds.

I measured a large improvement over the previous setup. A lean version of the workflow with snapshot-local tools was around 36s-41s per matrix job.

The actual Cargo build step itself was only a few seconds, with the remaining time mostly runner setup, snapshot restore/save initiation, and tool setup.

This is an excellent result and clearly better than the S3/archive cache approaches I considered for this specific “preserve Cargo local build state” goal.

# Current working workflow shape

This is a simplified version of the working shape:

```yaml
jobs:
  build-functions:
    runs-on: runs-on=${{ github.run_id }}/cpu=12/family=m8azn/image=ubuntu24-full-x64

    strategy:
      fail-fast: false
      matrix:
        function:
          - { name: function_a, binary: crate_binary_a }
          - { name: function_b, binary: crate_binary_b }

    env:
      SNAPSHOT_ROOT: /mnt/rust-cargo-snapshot
      SNAPSHOT_WORKSPACE: /mnt/rust-cargo-snapshot/workspace
      CARGO_HOME: /mnt/rust-cargo-snapshot/cargo-home
      CARGO_TARGET_DIR: /mnt/rust-cargo-snapshot/workspace/app/target
      FUNCTION_OUTPUT_PATH: /mnt/rust-cargo-snapshot/workspace/path/to/output/${{ matrix.function.name }}

      # I had to use `version` as a cache identity/key.
      SNAPSHOT_VERSION: cargo-function-${{ matrix.function.name }}-release-arm64-v3

    steps:
      - name: Restore function workspace snapshot
        uses: runs-on/snapshot@v1
        with:
          path: ${{ env.SNAPSHOT_ROOT }}
          version: ${{ env.SNAPSHOT_VERSION }}
          volume_size: 10

      - name: Configure snapshot paths
        run: |
          set -euo pipefail
          sudo chown "$USER:$USER" "$SNAPSHOT_ROOT"
          mkdir -p "$SNAPSHOT_WORKSPACE" "$CARGO_HOME"
          echo "$CARGO_HOME/bin" >> "$GITHUB_PATH"

      - name: Checkout Repo
        env:
          GITHUB_TOKEN: ${{ github.token }}
        run: |
          set -euo pipefail
          git config --global --add safe.directory "$SNAPSHOT_WORKSPACE"
          cd "$SNAPSHOT_WORKSPACE"

          if [ ! -d .git ]; then
            git init .
            git remote add origin "https://github.com/${GITHUB_REPOSITORY}.git"
          else
            git remote set-url origin "https://github.com/${GITHUB_REPOSITORY}.git"
          fi

          auth_header="AUTHORIZATION: basic $(printf 'x-access-token:%s' "$GITHUB_TOKEN" | base64 -w0)"
          if [ "${GITHUB_REF#refs/heads/}" != "$GITHUB_REF" ]; then
            fetch_ref="+${GITHUB_REF}:refs/remotes/origin/${GITHUB_REF_NAME}"
          else
            fetch_ref="$GITHUB_REF"
          fi

          git -c "http.https://github.com/.extraheader=$auth_header" fetch --force --prune --no-tags origin "$fetch_ref"
          if [ "$(git rev-parse --verify HEAD 2>/dev/null || true)" != "$GITHUB_SHA" ]; then
            git -c advice.detachedHead=false checkout --detach --force "$GITHUB_SHA"
          fi

      - name: Configure private Cargo registry credentials
        run: |
          # Project-specific secret setup happens here.
          # The workflow removes credentials again before snapshot save.
          true

      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
        with:
          components: rustfmt
          targets: aarch64-unknown-linux-gnu

      - name: Build binary
        working-directory: ${{ env.SNAPSHOT_WORKSPACE }}/app
        run: |
          cargo lambda build \
            --lambda-dir "$FUNCTION_OUTPUT_PATH" \
            --bin "${{ matrix.function.binary }}" \
            --flatten "${{ matrix.function.binary }}" \
            --release \
            --arm64

      - name: Scrub credentials before snapshot
        if: always()
        run: |
          rm -f "$CARGO_HOME/credentials" "$CARGO_HOME/credentials.toml" "$CARGO_HOME/config.toml"
```

This works, but it took a lot of workflow plumbing to get there.

```mermaid
sequenceDiagram
    participant Job as Matrix job
    participant Snapshot as runs-on/snapshot
    participant FS as /mnt/rust-cargo-snapshot
    participant Git as Manual checkout
    participant Cargo as Cargo build

    Job->>Snapshot: restore path with per-function identity
    Snapshot->>FS: mount restored EBS volume
    Job->>FS: create workspace, cargo-home, tool dirs
    Job->>Git: fetch requested ref
    Git->>FS: checkout only if HEAD differs
    Job->>Cargo: build function binary
    Cargo->>FS: reuse target fingerprints/artifacts
    Job->>FS: remove credentials before save
    Snapshot->>FS: unmount and start new snapshot in post step
```

# Friction points

## 1. Snapshot identity is overloaded into `version`

The current inputs are:

```text
path
version
volume_type
volume_iops
volume_throughput
volume_size
volume_initialization_rate
wait_for_completion
save
```

`version` appears to be both a schema/version bump and part of the snapshot lookup identity. For a matrix workflow, each matrix entry needs a distinct snapshot stream. Since snapshot lookup does not include `path`, I had to do this:

```yaml
version: cargo-function-${{ matrix.function.name }}-release-arm64-v3
```

That works, but semantically it is a cache key, not a version.

It would be cleaner to have a first-class `key` input:

```yaml
with:
  path: /mnt/rust-cargo-snapshot
  key: cargo-function-${{ matrix.function.name }}-release-arm64
  version: v3
```

Then `version` could mean only “break compatibility / force fresh snapshot” and `key` could mean “which snapshot stream is this?”
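
Until something like `key` exists, the two concepts can at least be kept apart in the workflow and joined only when the `version` input is filled in. A minimal sketch, assuming a hypothetical helper named `snapshot_identity` and a plain `-` join; neither the helper nor the character filtering is part of `runs-on/snapshot`:

```bash
# Hypothetical helper: build the string passed to the existing `version`
# input from a stream key and a separate invalidation version.
snapshot_identity() {
  local key="$1" version="$2"
  # Assumption: keep to a conservative character set. The action's real
  # constraints on `version` are not known here.
  key="$(printf '%s' "$key" | tr -c 'A-Za-z0-9._-' '-')"
  printf '%s-%s\n' "$key" "$version"
}

# Reproduces the identity string used in the workflow above.
snapshot_identity "cargo-function-function_a-release-arm64" "v3"
```

Bumping only the second argument then forces a fresh snapshot stream without renaming the stream itself.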

## 2. No restore-key semantics

The current built-in branch fallback is useful:

```text
current branch snapshot -> default branch snapshot -> blank volume
```

For build state, it would be useful to expose something closer to `actions/cache` restore keys:

```yaml
with:
  key: cargo-function-${{ matrix.function.name }}-${{ github.ref_name }}
  restore-keys: |
    cargo-function-${{ matrix.function.name }}-main
```

Or a simpler RunsOn-native branch fallback:

```yaml
with:
  key: cargo-function-${{ matrix.function.name }}
  branch-fallback: true
  default-branch-fallback: true
```
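
The lookup order implied by either option can be sketched as a first-match search over an ordered candidate list. This is only an illustration of the requested semantics: `resolve_restore_key` is a hypothetical name, exact matching is an assumption (`actions/cache` restore keys match by prefix), and the list of available keys is passed in as plain text rather than queried from any real API.

```bash
# Hypothetical resolver: print the first candidate that exists, else "empty".
# $1   = newline-separated list of available snapshot keys
# $2.. = candidate keys in priority order
resolve_restore_key() {
  local available="$1"; shift
  local candidate
  for candidate in "$@"; do
    # -F fixed string, -x whole-line match, -q quiet
    if printf '%s\n' "$available" | grep -Fxq -- "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  printf 'empty\n'   # corresponds to the "blank volume" case
}

available=$'cargo-function-function_a-main\ncargo-function-function_b-main'
resolve_restore_key "$available" \
  "cargo-function-function_a-feature-x" \
  "cargo-function-function_a-main"
```

In this example the feature-branch key is absent, so the call falls through to the `main` key for the same function.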

## 3. Restore/save lifecycle is coupled

Today restore happens in the main action and save happens in the post step. That is simple, but it limits workflow control.

For PRs, a useful policy is:

```text
main builds: restore + save
PR builds: restore main snapshot + do not save
```

This is possible with `save` expressions, for example:

```yaml
save: ${{ github.event_name == 'push' && github.ref_name == 'main' }}
```

However, a more advanced case would be “restore, build, then decide at runtime whether saving is worth it.” For example, skip saving if the restored snapshot already corresponds to this commit, or skip saving if no relevant files changed.

There is no way for a shell step after restore/build to tell the post step “do not save after all.”

This would be useful:

```yaml
with:
  save: auto
  save-marker-file: /mnt/rust-cargo-snapshot/.snapshot-source-sha
  save-marker-value: ${{ github.sha }}
```

Or a file-command API:

```bash
echo "save=false" >> "$RUNS_ON_SNAPSHOT_STATE"
```

The action could then skip the post-step snapshot if workflow logic determined it was unnecessary.
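
The marker-file variant amounts to a small comparison that a step after the build could run. A minimal sketch, assuming a hypothetical helper `should_save_snapshot`; `RUNS_ON_SNAPSHOT_STATE` is the proposed variable from this issue and does not exist today:

```bash
# Hypothetical decision helper: print "false" when the marker file already
# records this commit, otherwise "true".
should_save_snapshot() {
  local marker_file="$1" current_sha="$2"
  if [ -f "$marker_file" ] && [ "$(cat "$marker_file")" = "$current_sha" ]; then
    printf 'false\n'
  else
    printf 'true\n'
  fi
}

# Possible use in a step after a successful build (assumes SNAPSHOT_ROOT and
# GITHUB_SHA are set by the workflow):
#   decision="$(should_save_snapshot "$SNAPSHOT_ROOT/.snapshot-source-sha" "$GITHUB_SHA")"
#   printf '%s\n' "$GITHUB_SHA" > "$SNAPSHOT_ROOT/.snapshot-source-sha"
#   echo "save=$decision" >> "$RUNS_ON_SNAPSHOT_STATE"   # proposed API, not available today
```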

## 4. No outputs describing what was restored

For observability and debugging, it would be very helpful if the restore step exposed outputs such as:

```yaml
outputs:
  restored: true
  restored-from: branch | default-branch | empty
  restored-branch: main
  restored-snapshot-id: snap-...
  volume-id: vol-...
  new-volume: false
```

I had to inspect action logs manually to answer questions like:

```text
Did this run restore a real snapshot?
Did it fall back to the default branch?
Did it create a blank volume?
Which snapshot id was used?
Did it save successfully?
```

Outputs would make it easy to add workflow summaries and conditional behavior.
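
Until such outputs exist, a step placed after the restore and before the checkout can approximate `restored` with a heuristic. The sketch below assumes that a previously saved volume always contains a `.git` directory in the workspace; `snapshot_state` is a hypothetical helper, and it cannot tell a branch snapshot from a default-branch fallback.

```bash
# Heuristic, not an action feature: treat the volume as "restored" when the
# workspace already holds a Git repository from an earlier run.
snapshot_state() {
  local workspace="$1"
  if [ -d "$workspace/.git" ]; then
    printf 'restored\n'
  else
    printf 'empty\n'
  fi
}

# Possible use in a step with `id: snapshot-state`, run before checkout:
#   echo "state=$(snapshot_state "$SNAPSHOT_WORKSPACE")" >> "$GITHUB_OUTPUT"
```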

## 5. Directly mounting over `${{ github.workspace }}` is dangerous

The first attempt mounted the snapshot directly at GitHub’s workspace path:

```yaml
path: ${{ github.workspace }}
```

This caused the `runs-on/snapshot` post-step save to fail:

```text
umount: /home/runner/_work/.../<repo>: target is busy
```

The job still completed successfully, but no snapshot was saved. The next run restored a blank volume again.

The working pattern was to mount somewhere else:

```text
/mnt/rust-cargo-snapshot
```

Then use:

```text
/mnt/rust-cargo-snapshot/workspace
```

as the actual repository checkout/build directory.

This pattern is not obvious from the docs. It may be worth documenting explicitly:

```text
For source/workspace snapshots, avoid mounting directly on GITHUB_WORKSPACE.
Mount under /mnt/... and perform checkout/build inside that mount.
```
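
A workflow can also fail fast on this mistake instead of silently losing the snapshot. A minimal sketch with a hypothetical helper `is_safe_snapshot_path`; rejecting paths inside or above the workspace is a conservative assumption, since the failure above was only observed for the exact workspace path.

```bash
# Return 0 when mount_path is outside the workspace, 1 when it equals the
# workspace, sits inside it, or is one of its parent directories.
is_safe_snapshot_path() {
  local mount_path="${1%/}" workspace="${2%/}"
  case "$mount_path/" in
    "$workspace"/*) return 1 ;;   # same path as the workspace, or inside it
  esac
  case "$workspace/" in
    "$mount_path"/*) return 1 ;;  # mount path is a parent of the workspace
  esac
  return 0
}

# Possible use in a step before the snapshot action:
#   is_safe_snapshot_path "$SNAPSHOT_ROOT" "$GITHUB_WORKSPACE" \
#     || { echo "unsafe snapshot path: $SNAPSHOT_ROOT" >&2; exit 1; }
```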

## 6. `actions/checkout` was not ideal for restored workspaces

I also found that `actions/checkout` can be too aggressive for this use case. The goal is not just to get the right source contents into the workspace. The goal is to update a previously restored workspace in a way that resembles a local `git fetch` / `git checkout` without destroying the restored `target/` tree or touching unchanged source files unnecessarily.

In the first blank-volume run, `actions/checkout` printed:

```text
Deleting the contents of '<workspace>'
```

and then failed during cleanup because the restored mount point was not yet a Git repository:

```text
fatal: --local can only be used inside a git repository
fatal: not a git repository (or any parent up to mount point ...)
```

Even when `clean: false` is set, `actions/checkout` still has bootstrap behavior for an empty/non-repo target directory. That behavior is reasonable for normal ephemeral CI, but it is not ideal when the target directory is a restored build-state volume that should be preserved.

The other important concern is file mtimes. Cargo freshness checks rely heavily on relationships between source mtimes, dep-info files, fingerprint files, build script outputs, and compiled artifacts. If checkout rewrites unchanged source files, Cargo can see source files as newer than restored target metadata and mark units dirty even when file contents are identical.

So the checkout operation for this pattern needs slightly different semantics:

```text
if the restored path is not a repo:
  initialize it

fetch the requested ref

if HEAD is already the requested SHA:
  do not checkout again
  avoid touching files

if HEAD differs:
  checkout the requested SHA

do not run git clean
do not delete target/
```

I ended up writing a manual checkout step that:

1. Initializes the repo only if `.git` is missing.
2. Fetches the requested ref.
3. Checks out the requested SHA only if `HEAD` is not already that SHA.
4. Avoids `git clean`.

This is important because Cargo freshness depends on unchanged source file mtimes relative to target metadata.
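
The mtime effect is easy to reproduce outside Cargo. The following throwaway demonstration (GNU `touch` and `stat`, as found on Ubuntu runners) rewrites a file with identical bytes and shows that its modification time still advances; the file name and date are arbitrary.

```bash
# Rewriting a file with identical content still advances its mtime.
demo_dir="$(mktemp -d)"
printf 'fn main() {}\n' > "$demo_dir/main.rs"
touch -d '2020-01-01 00:00:00' "$demo_dir/main.rs"   # pretend this came from a snapshot
before="$(stat -c %Y "$demo_dir/main.rs")"

printf 'fn main() {}\n' > "$demo_dir/main.rs"        # same bytes, written again
after="$(stat -c %Y "$demo_dir/main.rs")"

if [ "$after" -gt "$before" ]; then
  echo "mtime changed although content is identical"
fi
rm -rf "$demo_dir"
```

A checkout that rewrites every file has the same effect on the whole source tree, which is why the manual step skips `git checkout` when `HEAD` already matches.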

A helper action or documented recipe would be very useful:

```yaml
- uses: runs-on/snapshot-checkout@v1
  with:
    path: /mnt/rust-cargo-snapshot/workspace
    ref: ${{ github.ref }}
    sha: ${{ github.sha }}
    clean: false
    skip-if-head-matches: true
```

Or the snapshot action docs could include a recommended checkout snippet for this pattern.

## 7. Matrix/workspace use case could use first-class examples

The current examples are mostly simple single-path examples. For a Rust workspace with many matrix builds, the important details are:

```text
one snapshot stream per matrix item
same absolute build path each time
CARGO_HOME inside snapshot if target dep-info references registry source paths
tools inside snapshot if setup time matters
manual checkout that does not clobber target or unchanged files
save policy: main saves, PRs may restore-only
```

A Rust/Cargo example in the docs would help users avoid a lot of trial and error.

# Suggested API improvements

## Add `key`

```yaml
with:
  path: /mnt/rust-cargo-snapshot
  key: cargo-function-${{ matrix.function.name }}-release-arm64
  version: v3
```

Recommended semantics:

```text
key = snapshot stream identity
version = format/schema/manual invalidation value
```

## Add restore-key or fallback controls

Option A, close to `actions/cache`:

```yaml
with:
  key: cargo-function-${{ matrix.function.name }}-${{ github.ref_name }}
  restore-keys: |
    cargo-function-${{ matrix.function.name }}-main
```

Option B, simpler RunsOn-native branch fallback:

```yaml
with:
  key: cargo-function-${{ matrix.function.name }}
  branch-fallback: true
  default-branch-fallback: true
```

## Add restore outputs

```yaml
outputs:
  restored: true
  restored-from: branch | default-branch | empty
  restored-snapshot-id: snap-...
  restored-branch: main
  volume-id: vol-...
  new-volume: false
```

## Add save outputs

```yaml
outputs:
  saved: true
  saved-snapshot-id: snap-...
  save-started: true
  save-waited-for-completion: false
```

## Add runtime save control

For example:

```yaml
with:
  save: auto
  save-marker-file: /mnt/rust-cargo-snapshot/.snapshot-source-sha
  save-marker-value: ${{ github.sha }}
```

Or expose a state file/env file that later steps can write to:

```bash
echo "save=false" >> "$RUNS_ON_SNAPSHOT_STATE"
```

## Document source/workspace snapshot pattern

Specifically document that users should prefer:

```text
/mnt/my-snapshot-root/workspace
```

over mounting directly onto:

```text
${{ github.workspace }}
```

for source/build workspace snapshots.

## Provide checkout guidance or helper

A helper or example that preserves restored state:

```yaml
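- name: Checkout Repo without destroying snapshot state
  env:
    GITHUB_TOKEN: ${{ github.token }}
  run: |
    set -euo pipefail
    git config --global --add safe.directory "$SNAPSHOT_WORKSPACE"
    cd "$SNAPSHOT_WORKSPACE"

    # Initialize only when the restored volume has no repository yet.
    if [ ! -d .git ]; then
      git init .
      git remote add origin "https://github.com/${GITHUB_REPOSITORY}.git"
    else
      git remote set-url origin "https://github.com/${GITHUB_REPOSITORY}.git"
    fi

    auth_header="AUTHORIZATION: basic $(printf 'x-access-token:%s' "$GITHUB_TOKEN" | base64 -w0)"
    # Branch refs map to a remote-tracking ref; other refs (PRs, tags) are fetched as-is.
    if [ "${GITHUB_REF#refs/heads/}" != "$GITHUB_REF" ]; then
      fetch_ref="+${GITHUB_REF}:refs/remotes/origin/${GITHUB_REF_NAME}"
    else
      fetch_ref="$GITHUB_REF"
    fi

    git -c "http.https://github.com/.extraheader=$auth_header" fetch --force --prune --no-tags origin "$fetch_ref"
    # Skip checkout when HEAD already matches, so unchanged files keep their mtimes.
    if [ "$(git rev-parse --verify HEAD 2>/dev/null || true)" != "$GITHUB_SHA" ]; then
      git -c advice.detachedHead=false checkout --detach --force "$GITHUB_SHA"
    fi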
```

# Desired final workflow shape

Ideally, a workflow could look more like this:

```yaml
jobs:
  build-functions:
    runs-on: runs-on=${{ github.run_id }}/cpu=12/family=m8azn/image=ubuntu24-full-x64

    strategy:
      matrix:
        function:
          - { name: function_a, binary: crate_binary_a }
          - { name: function_b, binary: crate_binary_b }

    env:
      SNAPSHOT_ROOT: /mnt/rust-cargo-snapshot
      SNAPSHOT_WORKSPACE: /mnt/rust-cargo-snapshot/workspace
      CARGO_HOME: /mnt/rust-cargo-snapshot/cargo-home
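      CARGO_TARGET_DIR: /mnt/rust-cargo-snapshot/workspace/app/target
      FUNCTION_OUTPUT_PATH: /mnt/rust-cargo-snapshot/workspace/path/to/output/${{ matrix.function.name }}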

    steps:
      - uses: runs-on/snapshot@v1
        id: snapshot
        with:
          path: ${{ env.SNAPSHOT_ROOT }}
          key: cargo-function-${{ matrix.function.name }}-release-arm64
          version: v3
          volume_size: 10
          save: ${{ github.event_name == 'push' && github.ref_name == 'main' }}

      - uses: runs-on/snapshot-checkout@v1
        with:
          path: ${{ env.SNAPSHOT_WORKSPACE }}
          clean: false
          skip-if-head-matches: true

      - uses: dtolnay/rust-toolchain@stable
        with:
          components: rustfmt
          targets: aarch64-unknown-linux-gnu

      - working-directory: ${{ env.SNAPSHOT_WORKSPACE }}/app
        run: |
          cargo lambda build \
            --lambda-dir "$FUNCTION_OUTPUT_PATH" \
            --bin "${{ matrix.function.binary }}" \
            --flatten "${{ matrix.function.binary }}" \
            --release \
            --arm64
```

# Why this matters

Rust CI performance is often bottlenecked by Cargo rebuilds. Generic archive caches are not enough to reproduce local no-op rebuild behavior. `runs-on/snapshot` is the first mechanism I tested that actually got close to local behavior for `cargo lambda build` in CI.

The current action is already powerful enough to make this work, but the YAML is more complex than it needs to be for matrix/workspace build-state snapshots. A few small API additions would make this pattern much easier to adopt and less error-prone.
