
Feature request: smart snapshot save based on relevant Git path changes #23

Description

@garysassano

Summary

I have been testing runs-on/snapshot@v1 for preserving build state between ephemeral CI runners. The restore side works very well for Rust/Cargo workloads, but snapshot saving has a fixed cost even when saving is unlikely to improve the next run.

For matrix builds, especially where each matrix entry owns a separate snapshot stream, it would be useful if the snapshot action could automatically skip the post-step save when no relevant source paths changed.

The goal is to keep restoring snapshots aggressively, but to save a snapshot only when the build-state volume has picked up changes relevant to that snapshot.

Why this matters

In the measured workflow, once snapshots are warm, the actual build can be close to a no-op. However, the snapshot post step still has to do work when save: true:

unmount snapshot volume
detach EBS volume
wait until volume is detached
start EBS snapshot
return, or optionally wait for completion

Even with wait_for_completion: false, this post-step save initiation can add noticeable latency.
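For reference, the post-step sequence listed above could look roughly like the following Python sketch using a boto3 EC2 client. It illustrates the listed steps and is not the action's actual implementation; the function name save_snapshot, its parameters, and the injectable run hook are hypothetical. It makes visible that unmount, detach, and the detach wait all happen before the snapshot is even requested, so wait_for_completion: false only removes the final step.

```python
import subprocess


def save_snapshot(ec2, mount_path, volume_id, tag_spec=None,
                  wait_for_completion=False, run=subprocess.run):
    """Hypothetical sketch of the post-step save sequence.

    `ec2` is assumed to be a boto3 EC2 client (boto3.client("ec2")).
    `run` is injectable so the sequence can be exercised without root or AWS.
    """
    # 1. Unmount the snapshot volume so no further writes reach it.
    run(["umount", mount_path], check=True)
    # 2. Detach the EBS volume from this runner.
    ec2.detach_volume(VolumeId=volume_id)
    # 3. Block until the volume reports "available", i.e. fully detached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    # 4. Start the EBS snapshot, optionally tagging it.
    kwargs = {"VolumeId": volume_id}
    if tag_spec:
        kwargs["TagSpecifications"] = tag_spec
    snapshot_id = ec2.create_snapshot(**kwargs)["SnapshotId"]
    # 5. Optionally wait until the snapshot reaches the "completed" state.
    if wait_for_completion:
        ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    return snapshot_id
```

Skipping the save avoids all five steps, not just the optional completion wait.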

For some CI runs, saving is valuable:

main branch build changed relevant source paths
dependency lockfile changed
toolchain version changed
build configuration changed

For other runs, saving is probably wasteful:

the restored snapshot already corresponds to this commit
only docs changed
only unrelated packages changed
only unrelated matrix entries changed
the job restored from main and the PR did not change paths relevant to this snapshot

Today, the only practical controls are static workflow expressions such as:

save: ${{ github.event_name == 'push' && github.ref_name == 'main' }}

That is useful, but it is blunt. It cannot express “save only if paths relevant to this matrix entry changed.”

Example use case

Imagine a Rust workspace with many deployable binaries built in a matrix:

strategy:
  matrix:
    function:
      - { name: function_a, binary: crate_binary_a }
      - { name: function_b, binary: crate_binary_b }
      - { name: function_c, binary: crate_binary_c }

Each function has an independent snapshot stream:

cargo-function-function_a-release-arm64
cargo-function-function_b-release-arm64
cargo-function-function_c-release-arm64

If a change only affects function_a, it is useful to save a new function_a snapshot. It may be unnecessary to save new snapshots for function_b and function_c, because their restored build-state volumes did not meaningfully change.

flowchart TD
    change[Git commit]
    paths[Changed paths]
    a[function_a snapshot]
    b[function_b snapshot]
    c[function_c snapshot]
    saveA[save new snapshot]
    skipB[skip save]
    skipC[skip save]

    change --> paths
    paths -->|matches function_a or shared deps| a --> saveA
    paths -->|does not match function_b| b --> skipB
    paths -->|does not match function_c| c --> skipC
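The per-entry decision in the flowchart can be sketched in Python. The pattern semantics are an assumption, since the issue does not define them: ** matches across directories, while * and ? stay within one path segment. compile_pattern, matches_any, and plan_saves are hypothetical names.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def compile_pattern(pattern: str) -> re.Pattern:
    """Translate a changed-paths pattern into a regex (assumed semantics)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")        # "**" may cross "/" boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")     # "*" stays within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")      # "?" is one non-separator character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches_any(path, patterns):
    return any(compile_pattern(p).fullmatch(path) for p in patterns)


def plan_saves(changed_paths, entry_patterns):
    """Map each matrix entry name to True (save) or False (skip)."""
    return {
        name: any(matches_any(path, patterns) for path in changed_paths)
        for name, patterns in entry_patterns.items()
    }


if __name__ == "__main__":
    shared = ["app/Cargo.toml", "app/Cargo.lock", "app/libs/**"]
    entries = {
        name: shared + [f"app/bins/{name}/**"]
        for name in ("function_a", "function_b", "function_c")
    }
    print(plan_saves(["app/bins/function_a/src/main.rs", "docs/README.md"], entries))
    # {'function_a': True, 'function_b': False, 'function_c': False}
```

Because every entry carries the shared paths, a change under app/libs/ would mark all three entries for saving.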

Desired behavior

The snapshot action would restore as usual, then decide in the post step whether saving is necessary based on Git path changes.

Conceptually:

- uses: runs-on/snapshot@v1
  with:
    path: /mnt/build-snapshot
    key: cargo-function-${{ matrix.function.name }}-release-arm64
    save: auto
    save-if-git-changed: true
    git-repository: /mnt/build-snapshot/workspace
    git-base: restored-snapshot
    git-head: ${{ github.sha }}
    changed-paths: |
      app/Cargo.toml
      app/Cargo.lock
      app/libs/**
      app/bins/${{ matrix.function.name }}/**
      app/build.rs

If any configured path changed between the snapshot’s saved source revision and the current source revision, the post step saves a new snapshot.

If no configured path changed, the post step skips saving and logs why:

Skipping snapshot save: no relevant Git path changes for key cargo-function-function_b-release-arm64
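One way to implement the comparison is to let Git do the filtering: run git diff --name-only between the base and head revisions, limited by pathspecs with the :(glob) magic so that ** matches across directories. This sketch assumes both revisions exist in git-repository (a shallow clone may lack the base commit); build_diff_command, relevant_changes, and save_decision are hypothetical helpers, and the log strings mirror the message proposed above.

```python
import subprocess


def build_diff_command(repo, base, head, patterns):
    """Hypothetical: list files changed between base and head, filtered by git.

    The ":(glob)" pathspec magic makes "**" match across directories.
    """
    pathspecs = [f":(glob){p}" for p in patterns]
    return ["git", "-C", repo, "diff", "--name-only", base, head, "--", *pathspecs]


def relevant_changes(repo, base, head, patterns):
    """Run the diff and return the matched paths (empty list if none)."""
    result = subprocess.run(
        build_diff_command(repo, base, head, patterns),
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def save_decision(key, matched):
    """Return (save, log message) for the post step."""
    if matched:
        return True, f"Saving snapshot: {len(matched)} relevant Git path change(s) for key {key}"
    return False, f"Skipping snapshot save: no relevant Git path changes for key {key}"
```

Letting Git evaluate the patterns keeps their meaning identical to Git's own pathspec rules, so users do not have to learn a second glob dialect.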

How the action could know the restored base revision

There are a few possible designs.

Option 1: Snapshot metadata tag

When saving a snapshot, the action could tag it with source metadata:

runs-on-snapshot-source-sha=<sha>
runs-on-snapshot-source-ref=<branch>

On restore, the action knows which snapshot was restored and can expose or reuse that metadata in the post step.

Then save-if-git-changed can compare:

base = source sha stored on restored snapshot
head = current GITHUB_SHA
paths = configured changed-paths
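For Option 1, a sketch of how the tags could be attached and read back. EC2's CreateSnapshot accepts a TagSpecifications list with ResourceType snapshot, and DescribeSnapshots reports tags as Key/Value pairs; the tag names come from this issue, and the helper names are hypothetical.

```python
SHA_TAG = "runs-on-snapshot-source-sha"
REF_TAG = "runs-on-snapshot-source-ref"


def source_tag_specification(sha, ref):
    """TagSpecifications value in the shape EC2 CreateSnapshot accepts."""
    return [{
        "ResourceType": "snapshot",
        "Tags": [
            {"Key": SHA_TAG, "Value": sha},
            {"Key": REF_TAG, "Value": ref},
        ],
    }]


def restored_base_sha(snapshot_tags):
    """Return the source SHA tagged on the restored snapshot, or None.

    `snapshot_tags` is the "Tags" list of a DescribeSnapshots entry,
    e.g. [{"Key": ..., "Value": ...}, ...]; it may be absent (None).
    """
    for tag in snapshot_tags or []:
        if tag.get("Key") == SHA_TAG and tag.get("Value"):
            return tag["Value"]
    return None
```

With boto3, the save side would pass TagSpecifications=source_tag_specification(os.environ["GITHUB_SHA"], os.environ["GITHUB_REF_NAME"]) to ec2.create_snapshot, and the restore side would call restored_base_sha on the Tags of the snapshot it restored.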

Option 2: Marker file inside the snapshot volume

The action could write a marker file before saving:

/mnt/build-snapshot/.runs-on-snapshot/source-sha
/mnt/build-snapshot/.runs-on-snapshot/source-ref

On the next restore, the action reads the marker and uses it as the comparison base.
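A minimal Option 2 sketch, assuming the marker paths listed above and that the post step writes them before unmounting. Because the marker lives inside the volume, it travels with the snapshot. A missing or empty marker returns None, which the caller can treat as "no restored source metadata" and save. write_marker and read_marker_sha are hypothetical names.

```python
from pathlib import Path

MARKER_DIR = ".runs-on-snapshot"


def write_marker(volume, sha, ref):
    """Record the source revision inside the volume before it is snapshotted."""
    marker = Path(volume) / MARKER_DIR
    marker.mkdir(parents=True, exist_ok=True)
    (marker / "source-sha").write_text(sha + "\n")
    (marker / "source-ref").write_text(ref + "\n")


def read_marker_sha(volume):
    """Return the source SHA stored in a restored volume, or None if absent."""
    path = Path(volume) / MARKER_DIR / "source-sha"
    try:
        sha = path.read_text().strip()
    except FileNotFoundError:
        return None
    return sha or None
```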

Option 3: User-provided base/head

The action could let the workflow provide both values explicitly:

with:
  save: auto
  save-if-git-changed: true
  git-repository: /mnt/build-snapshot/workspace
  git-base: ${{ steps.snapshot.outputs.restored-source-sha }}
  git-head: ${{ github.sha }}
  changed-paths: |
    app/libs/**
    app/bins/${{ matrix.function.name }}/**

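This would require the restore to expose outputs such as restored-source-sha. Because a step's with: inputs are evaluated before the step runs, git-base would need to reference an output from an earlier step rather than from the snapshot step itself.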

Possible API shape

One possible input set:

with:
  path: /mnt/build-snapshot
  key: cargo-function-${{ matrix.function.name }}-release-arm64

  save: auto
  save-if: git-paths-changed
  git-repository: /mnt/build-snapshot/workspace
  git-base: restored-snapshot
  git-head: ${{ github.sha }}
  git-paths: |
    app/Cargo.toml
    app/Cargo.lock
    app/libs/**
    app/bins/${{ matrix.function.name }}/**

Another possible shape:

with:
  path: /mnt/build-snapshot
  key: cargo-function-${{ matrix.function.name }}-release-arm64

  save-policy: changed-paths
  save-policy-git-repository: /mnt/build-snapshot/workspace
  save-policy-base: restored-source-sha
  save-policy-head: ${{ github.sha }}
  save-policy-paths: |
    app/Cargo.lock
    app/libs/**
    app/bins/${{ matrix.function.name }}/**
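Whichever shape is chosen, the inputs need to be parsed into a policy. The sketch below takes the first shape's input names (save, git-repository, git-base, git-head, git-paths) as a plain dict of strings. The accepted save values, the defaults, and the rule that save: auto requires at least one path are assumptions, not existing action behavior.

```python
from dataclasses import dataclass, field

VALID_SAVE_MODES = {"true", "false", "auto"}


@dataclass
class SavePolicy:
    """Hypothetical parsed form of the proposed inputs."""
    mode: str
    repository: str
    base: str
    head: str
    paths: list[str] = field(default_factory=list)


def parse_save_policy(inputs: dict[str, str]) -> SavePolicy:
    mode = inputs.get("save", "true").strip().lower()
    if mode not in VALID_SAVE_MODES:
        raise ValueError(f"unsupported save value: {mode!r}")
    # A multi-line YAML block scalar arrives as one string; keep non-blank lines.
    paths = [line.strip() for line in inputs.get("git-paths", "").splitlines()]
    paths = [p for p in paths if p]
    if mode == "auto" and not paths:
        raise ValueError("save: auto requires at least one git-paths entry")
    return SavePolicy(
        mode=mode,
        repository=inputs.get("git-repository", "."),
        base=inputs.get("git-base", "restored-snapshot"),
        head=inputs.get("git-head", ""),
        paths=paths,
    )
```

Rejecting save: auto without paths avoids a policy that would silently never match anything.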

Useful outputs

This feature would be much easier to debug with outputs or summary logs:

outputs:
  save-skipped: true
  save-skip-reason: no-relevant-git-path-changes
  restored-source-sha: abc123
  current-source-sha: def456
  changed-path-count: 0
  matched-changed-paths: ''

If saving does happen:

outputs:
  save-skipped: false
  save-reason: matched-git-paths
  matched-changed-paths: |
    app/libs/shared_crate/src/lib.rs
    app/Cargo.lock
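GitHub Actions steps publish outputs by appending to the file named in GITHUB_OUTPUT, using name=value for single-line values and a name<<DELIMITER block for multi-line values such as matched-changed-paths. A sketch of building and writing the proposed outputs; snapshot_outputs, format_outputs, and write_outputs are hypothetical, and changed-path-count is assumed to count only matched paths.

```python
import os
import uuid


def snapshot_outputs(saved, reason, base_sha, head_sha, matched):
    """Hypothetical: assemble the outputs proposed in this issue."""
    return {
        "save-skipped": not saved,
        ("save-reason" if saved else "save-skip-reason"): reason,
        "restored-source-sha": base_sha or "",
        "current-source-sha": head_sha,
        "changed-path-count": len(matched),
        "matched-changed-paths": "\n".join(matched),
    }


def format_outputs(outputs):
    """Render outputs in the GITHUB_OUTPUT file format."""
    lines = []
    for name, value in outputs.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        value = str(value)
        if "\n" in value:
            # Multi-line values need a heredoc-style block with a unique delimiter.
            delimiter = f"ghadelimiter_{uuid.uuid4().hex}"
            lines += [f"{name}<<{delimiter}", value, delimiter]
        else:
            lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"


def write_outputs(outputs, path=None):
    """Append outputs to the file named by GITHUB_OUTPUT (or an explicit path)."""
    path = path or os.environ["GITHUB_OUTPUT"]
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(format_outputs(outputs))
```

The same strings could also be echoed to the job log so the skip reason is visible without inspecting outputs.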

Why workflow-level conditions are not enough

It is possible to use static policies today:

save: ${{ github.event_name == 'push' && github.ref_name == 'main' }}

This is useful, but it cannot decide based on what happened inside the restored snapshot or what paths changed for a specific matrix entry.

For example, a monorepo matrix may have many independent build outputs. A global branch-level save policy either saves all matrix snapshots or none of them.

A path-aware save policy could save only the snapshots that have a meaningful reason to change.

flowchart LR
    static[Static save expression]
    all[All matrix jobs save or all skip]
    smart[Path-aware save policy]
    selected[Only affected matrix snapshots save]

    static --> all
    smart --> selected

Edge cases to consider

Shared dependencies

For a Rust workspace, paths are not always isolated to one binary. A change in a shared crate can affect many matrix entries.

The workflow author should be able to include shared paths in every matrix entry’s relevant path list:

git-paths: |
  app/Cargo.toml
  app/Cargo.lock
  app/libs/**
  app/bins/${{ matrix.function.name }}/**

Missing base revision

If the restored snapshot does not have source metadata, the action should probably save by default:

No restored source metadata found; saving snapshot.

Force-save escape hatch

There should be a way to override the smart policy:

save: true

or:

force-save: true

Failed builds

The existing behavior of saving only on successful jobs is still the right default. A smart save policy should not persist failed or partially written build state unless explicitly requested.
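The edge cases above can be combined into one ordered decision. In this sketch the failed-job check runs first, so force-save overrides only the path check; save_on_failure is a hypothetical stand-in for "explicitly requested". decide_save is a hypothetical name, and its reason strings reuse the output values proposed earlier where possible.

```python
def decide_save(*, force_save, job_succeeded, base_sha, matched, save_on_failure=False):
    """Hypothetical ordered save policy; returns (save, reason)."""
    # Never persist failed or partial build state unless explicitly requested.
    if not job_succeeded and not save_on_failure:
        return False, "job-failed"
    # Escape hatch: save regardless of path changes.
    if force_save:
        return True, "force-save"
    # No source metadata on the restored snapshot: nothing to diff against.
    if base_sha is None:
        return True, "missing-restored-source-metadata"
    if matched:
        return True, "matched-git-paths"
    return False, "no-relevant-git-path-changes"
```

Keyword-only arguments keep call sites explicit, which matters when several booleans sit side by side.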

Why this helps

For restore-heavy CI workflows, this would reduce unnecessary post-step work and reduce snapshot churn.

It would be especially useful for:

monorepos
matrix builds
Rust/Cargo workspaces
multi-function serverless builds
large build-state volumes
PR builds that mostly touch a small subset of paths

The snapshot feature already makes it possible to get close to local no-op build behavior. A path-aware save policy would make it cheaper to use that pattern at scale.

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests