
Flex v3 control-plane writes empty defaultBranch, so runs-on/snapshot never falls back to the default-branch snapshot #24

Description

@gnuletik

Summary

After upgrading from RunsOn v2.12.6 (CloudFormation) to v3.1.1 (Terraform flex module), runs-on/snapshot stopped restoring the default-branch snapshot on pull-request runs. Every PR's first run now starts from a blank volume instead of inheriting the default branch's cache.

Root cause: the v3 flexd control-plane writes "defaultBranch": "" (empty) into the per-runner config at $RUNS_ON_HOME/config.json. The runs-on/snapshot action only performs its "fall back to the default branch" restore when that field is non-empty, so the fallback is silently disabled.

Our default branch is main — i.e. config.json should contain "defaultBranch": "main", and it did under v2.12.6.

Environment

RunsOn (before, working): v2.12.6, CloudFormation install
RunsOn (after, broken): v3.1.1, Terraform flex module 3.1.1 (also reproduces conceptually on 3.1.2)
Control-plane service: flexd (ECS), app_version: v3.1.1
runs-on/snapshot action: v1.1.1 (commit d3bcc42), unchanged across the upgrade
Runner: linux / amd64
Default branch: main

Expected vs actual

runs-on/snapshot documents this restore order:

  1. snapshot for the current branch
  2. else snapshot for the repository default branch
  3. else a blank volume

For a PR, step 1 can never match on the first run — the ref is the PR merge ref (<PR>/merge), unique per PR. So PRs rely entirely on step 2.

  • Expected: step 2 restores the main snapshot (which exists and is tagged runs-on-snapshot-branch=main).
  • Actual: step 2 is skipped because defaultBranch is empty, and a blank volume is created.
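The documented restore order, and why an empty defaultBranch turns the expected PR outcome into the actual one, can be sketched as follows. This is a simplified model, not the action's code. restoreSource and the latestByBranch map (a stand-in for the tag:runs-on-snapshot-branch search) are hypothetical.

```go
package main

import "fmt"

// restoreSource models the documented order:
//  1. latest snapshot tagged with the current ref
//  2. else latest snapshot tagged with the default branch (only if known)
//  3. else "" (meaning: create a new blank volume)
//
// latestByBranch is a hypothetical stand-in for the EC2 tag search,
// mapping a runs-on-snapshot-branch tag value to its latest snapshot ID.
func restoreSource(ref, defaultBranch string, latestByBranch map[string]string) string {
	if id, ok := latestByBranch[ref]; ok {
		return id // step 1
	}
	if defaultBranch != "" { // the gate: an empty value skips step 2 entirely
		if id, ok := latestByBranch[defaultBranch]; ok {
			return id // step 2
		}
	}
	return "" // step 3: blank volume
}

func main() {
	snaps := map[string]string{"main": "snap-xxxx"}
	fmt.Printf("%q\n", restoreSource("main", "", snaps))           // push to main: step 1 matches
	fmt.Printf("%q\n", restoreSource("<PR>/merge", "main", snaps)) // PR, expected: step 2
	fmt.Printf("%q\n", restoreSource("<PR>/merge", "", snaps))     // PR, actual: blank volume
}
```

The first two calls return "snap-xxxx" and the third returns "", which matches the logs under Evidence: the main push never needs the fallback, so only PR runs are affected.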

Where it's gated (action side — for reference)

In runs-on/snapshot (d3bcc42), the fallback is conditional on a non-empty value read from the control-plane-provided config:

// internal/snapshot/restore.go
} else if s.config.RunnerConfig.DefaultBranch != "" {   // ← fallback only runs if non-empty
    // search snapshots tagged with the default branch
}

// internal/config/config.go
configBytes, _ := os.ReadFile(filepath.Join(os.Getenv("RUNS_ON_HOME"), "config.json"))
// → RunnerConfig.DefaultBranch  (json key "defaultBranch")

The action is behaving correctly given its input; the input is empty.

Evidence

1. Runner-side: config.json has an empty defaultBranch

The action logs PrettyPrint(cfg.RunnerConfig) on every run. customTags is populated correctly (so the control-plane is writing the file), but defaultBranch is empty:

Runner config: {
  "defaultBranch": "",
  "customTags": [ { "key": "runs-on-stack-name", "value": "<redacted>" }, ... ]
}

2. Default-branch run works; PR run does not

On a push to the default branch, the ref is main, so it matches its own branch-tagged snapshot via step 1 and never needs the fallback:

RestoreSnapshot: Using git ref: main
RestoreSnapshot: Found latest snapshot snap-xxxx for branch main
CreateSnapshot: Using git ref: main → Snapshot created: snap-yyyy   (~88% savings)

On a PR the fallback is needed but is skipped:

RestoreSnapshot: Using git ref: <PR>/merge
RestoreSnapshot: Searching ... tag:runs-on-snapshot-branch = ["<PR>/merge"]
RestoreSnapshot: Creating a new blank volume

This means the main snapshot is healthy and present; PRs simply can't reach it because the fallback never fires.
