Skip to content

ci: add dependabot auto-merge workflow#2705

Open
yonib05 wants to merge 6 commits into
strands-agents:mainfrom
yonib05:feat/dependabot-auto-merge
Open

ci: add dependabot auto-merge workflow#2705
yonib05 wants to merge 6 commits into
strands-agents:mainfrom
yonib05:feat/dependabot-auto-merge

Conversation

@yonib05

@yonib05 yonib05 commented Jun 9, 2026

Copy link
Copy Markdown
Member

Summary

  • Add dependabot-auto-merge.yml: analyzes dependabot PRs with a read-only Strands agent, runs integration tests on safe Python/TypeScript updates, and enables auto-merge
  • Add parse_dependabot_title.py and sanitize_dependabot_body.py (with tests) to defend against prompt injection
  • Add reusable dependabot-integration-tests.yml routing by ecosystem (python/typescript)

Depends on strands-agents/devtools#61 (dependabot-analyze SOP, mode wiring, and sanitized_changelog agent-runner input). That PR must merge first since this workflow references the devtools actions at @main.

How it works

  1. On a dependabot PR, the title is parsed for structured metadata and the body is sanitized into an <untrusted-changelog> block
  2. The strands-command agent runs read-only: reads the diff, searches the repo for package usage, consumes the sanitized changelog, and posts an analysis comment with a verdict
  3. The verdict becomes a Dependabot Analysis check run
  4. safe Python/TypeScript updates → integration tests → auto-merge once branch protection is satisfied
  5. safe docs/actions updates → labeled dependabot-safe, left for manual merge (preview link review)
  6. Anything flagged → labeled dependabot-needs-review

Security

All external content (PR body, changelog) is sanitized and wrapped in <untrusted-changelog> tags before reaching the agent. The agent runs with write_permission: false. Upstream commit fetches are restricted to URL-validated GitHub sources.

Test plan

  • Merge devtools#61 first
  • Run /strands dependabot-analyze on an existing dependabot PR; confirm the analysis comment and check run
  • Confirm a safe Python/TS update flows to auto-merge after approval
  • Confirm a flagged update gets dependabot-needs-review

Comment thread .github/workflows/dependabot-auto-merge.yml Outdated
Comment thread .github/workflows/dependabot-auto-merge.yml Outdated
Comment thread .github/workflows/dependabot-auto-merge.yml Outdated
Comment thread .github/workflows/dependabot-auto-merge.yml Outdated
Comment thread .github/workflows/dependabot-auto-merge.yml
Comment thread .github/workflows/dependabot-integration-tests.yml
Comment thread .github/workflows/dependabot-integration-tests.yml
Comment thread .github/scripts/test_parse_dependabot_title.py Outdated
Comment thread .github/scripts/sanitize_dependabot_body.py
Comment thread .github/scripts/sanitize_dependabot_body.py Outdated
Comment thread .github/scripts/test_sanitize_dependabot_body.py Outdated
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Assessment: Comment

Well-designed automation workflow with strong security posture (sanitization pipeline, read-only agent, separated read/write phases). The defense-in-depth approach is solid.

Review Themes
  • Security hardening: A few areas could benefit from tighter controls — the verdict comment filter is broad (any bot/strands-matching user), the heredoc delimiter is predictable, and HTML entity encoding could bypass injection detection.
  • Shell injection surface: PR title is passed as a positional shell argument; piping via stdin (like the body sanitizer) would be more defensive.
  • Test infrastructure: Tests use relative paths and no CI job runs them — they should be wired into CI and use absolute paths.
  • Edge cases in conditions: The flag-for-review job condition and always() interactions need careful consideration for skipped-vs-failure job states.
  • Truncation boundary: The preamble is counted against the 2000-char limit meant for untrusted content.

The sanitization scripts are clean and well-documented. Good use of the monorepo structure for the integration test routing.

@yonib05

yonib05 commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

Thanks for the review. Addressed in 978c477:

Critical (found in dependabot-analyze devtools PR): the input parser crashed under pull_request_target because it only honored explicit inputs for workflow_dispatch. Fixed in strands-agents/devtools#61 — it now uses explicit issue_id/command inputs whenever provided, falling back to the comment payload only for issue_comment events.

Sanitizer (line 87, 92): now decodes HTML entities before injection-pattern matching, and truncates the untrusted payload before prepending the preamble/wrapper (so the cap applies only to attacker-influenceable text and the closing tag can't be truncated away).

Shell arg (line 34): PR title is now piped via stdin instead of passed as a positional argument; the parser reads stdin when no argv is given.

Heredoc delimiter (line 44): now uses a random SANITIZED_$(openssl rand -hex 8) delimiter.

Verdict filter (line 111): restricted to comments authored by github-actions[bot] (the agent posts via GITHUB_TOKEN), preventing forged-verdict injection from arbitrary bot/strands-named accounts.

flag-for-review (line 177): condition now fires whenever a safe python/typescript verdict's integration tests did not success (covers failure, cancelled, and erroneously-skipped). Verified exhaustively that every verdict×ecosystem×result combo has exactly one terminal outcome.

pull_request_target code execution: added a same-repo head guard (head.repo.full_name == github.repository) to both integration-test jobs so fork heads can never run with the privileged test role.

Test paths: script tests now resolve the script via pathlib so they run from any CWD.

Left as-is (intentional): the Python/TypeScript jobs use different AWS role secrets to mirror the existing *-integration-test.yml workflows; the working-directory: strands-py monorepo checkout pattern; and the read-only analyze job permissions. Wiring the script tests into a CI job is a reasonable follow-up but out of scope here.

@yonib05 yonib05 force-pushed the feat/dependabot-auto-merge branch from 978c475 to 4c4d334 Compare June 9, 2026 23:50
@yonib05

yonib05 commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

Removed the unit test files from the PR (force-pushed). They were run locally to validate the parser and sanitizer (15 passing, including HTML-entity injection and truncation-boundary cases) but are kept out of the PR per repo preference. The branch now contains only the two scripts and two workflows.

@github-actions github-actions Bot added size/m and removed size/l labels Jun 9, 2026
Comment thread .github/workflows/dependabot-auto-merge.yml Outdated
Comment thread .github/workflows/dependabot-auto-merge.yml
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Assessment: Comment

The author's fixes in 978c477 thoroughly address all prior review findings. One remaining functional issue: the if: always() on finalize causes downstream jobs (verdict, flag-for-review) to execute on every non-dependabot PR, potentially creating spurious check runs and labels.

Details
  • Job isolation: The prep job correctly guards against non-dependabot actors, but finalize (with always()) and verdict (unconditional) break that isolation — they run even when the pipeline was effectively no-op'd. Adding && needs.parse-input.result != 'skipped' to finalize would properly short-circuit the pipeline.
  • Test coverage: The stdin path used in production isn't exercised by the title-parser tests (they use argv). Minor gap but worth a test case.
  • Security fixes confirmed: HTML entity decoding, restricted verdict filter, random delimiter, same-repo head guards, and truncation ordering are all correctly implemented.

Strong security design overall — the defense-in-depth sanitization pipeline and privilege separation are well thought out.

Comment thread .github/workflows/dependabot-auto-merge.yml Outdated
Comment thread .github/workflows/dependabot-auto-merge.yml Outdated
Comment thread .github/workflows/dependabot-auto-merge.yml
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Assessment: Request Changes

The previous review findings were well-addressed (sanitizer hardening, stdin piping, random delimiter, restricted verdict filter, same-repo head guard). However, one critical issue remains with the always() conditions.

Review Details
  • Critical — Non-dependabot PR pollution: The workflow triggers on all pull_request_target events (no path/actor filter at workflow level). Jobs with if: always() (finalize, flag-for-review) and their dependents (verdict) will execute for every non-dependabot PR, creating spurious "Dependabot Analysis" check runs and adding dependabot-needs-review labels to unrelated PRs.
  • Important — Verdict collision: The verdict regex matches any {"verdict": "safe"} in any github-actions[bot] comment. Other workflows (e.g., /strands review) also post as github-actions[bot], creating potential for false positives.

The sanitization pipeline and security architecture are solid. The fix for the critical issue is straightforward — add github.event.pull_request.user.login == 'dependabot[bot]' to the always() job conditions.

Guard finalize against the skipped-pipeline cascade and require a
successful verdict job before flag-for-review can label a PR, so the
workflow no longer creates spurious check runs or labels on PRs from
other authors.
@yonib05

yonib05 commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

Good catch on the non-dependabot cascade — fixed in dd9d9d6.

The prep job has the dependabot[bot] actor gate, but finalize's if: always() would have run on every PR (with prep/parse-input/analyze skipped), cascading into verdict and flag-for-review and producing a spurious action_required check run / dependabot-needs-review label on unrelated PRs.

Fixes:

  • finalize now guarded with if: always() && needs.parse-input.result != 'skipped', so it (and therefore verdict) does not run when the pipeline was skipped for a non-dependabot PR.
  • flag-for-review additionally requires needs.verdict.result == 'success', so it can never label a PR whose verdict job didn't actually run.

Verified exhaustively that (a) no terminal job fires for a non-dependabot PR, and (b) every dependabot verdict×ecosystem×test-result combination still resolves to exactly one terminal outcome.

Match only github-actions[bot] comments carrying a DEPENDABOT_VERDICT
marker so an unrelated agent comment (e.g. a prior /strands review)
cannot be misread as this PR's verdict.
@github-actions github-actions Bot added size/m and removed size/m labels Jun 10, 2026
@yonib05

yonib05 commented Jun 10, 2026

Copy link
Copy Markdown
Member Author

Addressed in 2458909 (+ devtools#61 @ 0d41152):

  • Non-dependabot cascade (the if: always() finalize → verdict → flag chain): fixed in dd9d9d6finalize is now guarded with needs.parse-input.result != 'skipped' and flag-for-review requires needs.verdict.result == 'success'. No terminal job fires for non-dependabot PRs (verified exhaustively).
  • Verdict shadowing by an unrelated agent comment (e.g. a prior /strands review): the SOP now emits a unique DEPENDABOT_VERDICT: {"verdict": "..."} marker line, and the workflow only matches github-actions[bot] comments that contain that marker (most recent wins). A /strands review comment without the marker can no longer be mistaken for a verdict; anything unmatched safely defaults to needs-review.

@github-actions

Copy link
Copy Markdown
Contributor

Assessment: Approve

All issues from prior review rounds have been properly addressed in dd9d9d6. The non-dependabot cascade is now correctly prevented by needs.parse-input.result != 'skipped', and flag-for-review additionally requires needs.verdict.result == 'success'. I verified exhaustively that no terminal job fires for non-dependabot PRs, and all dependabot verdict×ecosystem×test-result combinations resolve to exactly one terminal outcome.

Verified Fixes
  • Non-dependabot isolation: finalize correctly short-circuits when parse-input is skipped
  • Flag-for-review guard: Requires verdict.result == 'success' so it can't fire when verdict didn't run
  • Security pipeline: HTML entity decoding before injection matching, truncation before preamble, random heredoc delimiter, stdin piping, same-repo head guard, restricted verdict author filter

Clean, well-documented workflow with strong defense-in-depth. The pr_number input on the reusable workflow is declared but unused — presumably reserved for future metrics; not blocking.

@github-actions

Copy link
Copy Markdown
Contributor

Assessment: Approve

All previously raised issues are resolved. The latest commit (2458909) adds a DEPENDABOT_VERDICT: marker to the verdict extraction, fully addressing the last open concern about collision with unrelated github-actions[bot] comments.

Verification Summary
  • Non-dependabot isolation: finalize guarded by needs.parse-input.result != 'skipped' — no terminal job fires for non-dependabot PRs ✓
  • Verdict collision: Extraction now requires both github-actions[bot] authorship AND DEPENDABOT_VERDICT: marker in the comment body ✓
  • Flag-for-review: Additionally requires needs.verdict.result == 'success' — cannot fire when verdict was skipped ✓
  • Execution flow: Exhaustively verified all 5 scenario combinations (non-dependabot, safe/python/pass, safe/docs, needs-review, safe/python/fail) — each resolves to exactly one terminal outcome ✓

No new issues found. Ready to merge once the devtools#61 dependency lands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant