ci: add dependabot auto-merge workflow by yonib05 · Pull Request #2705 · strands-agents/harness-sdk

yonib05 · 2026-06-09T23:32:07Z

Summary

Add dependabot-auto-merge.yml: analyzes dependabot PRs with a read-only Strands agent, runs integration tests on safe Python/TypeScript updates, and enables auto-merge
Add parse_dependabot_title.py and sanitize_dependabot_body.py (with tests) to defend against prompt injection
Add reusable dependabot-integration-tests.yml routing by ecosystem (python/typescript)

Depends on strands-agents/devtools#61 (dependabot-analyze SOP, mode wiring, and sanitized_changelog agent-runner input). That PR must merge first since this workflow references the devtools actions at @main.

How it works

On a dependabot PR, the title is parsed for structured metadata and the body is sanitized into an <untrusted-changelog> block
The strands-command agent runs read-only: reads the diff, searches the repo for package usage, consumes the sanitized changelog, and posts an analysis comment with a verdict
The verdict becomes a Dependabot Analysis check run
safe Python/TypeScript updates → integration tests → auto-merge once branch protection is satisfied
safe docs/actions updates → labeled dependabot-safe, left for manual merge (preview link review)
Anything flagged → labeled dependabot-needs-review

Security

All external content (PR body, changelog) is sanitized and wrapped in <untrusted-changelog> tags before reaching the agent. The agent runs with write_permission: false. Upstream commit fetches are restricted to URL-validated GitHub sources.

Test plan

Merge devtools#61 first
Run /strands dependabot-analyze on an existing dependabot PR; confirm the analysis comment and check run
Confirm a safe Python/TS update flows to auto-merge after approval
Confirm a flagged update gets dependabot-needs-review

github-actions · 2026-06-09T23:37:44Z

Assessment: Comment

Well-designed automation workflow with strong security posture (sanitization pipeline, read-only agent, separated read/write phases). The defense-in-depth approach is solid.

Review Themes

Security hardening: A few areas could benefit from tighter controls — the verdict comment filter is broad (any bot/strands-matching user), the heredoc delimiter is predictable, and HTML entity encoding could bypass injection detection.
Shell injection surface: PR title is passed as a positional shell argument; piping via stdin (like the body sanitizer) would be more defensive.
Test infrastructure: Tests use relative paths and no CI job runs them — they should be wired into CI and use absolute paths.
Edge cases in conditions: The flag-for-review job condition and always() interactions need careful consideration for skipped-vs-failure job states.
Truncation boundary: The preamble is counted against the 2000-char limit meant for untrusted content.

The sanitization scripts are clean and well-documented. Good use of the monorepo structure for the integration test routing.

yonib05 · 2026-06-09T23:49:42Z

Thanks for the review. Addressed in 978c477:

Critical (found in dependabot-analyze devtools PR): the input parser crashed under pull_request_target because it only honored explicit inputs for workflow_dispatch. Fixed in strands-agents/devtools#61 — it now uses explicit issue_id/command inputs whenever provided, falling back to the comment payload only for issue_comment events.

Sanitizer (line 87, 92): now decodes HTML entities before injection-pattern matching, and truncates the untrusted payload before prepending the preamble/wrapper (so the cap applies only to attacker-influenceable text and the closing tag can't be truncated away).

Shell arg (line 34): PR title is now piped via stdin instead of passed as a positional argument; the parser reads stdin when no argv is given.

Heredoc delimiter (line 44): now uses a random SANITIZED_$(openssl rand -hex 8) delimiter.

Verdict filter (line 111): restricted to comments authored by github-actions[bot] (the agent posts via GITHUB_TOKEN), preventing forged-verdict injection from arbitrary bot/strands-named accounts.

flag-for-review (line 177): condition now fires whenever a safe python/typescript verdict's integration tests did not success (covers failure, cancelled, and erroneously-skipped). Verified exhaustively that every verdict×ecosystem×result combo has exactly one terminal outcome.

pull_request_target code execution: added a same-repo head guard (head.repo.full_name == github.repository) to both integration-test jobs so fork heads can never run with the privileged test role.

Test paths: script tests now resolve the script via pathlib so they run from any CWD.

Left as-is (intentional): the Python/TypeScript jobs use different AWS role secrets to mirror the existing *-integration-test.yml workflows; the working-directory: strands-py monorepo checkout pattern; and the read-only analyze job permissions. Wiring the script tests into a CI job is a reasonable follow-up but out of scope here.

yonib05 · 2026-06-09T23:50:58Z

Removed the unit test files from the PR (force-pushed). They were run locally to validate the parser and sanitizer (15 passing, including HTML-entity injection and truncation-boundary cases) but are kept out of the PR per repo preference. The branch now contains only the two scripts and two workflows.

github-actions · 2026-06-09T23:54:44Z

Assessment: Comment

The author's fixes in 978c477 thoroughly address all prior review findings. One remaining functional issue: the if: always() on finalize causes downstream jobs (verdict, flag-for-review) to execute on every non-dependabot PR, potentially creating spurious check runs and labels.

Details

Job isolation: The prep job correctly guards against non-dependabot actors, but finalize (with always()) and verdict (unconditional) break that isolation — they run even when the pipeline was effectively no-op'd. Adding && needs.parse-input.result != 'skipped' to finalize would properly short-circuit the pipeline.
Test coverage: The stdin path used in production isn't exercised by the title-parser tests (they use argv). Minor gap but worth a test case.
Security fixes confirmed: HTML entity decoding, restricted verdict filter, random delimiter, same-repo head guards, and truncation ordering are all correctly implemented.

Strong security design overall — the defense-in-depth sanitization pipeline and privilege separation are well thought out.

github-actions · 2026-06-09T23:56:15Z

Assessment: Request Changes

The previous review findings were well-addressed (sanitizer hardening, stdin piping, random delimiter, restricted verdict filter, same-repo head guard). However, one critical issue remains with the always() conditions.

Review Details

Critical — Non-dependabot PR pollution: The workflow triggers on all pull_request_target events (no path/actor filter at workflow level). Jobs with if: always() (finalize, flag-for-review) and their dependents (verdict) will execute for every non-dependabot PR, creating spurious "Dependabot Analysis" check runs and adding dependabot-needs-review labels to unrelated PRs.
Important — Verdict collision: The verdict regex matches any {"verdict": "safe"} in any github-actions[bot] comment. Other workflows (e.g., /strands review) also post as github-actions[bot], creating potential for false positives.

The sanitization pipeline and security architecture are solid. The fix for the critical issue is straightforward — add github.event.pull_request.user.login == 'dependabot[bot]' to the always() job conditions.

Guard finalize against the skipped-pipeline cascade and require a successful verdict job before flag-for-review can label a PR, so the workflow no longer creates spurious check runs or labels on PRs from other authors.

yonib05 · 2026-06-09T23:57:46Z

Good catch on the non-dependabot cascade — fixed in dd9d9d6.

The prep job has the dependabot[bot] actor gate, but finalize's if: always() would have run on every PR (with prep/parse-input/analyze skipped), cascading into verdict and flag-for-review and producing a spurious action_required check run / dependabot-needs-review label on unrelated PRs.

Fixes:

finalize now guarded with if: always() && needs.parse-input.result != 'skipped', so it (and therefore verdict) does not run when the pipeline was skipped for a non-dependabot PR.
flag-for-review additionally requires needs.verdict.result == 'success', so it can never label a PR whose verdict job didn't actually run.

Verified exhaustively that (a) no terminal job fires for a non-dependabot PR, and (b) every dependabot verdict×ecosystem×test-result combination still resolves to exactly one terminal outcome.

Match only github-actions[bot] comments carrying a DEPENDABOT_VERDICT marker so an unrelated agent comment (e.g. a prior /strands review) cannot be misread as this PR's verdict.

yonib05 · 2026-06-10T00:00:46Z

Addressed in 2458909 (+ devtools#61 @ 0d41152):

Non-dependabot cascade (the if: always() finalize → verdict → flag chain): fixed in dd9d9d6 — finalize is now guarded with needs.parse-input.result != 'skipped' and flag-for-review requires needs.verdict.result == 'success'. No terminal job fires for non-dependabot PRs (verified exhaustively).
Verdict shadowing by an unrelated agent comment (e.g. a prior /strands review): the SOP now emits a unique DEPENDABOT_VERDICT: {"verdict": "..."} marker line, and the workflow only matches github-actions[bot] comments that contain that marker (most recent wins). A /strands review comment without the marker can no longer be mistaken for a verdict; anything unmatched safely defaults to needs-review.

github-actions · 2026-06-10T00:01:58Z

Assessment: Approve

All issues from prior review rounds have been properly addressed in dd9d9d6. The non-dependabot cascade is now correctly prevented by needs.parse-input.result != 'skipped', and flag-for-review additionally requires needs.verdict.result == 'success'. I verified exhaustively that no terminal job fires for non-dependabot PRs, and all dependabot verdict×ecosystem×test-result combinations resolve to exactly one terminal outcome.

Verified Fixes

Non-dependabot isolation: finalize correctly short-circuits when parse-input is skipped
Flag-for-review guard: Requires verdict.result == 'success' so it can't fire when verdict didn't run
Security pipeline: HTML entity decoding before injection matching, truncation before preamble, random heredoc delimiter, stdin piping, same-repo head guard, restricted verdict author filter

Clean, well-documented workflow with strong defense-in-depth. The pr_number input on the reusable workflow is declared but unused — presumably reserved for future metrics; not blocking.

github-actions · 2026-06-10T00:03:44Z

Assessment: Approve

All previously raised issues are resolved. The latest commit (2458909) adds a DEPENDABOT_VERDICT: marker to the verdict extraction, fully addressing the last open concern about collision with unrelated github-actions[bot] comments.

Verification Summary

Non-dependabot isolation: finalize guarded by needs.parse-input.result != 'skipped' — no terminal job fires for non-dependabot PRs ✓
Verdict collision: Extraction now requires both github-actions[bot] authorship AND DEPENDABOT_VERDICT: marker in the comment body ✓
Flag-for-review: Additionally requires needs.verdict.result == 'success' — cannot fire when verdict was skipped ✓
Execution flow: Exhaustively verified all 5 scenario combinations (non-dependabot, safe/python/pass, safe/docs, needs-review, safe/python/fail) — each resolves to exactly one terminal outcome ✓

No new issues found. Ready to merge once the devtools#61 dependency lands.

github-actions Bot added the size/l label Jun 9, 2026

yonib05 temporarily deployed to auto-approve June 9, 2026 23:32 — with GitHub Actions Inactive

github-actions Bot added the strands-running label Jun 9, 2026

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/workflows/dependabot-auto-merge.yml Outdated

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/workflows/dependabot-auto-merge.yml Outdated

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/workflows/dependabot-auto-merge.yml Outdated

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/workflows/dependabot-auto-merge.yml Outdated

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/workflows/dependabot-auto-merge.yml

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/workflows/dependabot-integration-tests.yml

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/workflows/dependabot-integration-tests.yml

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/scripts/test_parse_dependabot_title.py Outdated

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/scripts/sanitize_dependabot_body.py

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/scripts/sanitize_dependabot_body.py Outdated

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/scripts/test_sanitize_dependabot_body.py Outdated

github-actions Bot added size/l and removed strands-running size/l labels Jun 9, 2026

yonib05 temporarily deployed to auto-approve June 9, 2026 23:49 — with GitHub Actions Inactive

github-actions Bot added the strands-running label Jun 9, 2026

yonib05 added 4 commits June 9, 2026 19:50

ci: add dependabot PR title parser for auto-merge workflow

33b2444

ci: add dependabot PR body sanitizer for auto-merge workflow

60c16d5

ci: add dependabot auto-merge workflow

0c8877f

ci: add reusable dependabot integration-tests workflow

4c4d334

yonib05 force-pushed the feat/dependabot-auto-merge branch from 978c475 to 4c4d334 Compare June 9, 2026 23:50

github-actions Bot added size/m and removed size/l labels Jun 9, 2026

yonib05 temporarily deployed to auto-approve June 9, 2026 23:51 — with GitHub Actions Inactive

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/workflows/dependabot-auto-merge.yml Outdated

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/workflows/dependabot-auto-merge.yml

github-actions Bot removed the strands-running label Jun 9, 2026

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/workflows/dependabot-auto-merge.yml Outdated

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/workflows/dependabot-auto-merge.yml Outdated

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/workflows/dependabot-auto-merge.yml

ci: prevent dependabot pipeline from running on non-dependabot PRs

dd9d9d6

Guard finalize against the skipped-pipeline cascade and require a successful verdict job before flag-for-review can label a PR, so the workflow no longer creates spurious check runs or labels on PRs from other authors.

github-actions Bot added size/m and removed size/m labels Jun 9, 2026

yonib05 temporarily deployed to auto-approve June 9, 2026 23:57 — with GitHub Actions Inactive

github-actions Bot added the strands-running label Jun 9, 2026

ci: require explicit verdict marker in agent comment

2458909

Match only github-actions[bot] comments carrying a DEPENDABOT_VERDICT marker so an unrelated agent comment (e.g. a prior /strands review) cannot be misread as this PR's verdict.

github-actions Bot added size/m and removed size/m labels Jun 10, 2026

yonib05 temporarily deployed to auto-approve June 10, 2026 00:00 — with GitHub Actions Inactive

github-actions Bot removed the strands-running label Jun 10, 2026

Conversation

yonib05 commented Jun 9, 2026

Summary

How it works

Security

Test plan

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 9, 2026

Uh oh!

yonib05 commented Jun 9, 2026

Uh oh!

yonib05 commented Jun 9, 2026

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 9, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 9, 2026

Uh oh!

yonib05 commented Jun 9, 2026

Uh oh!

yonib05 commented Jun 10, 2026

Uh oh!

github-actions Bot commented Jun 10, 2026

Uh oh!

github-actions Bot commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant