pkg/workflow: lazy-load embedded JSON datasets used only at compile-time paths#41587
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
✅ Test Quality Sentinel completed test quality analysis. Test Quality Sentinel already completed for PR #41587 in the prior run (workflow-run-id 28217486929). Comment and APPROVE review were successfully submitted. Both add_comment and submit_pull_request_review limits are exhausted (1/1 each used). Score: 100/100 — 3 behavioral-contract tests, no guideline violations. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #41587 does not have the 'implementation' label and has only 48 new lines of code in business logic directories (threshold: 100). |
There was a problem hiding this comment.
Pull request overview
This PR reduces pkg/workflow startup work by deferring parsing/compiling of embedded JSON datasets until first use, using sync.OnceValue-backed accessors instead of eager init() initialization. This helps ensure non-compile code paths don’t pay JSON unmarshal / map-build costs when those datasets are unused.
Changes:
- Lazy-load
ecosystem_domains.jsonviagetLoadedEcosystemDomains()and update callers to use the accessor. - Lazy-load/compile
gh_cli_permissions.jsonviagetCompiledGHCLIPermissions()and update permission inference/write-detection paths. - Lazy-load
github_tool_to_toolset.jsonviagetGitHubToolToToolsetMap()and update validation/tests to use the accessor.
Show a summary per file
| File | Description |
|---|---|
| pkg/workflow/tools_validation_github_toolsets.go | Switch tool-to-toolset validation to use the lazy accessor (but currently loads before an early return). |
| pkg/workflow/network_firewall_validation.go | Use lazy ecosystem-domain loading for ecosystem identifier validation. |
| pkg/workflow/github_tool_to_toolset.go | Replace eager init() JSON unmarshal with a sync.OnceValue accessor. |
| pkg/workflow/github_tool_to_toolset_test.go | Update tests to reference the lazy accessor instead of the removed global map. |
| pkg/workflow/gh_cli_permissions.go | Replace eager permission dataset compilation with a sync.OnceValue accessor and update callers. |
| pkg/workflow/domains.go | Replace eager ecosystem domains dataset unmarshal/sort with a sync.OnceValue accessor and update lookups. |
| pkg/workflow/awf_helpers.go | Update workflow-call network script generation to read ecosystem data through the lazy accessor. |
Review details
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 7/7 changed files
- Comments generated: 1
- Review effort level: Low
| githubToolToToolsetLog.Printf("Validating GitHub tools against toolsets: allowed_tools=%d, enabled_toolsets=%d", len(allowedTools), len(enabledToolsets)) | ||
| toolToToolsetMap := getGitHubToolToToolsetMap() | ||
|
|
||
| if len(allowedTools) == 0 { | ||
| githubToolToToolsetLog.Print("No tools to validate, skipping") |
There was a problem hiding this comment.
Fixed in e340254 by keeping the len(allowedTools) == 0 early return ahead of getGitHubToolToToolsetMap(), so the lazy JSON load is still skipped on the no-tools path.
Test Quality Sentinel 🧪 — PR #41587Score: 100/100 ✅ Excellent This PR makes 3 mechanical API-adaptation updates to existing tests. All modified tests continue to enforce strong behavioral contracts, correctly updated to call the new 📊 Metrics & Classification
📐 Score Breakdown
Verdict: ✅ All 3 modified tests are behavioral-contract tests. The changes are minimal, correct API adaptations — References:
|
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /zoom-out and /improve-codebase-architecture — no blocking issues; a few nits worth addressing.
📋 Key Themes & Highlights
Key Themes
- Stale comment (
domains.go:369): the// sorted once at init() timecomment was not updated wheninit()was replaced bysync.OnceValue. Easy one-liner fix. - One inconsistent call site (
domains.go:586):for ... range getLoadedEcosystemDomains()— all other nine call sites assign to a local variable first. Functionally identical but slightly inconsistent. - Accessor naming (
github_tool_to_toolset.go:17):getGitHubToolToToolsetMaplacks the past-participle qualifier used by its two siblings (getLoadedEcosystemDomains,getCompiledGHCLIPermissions).
Positive Highlights
- ✅ Clean use of
sync.OnceValue— idiomatic Go 1.21+ lazy init that's simpler than the struct-embeddedsync.Oncepattern used elsewhere in the package. - ✅ Pre-sort behaviour correctly preserved inside the
OnceValueclosure. - ✅ The removal of the exported
GitHubToolToToolsetMapvariable is a good API surface reduction — no external callers exist. - ✅ All three
init()-level panics are correctly preserved for embedded JSON failures. - ✅ Test updates are minimal and correct.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 62.2 AIC · ⌖ 10.4 AIC · ⊞ 6.5K
Comments that could not be inline-anchored
pkg/workflow/domains.go:369
[/zoom-out] Stale comment: still says init() time after switching to sync.OnceValue.\n\nDomains are now sorted lazily on first access, not at program startup. The comment should be updated to avoid confusing future readers.\n\n<details>\n<summary>💡 Suggested update</summary>\n\ngo\n// Return a copy to avoid external modification. The underlying list is already\n// sorted once at first load so no per-call sort.Strings is needed.\n\n\n</details>
pkg/workflow/domains.go:586
[/improve-codebase-architecture] Minor inconsistency: this site calls getLoadedEcosystemDomains() inline in the range expression, while every other call site in the PR assigns it to a local variable first.\n\nThe inline form is functionally correct (the sync.OnceValue result is cached), but assigning to a local variable makes the read-once intent explicit and matches the pattern used across the other six call sites.\n\n<details>\n<summary>💡 Suggested change</summary>\n\n```go\n// Befor…
pkg/workflow/github_tool_to_toolset.go:17
[/improve-codebase-architecture] Naming inconsistency across the three new lazy accessors: getLoadedEcosystemDomains and getCompiledGHCLIPermissions use past-participle qualifiers that signal lazy initialisation ("Loaded", "Compiled"), but getGitHubToolToToolsetMap has no such qualifier.\n\nNaming is minor, but aligning to the established convention (e.g. getLoadedGitHubToolToToolsetMap) makes all three accessors visually consistent and signals to readers that this variable is a laz…
There was a problem hiding this comment.
Review: lazy-load embedded JSON datasets
The sync.OnceValue refactor is structurally sound and the memoised accessors are correctly thread-safe. Two non-blocking observations.
🔍 Findings
Existing review thread (already noted): getGitHubToolToToolsetMap() is called before the len(allowedTools) == 0 early-return guard in validateGitHubToolsAgainstToolsetsCore, which triggers the lazy JSON parse even on the skip path — directly undermining the PR's goal for that code path. The fix is to move the call to after the guard.
New (low): stale test error messages — See inline comment on github_tool_to_toolset_test.go. Three t.Errorf strings still reference GitHubToolToToolsetMap by name after the rename. Non-blocking, but will waste debugging time when those tests fail.
No correctness, security, or data-loss issues found beyond the above.
Warning
Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
proxy.golang.org
To allow these domains, add them to the
network.allowedlist in your workflow frontmatter:
network:
allowed:
- defaults
- "proxy.golang.org"See Network Configuration for more information.
🔎 Code quality review by PR Code Quality Reviewer · 140 AIC · ⌖ 6.65 AIC · ⊞ 5.2K
| for _, tool := range constants.DefaultReadOnlyGitHubTools { | ||
| if _, exists := GitHubToolToToolsetMap[tool]; !exists { | ||
| if _, exists := toolToToolsetMap[tool]; !exists { | ||
| t.Errorf("Expected tool %q from constants.DefaultReadOnlyGitHubTools to be in GitHubToolToToolsetMap", tool) |
There was a problem hiding this comment.
Stale error messages will mislead debuggers: three t.Errorf strings still reference GitHubToolToToolsetMap (lines 279, 288, 322), a symbol deleted by this PR — developers hitting a test failure will search for it and find nothing.
💡 Suggested fix
Update the three error strings to reference the new accessor:
// line ~279
t.Errorf("Expected to find tools for toolset %q in getGitHubToolToToolsetMap()", expectedToolset)
// line ~288
t.Errorf("Expected tool %q from constants.DefaultReadOnlyGitHubTools to be in getGitHubToolToToolsetMap()", tool)
// line ~322
t.Errorf("Expected tool %q to be in getGitHubToolToToolsetMap()", tool)All three test functions updated their lookup call site but left the diagnostic string unchanged, so only half the rename landed.
There was a problem hiding this comment.
Fixed in e340254 by updating the three stale test diagnostics in pkg/workflow/github_tool_to_toolset_test.go to reference getGitHubToolToToolsetMap().
|
@copilot run pr-finisher skill |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Fixed in 01f01d8. I updated the lazy JSON loaders to avoid panic-based initialization, propagated gh CLI loader errors through the compiler call sites, and revalidated with |
pkg/workfloweagerly unmarshaled three embedded JSON datasets ininit(), so non-compile commands paid parse/map-build startup cost even when those datasets were never used. This change defers those costs until first access by converting each eager initializer to async.Oncelazy accessor.Domains dataset: remove import-time unmarshal
init()load ofecosystem_domains.jsonwithsync.OnceValue(getLoadedEcosystemDomains).gh CLI permissions dataset: lazy compile on first script analysis
init()parse/compile ofgh_cli_permissions.jsonwithsync.OnceValue(getCompiledGHCLIPermissions).GitHub tool→toolset mapping: lazy map initialization
init()load ofgithub_tool_to_toolset.jsonwithsync.OnceValue(getGitHubToolToToolsetMap).Caller updates where direct map access existed