
pkg/workflow: lazy-load embedded JSON datasets used only at compile-time paths #41587

Merged
pelikhan merged 7 commits into main from copilot/deep-report-lazy-load-json-parses on Jun 26, 2026

Copilot AI commented Jun 26, 2026

Contributor

pkg/workflow eagerly unmarshaled three embedded JSON datasets in init(), so non-compile commands paid the parse and map-build startup cost even when those datasets were never used. This change defers those costs until first access by converting each eager initializer to a sync.OnceValue lazy accessor.

  • Domains dataset: remove import-time unmarshal

    • Replaced eager init() load of ecosystem_domains.json with sync.OnceValue (getLoadedEcosystemDomains).
    • Updated domain/ecosystem lookup code paths to read through the accessor instead of package-global eager state.
  • gh CLI permissions dataset: lazy compile on first script analysis

    • Replaced eager init() parse/compile of gh_cli_permissions.json with sync.OnceValue (getCompiledGHCLIPermissions).
    • Updated permission inference and write-command detection to fetch compiled data lazily.
  • GitHub tool→toolset mapping: lazy map initialization

    • Replaced eager init() load of github_tool_to_toolset.json with sync.OnceValue (getGitHubToolToToolsetMap).
    • Updated validation and tests to consume the lazy accessor.
  • Caller updates where direct map access existed

    • Adjusted code that iterated/read ecosystem and toolset maps to resolve via the new lazy accessors, preserving existing behavior while removing unconditional startup parsing.
var getCompiledGHCLIPermissions = sync.OnceValue(func() compiledGHCLIPermissions {
    var data ghCLIPermissionsData
    if err := json.Unmarshal(ghCLIPermissionsJSON, &data); err != nil {
        panic(fmt.Sprintf("failed to load gh CLI permissions from JSON: %v", err))
    }
    // ...build compiled maps/regex...
    return cp
})
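
The accessor pattern described above can be sketched in isolation. This is a minimal illustration rather than the PR's actual code: `toolsetJSON`, its contents, and `getToolsetMap` are hypothetical stand-ins for an embedded dataset and its accessor, and `loadCount` exists only to make the single execution of the loader observable.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
	"sync/atomic"
)

// toolsetJSON stands in for an embedded dataset (hypothetical content).
var toolsetJSON = []byte(`{"get_issue":"issues","list_commits":"repos"}`)

// loadCount records how many times the loader body runs (illustration only).
var loadCount atomic.Int32

// getToolsetMap parses the JSON on the first call and returns the cached map
// on every later call. Nothing is parsed at package initialization.
var getToolsetMap = sync.OnceValue(func() map[string]string {
	loadCount.Add(1)
	var m map[string]string
	if err := json.Unmarshal(toolsetJSON, &m); err != nil {
		panic(fmt.Sprintf("failed to load toolset map from JSON: %v", err))
	}
	return m
})

func main() {
	fmt.Println("loads before first access:", loadCount.Load())
	fmt.Println("get_issue ->", getToolsetMap()["get_issue"])
	fmt.Println("list_commits ->", getToolsetMap()["list_commits"])
	fmt.Println("loads after two accesses:", loadCount.Load())
}
```

Commands that never call the accessor never pay for the unmarshal, which is the startup saving this PR is after.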

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Lazy-load JSON parses in pkg/workflow to reduce startup cost" to "pkg/workflow: lazy-load embedded JSON datasets used only at compile-time paths" Jun 26, 2026
Copilot AI requested a review from pelikhan June 26, 2026 04:10
@pelikhan pelikhan marked this pull request as ready for review June 26, 2026 04:38
Copilot AI review requested due to automatic review settings June 26, 2026 04:38
@github-actions

github-actions Bot commented Jun 26, 2026

Contributor

PR Code Quality Reviewer completed the code quality review.

@github-actions

github-actions Bot commented Jun 26, 2026

Contributor

Test Quality Sentinel completed test quality analysis.

Test Quality Sentinel already completed for PR #41587 in the prior run (workflow-run-id 28217486929). Comment and APPROVE review were successfully submitted. Both add_comment and submit_pull_request_review limits are exhausted (1/1 each used). Score: 100/100 — 3 behavioral-contract tests, no guideline violations.

@github-actions

github-actions Bot commented Jun 26, 2026

Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jun 26, 2026

Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #41587 does not have the 'implementation' label and has only 48 new lines of code in business logic directories (threshold: 100).

Copilot AI left a comment

Contributor

Pull request overview

This PR reduces pkg/workflow startup work by deferring parsing/compiling of embedded JSON datasets until first use, using sync.OnceValue-backed accessors instead of eager init() initialization. This helps ensure non-compile code paths don’t pay JSON unmarshal / map-build costs when those datasets are unused.

Changes:

  • Lazy-load ecosystem_domains.json via getLoadedEcosystemDomains() and update callers to use the accessor.
  • Lazy-load/compile gh_cli_permissions.json via getCompiledGHCLIPermissions() and update permission inference/write-detection paths.
  • Lazy-load github_tool_to_toolset.json via getGitHubToolToToolsetMap() and update validation/tests to use the accessor.
Show a summary per file
| File | Description |
| --- | --- |
| pkg/workflow/tools_validation_github_toolsets.go | Switch tool-to-toolset validation to use the lazy accessor (but currently loads before an early return). |
| pkg/workflow/network_firewall_validation.go | Use lazy ecosystem-domain loading for ecosystem identifier validation. |
| pkg/workflow/github_tool_to_toolset.go | Replace eager init() JSON unmarshal with a sync.OnceValue accessor. |
| pkg/workflow/github_tool_to_toolset_test.go | Update tests to reference the lazy accessor instead of the removed global map. |
| pkg/workflow/gh_cli_permissions.go | Replace eager permission dataset compilation with a sync.OnceValue accessor and update callers. |
| pkg/workflow/domains.go | Replace eager ecosystem domains dataset unmarshal/sort with a sync.OnceValue accessor and update lookups. |
| pkg/workflow/awf_helpers.go | Update workflow-call network script generation to read ecosystem data through the lazy accessor. |

Review details

  • Files reviewed: 7/7 changed files
  • Comments generated: 1
  • Review effort level: Low

Comment on lines 15 to 19
    githubToolToToolsetLog.Printf("Validating GitHub tools against toolsets: allowed_tools=%d, enabled_toolsets=%d", len(allowedTools), len(enabledToolsets))
    toolToToolsetMap := getGitHubToolToToolsetMap()

    if len(allowedTools) == 0 {
        githubToolToToolsetLog.Print("No tools to validate, skipping")

Contributor Author

Fixed in e340254 by keeping the len(allowedTools) == 0 early return ahead of getGitHubToolToToolsetMap(), so the lazy JSON load is still skipped on the no-tools path.
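
The ordering issue discussed in this thread can be illustrated with a small sketch. It assumes a simplified validator: `validateTools`, `getToolToToolset`, and the `loaded` flag are hypothetical names, not the signatures used in pkg/workflow. The point is only that the empty-input guard runs before the lazy accessor is touched.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// loaded reports whether the lazy loader has run (illustration only).
var loaded atomic.Bool

// getToolToToolset is a hypothetical lazy accessor standing in for the real one.
var getToolToToolset = sync.OnceValue(func() map[string]string {
	loaded.Store(true)
	return map[string]string{"get_issue": "issues"}
})

// validateTools returns the known tools whose toolset is not enabled.
// The guard comes first, so the no-tools path never triggers the load.
func validateTools(allowedTools []string, enabled map[string]bool) []string {
	if len(allowedTools) == 0 {
		return nil
	}
	toolToToolset := getToolToToolset()
	var missing []string
	for _, tool := range allowedTools {
		if toolset, ok := toolToToolset[tool]; ok && !enabled[toolset] {
			missing = append(missing, tool)
		}
	}
	return missing
}

func main() {
	validateTools(nil, nil)
	fmt.Println("loaded after empty input:", loaded.Load())
	fmt.Println("missing:", validateTools([]string{"get_issue"}, map[string]bool{}))
	fmt.Println("loaded after real input:", loaded.Load())
}
```

Calling the accessor above the guard would return the same results but would run the loader on every path, including the one that skips validation.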

@github-actions github-actions Bot mentioned this pull request Jun 26, 2026
@github-actions

Contributor

Test Quality Sentinel 🧪 — PR #41587

Score: 100/100 ✅ Excellent

This PR makes 3 mechanical API-adaptation updates to existing tests. All modified tests continue to enforce strong behavioral contracts, correctly updated to call the new getGitHubToolToToolsetMap() lazy-loader instead of the removed package-level GitHubToolToToolsetMap variable.

📊 Metrics & Classification
| Metric | Value |
| --- | --- |
| Modified test files | 1 (github_tool_to_toolset_test.go) |
| Modified test functions | 3 |
| New test functions | 0 |
| Design tests (behavioral) | 3 (100%) |
| Implementation-only tests | 0 (0%) |
| Tests with error/edge coverage | 3 (100%) |
| Duplication clusters | 0 |
| Inflation ratio (test:prod lines) | 5:9 ≈ 0.56 ✅ |
| Mock library violations | None ✅ |
| Missing build tags | None ✅ |

| Test Function | Classification | Error Coverage | Notes |
| --- | --- | --- | --- |
| TestGitHubToolToToolsetMap_Completeness | ✅ Design (behavioral) | ✅ Yes | Verifies all expected toolset categories are represented in the map |
| TestGitHubToolToToolsetMap_IncludesDefaultGitHubTools | ✅ Design (behavioral) | ✅ Yes | Verifies all DefaultReadOnlyGitHubTools constants have a toolset mapping |
| TestGitHubToolToToolsetMap_ConsistencyWithDocumentation | ✅ Design (behavioral) | ✅ Yes | Verifies specific tool→toolset mappings against documented expectations |
📐 Score Breakdown
| Component | Score | Max |
| --- | --- | --- |
| Behavioral Coverage (3/3 design tests) | 40 | 40 |
| Error/Edge Case Coverage (3/3 with error paths) | 30 | 30 |
| Low Duplication (0 duplicate clusters) | 20 | 20 |
| Proportional Growth (0.56 ratio, no inflation) | 10 | 10 |
| Total | 100 | 100 |

Verdict: ✅ All 3 modified tests are behavioral-contract tests. The changes are minimal, correct API adaptations — GitHubToolToToolsetMap (eager init) → getGitHubToolToToolsetMap() (lazy sync.OnceValue). No new test coverage was needed since no new observable behavior was introduced. No guideline violations detected.

🧪 Test quality analysis by Test Quality Sentinel · 62.6 AIC · ⌖ 11.5 AIC · ⊞ 8.4K ·

@github-actions github-actions Bot left a comment

Contributor

✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 3 modified tests enforce behavioral contracts; no guideline violations detected.

@github-actions github-actions Bot left a comment

Contributor

Skills-Based Review 🧠

Applied /zoom-out and /improve-codebase-architecture — no blocking issues; a few nits worth addressing.

📋 Key Themes & Highlights

Key Themes

  • Stale comment (domains.go:369): the // sorted once at init() time comment was not updated when init() was replaced by sync.OnceValue. Easy one-liner fix.
  • One inconsistent call site (domains.go:586): for ... range getLoadedEcosystemDomains() — all other nine call sites assign to a local variable first. Functionally identical but slightly inconsistent.
  • Accessor naming (github_tool_to_toolset.go:17): getGitHubToolToToolsetMap lacks the past-participle qualifier used by its two siblings (getLoadedEcosystemDomains, getCompiledGHCLIPermissions).

Positive Highlights

  • ✅ Clean use of sync.OnceValue — idiomatic Go 1.21+ lazy init that's simpler than the struct-embedded sync.Once pattern used elsewhere in the package.
  • ✅ Pre-sort behaviour correctly preserved inside the OnceValue closure.
  • ✅ The removal of the exported GitHubToolToToolsetMap variable is a good API surface reduction — no external callers exist.
  • ✅ All three init()-level panics are correctly preserved for embedded JSON failures.
  • ✅ Test updates are minimal and correct.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 62.2 AIC · ⌖ 10.4 AIC · ⊞ 6.5K

Comments that could not be inline-anchored

pkg/workflow/domains.go:369

[/zoom-out] Stale comment: still says init() time after switching to sync.OnceValue.

Domains are now sorted lazily on first access, not at program startup. The comment should be updated to avoid confusing future readers.

💡 Suggested update

// Return a copy to avoid external modification. The underlying list is already
// sorted once at first load so no per-call sort.Strings is needed.

pkg/workflow/domains.go:586

[/improve-codebase-architecture] Minor inconsistency: this site calls getLoadedEcosystemDomains() inline in the range expression, while every other call site in the PR assigns it to a local variable first.

The inline form is functionally correct (the sync.OnceValue result is cached), but assigning to a local variable makes the read-once intent explicit and matches the pattern used across the other six call sites.

💡 Suggested change

// Befor…

pkg/workflow/github_tool_to_toolset.go:17

[/improve-codebase-architecture] Naming inconsistency across the three new lazy accessors: getLoadedEcosystemDomains and getCompiledGHCLIPermissions use past-participle qualifiers that signal lazy initialisation ("Loaded", "Compiled"), but getGitHubToolToToolsetMap has no such qualifier.

Naming is minor, but aligning to the established convention (e.g. getLoadedGitHubToolToToolsetMap) makes all three accessors visually consistent and signals to readers that this variable is a laz…

@github-actions github-actions Bot left a comment

Contributor

Review: lazy-load embedded JSON datasets

The sync.OnceValue refactor is structurally sound and the memoised accessors are correctly thread-safe. Two non-blocking observations.

🔍 Findings

Existing review thread (already noted): getGitHubToolToToolsetMap() is called before the len(allowedTools) == 0 early-return guard in validateGitHubToolsAgainstToolsetsCore, which triggers the lazy JSON parse even on the skip path — directly undermining the PR's goal for that code path. The fix is to move the call to after the guard.

New (low): stale test error messages — See inline comment on github_tool_to_toolset_test.go. Three t.Errorf strings still reference GitHubToolToToolsetMap by name after the rename. Non-blocking, but will waste debugging time when those tests fail.

No correctness, security, or data-loss issues found beyond the above.

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • proxy.golang.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "proxy.golang.org"

See Network Configuration for more information.

🔎 Code quality review by PR Code Quality Reviewer · 140 AIC · ⌖ 6.65 AIC · ⊞ 5.2K

  for _, tool := range constants.DefaultReadOnlyGitHubTools {
-     if _, exists := GitHubToolToToolsetMap[tool]; !exists {
+     if _, exists := toolToToolsetMap[tool]; !exists {
          t.Errorf("Expected tool %q from constants.DefaultReadOnlyGitHubTools to be in GitHubToolToToolsetMap", tool)

Contributor

Stale error messages will mislead debuggers: three t.Errorf strings still reference GitHubToolToToolsetMap (lines 279, 288, 322), a symbol deleted by this PR — developers hitting a test failure will search for it and find nothing.

💡 Suggested fix

Update the three error strings to reference the new accessor:

// line ~279
t.Errorf("Expected to find tools for toolset %q in getGitHubToolToToolsetMap()", expectedToolset)

// line ~288
t.Errorf("Expected tool %q from constants.DefaultReadOnlyGitHubTools to be in getGitHubToolToToolsetMap()", tool)

// line ~322
t.Errorf("Expected tool %q to be in getGitHubToolToToolsetMap()", tool)

All three test functions updated their lookup call site but left the diagnostic string unchanged, so only half the rename landed.

Contributor Author

Fixed in e340254 by updating the three stale test diagnostics in pkg/workflow/github_tool_to_toolset_test.go to reference getGitHubToolToToolsetMap().

@pelikhan

Collaborator

@copilot run pr-finisher skill

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jun 26, 2026

Contributor Author

@copilot run pr-finisher skill

Done in e340254. I ran the PR-finisher flow, addressed the unresolved review feedback, and revalidated locally with make fmt, make lint, make test-unit, make test, and make recompile.

@pelikhan

Collaborator

Copilot AI and others added 2 commits June 26, 2026 05:24
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jun 26, 2026

Contributor Author

@copilot fix lint go

https://github.com/github/gh-aw/actions/runs/28218323419/job/83594453346

Fixed in 01f01d8. I updated the lazy JSON loaders to avoid panic-based initialization, propagated gh CLI loader errors through the compiler call sites, and revalidated with make golint-custom (using the CI flags), make agent-report-progress, and go test ./pkg/workflow.
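
The comment above does not show how the panic-based initialization was removed. One way a lazy loader can report errors instead of panicking is sync.OnceValues, which caches a value together with an error. The sketch below is an assumption about the general shape, not the code from 01f01d8; `permissionsJSON`, its contents, `getPermissions`, and `permissionFor` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// permissionsJSON stands in for an embedded dataset (hypothetical content).
var permissionsJSON = []byte(`{"issue create":"issues: write","pr list":"pull-requests: read"}`)

// getPermissions parses once and caches both the map and any parse error.
var getPermissions = sync.OnceValues(func() (map[string]string, error) {
	var m map[string]string
	if err := json.Unmarshal(permissionsJSON, &m); err != nil {
		return nil, fmt.Errorf("failed to load permissions from JSON: %w", err)
	}
	return m, nil
})

// permissionFor propagates a loader error to the caller instead of panicking.
func permissionFor(command string) (string, error) {
	perms, err := getPermissions()
	if err != nil {
		return "", err
	}
	perm, ok := perms[command]
	if !ok {
		return "", fmt.Errorf("no permission entry for %q", command)
	}
	return perm, nil
}

func main() {
	perm, err := permissionFor("issue create")
	fmt.Println(perm, err)
}
```

With this shape every call site has to handle the returned error, which matches the description of errors being propagated through the compiler call sites.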

@pelikhan pelikhan merged commit 6132c32 into main Jun 26, 2026
29 checks passed
@pelikhan pelikhan deleted the copilot/deep-report-lazy-load-json-parses branch June 26, 2026 06:12
3 participants