Skip to content

feat(issues): add Kyverno as a source for /api/issues + MCP issues tool (T11)#723

Merged
nadaverell merged 7 commits into
mainfrom
feat/t11-kyverno-issues-source
May 18, 2026
Merged

feat(issues): add Kyverno as a source for /api/issues + MCP issues tool (T11)#723
nadaverell merged 7 commits into
mainfrom
feat/t11-kyverno-issues-source

Conversation

@nadaverell
Copy link
Copy Markdown
Contributor

@nadaverell nadaverell commented May 17, 2026

Wave 2 / Phase 4c. Adds Kyverno PolicyReport findings as a first-class source for /api/issues and the MCP issues tool, with explicit lifecycle signaling so an empty result is never ambiguous.

What lands

pkg/policyreports/index.goIndex over PolicyReport / ClusterPolicyReport CRs. Indexed by group/Kind/ns/name (group first because group AND namespace can each be empty independently — encoding group last leaves a cluster-scoped CRD key ambiguous with a namespaced core-group key under any 3-part parse). Group derived from scope.apiVersion AND results[].resources[].apiVersion so per-result resources targeting multiple subjects each get the right group.

Subject struct carries {Group, Kind, Namespace, Name} — Group propagates all the way through to Issue.Group via fromKyverno in internal/issues/issues.go.

internal/issues/ — new SourceKyverno constant, composer integration, Compose honors ?source=kyverno filter.

internal/server/issues_handler.go — REST /api/issues?source=kyverno returns Kyverno-only findings.

internal/mcp/tools.go::handleIssues — MCP issues tool accepts source=kyverno (and include_kyverno=true for additive opt-in).

Lifecycle signaling

Empty Kyverno results were silently empty whether Kyverno was unavailable, warmup-deferred, or genuinely had no findings — agents had no way to tell. Added k8s.KyvernoStatus enum (not_installed / deferred / warmup / ready) backed by an atomic.Value. Both REST and MCP emit meta.kyverno: "<status>" on the response only when the caller opted into Kyverno (source=kyverno or include_kyverno=true). Default responses skip the meta to keep payload lean.

Race protection: Reset vs Warmup

The Reset / Warmup interaction needed two rounds of hardening:

  1. kyvernoWarmupGen (atomic.Int64) bumped inside ResetPolicyReportIndex; all setDecision / publishReady writes inside the Warmup lambda check gen under policyReportMu before storing.
  2. Snapshot of (policyReportInit, kyvernoWarmupGen) happens UNDER policyReportMu BEFORE Do() — closes the window where the lambda could capture the post-Reset gen and still write OLD-cluster data with the NEW gen.

A -race torture test (4 resetters × 4 snapshotters × 500 iterations) plus a structural-invariant test pin the design.

Source filter contract

source=X filters to ONLY the listed sources. include_X=true adds X to the default problem + condition set without silencing. Docs in issues_handler.go, the MCP issues tool description, and the wantSource implementation comment all now spell this out — earlier wording (incl. the MCP jsonschema string "Sources are AND'd; opting one in does not silence the defaults") was misleading and asserted the opposite behavior.

Wire-surface alignment with T6

Mirrors PR #721's speculative-surface trim of pkg/resourcecontext/types.go — drops ContextRef.Reason/Source/Confidence, the RefReason/RefSource enums, ResourceContext.Truncated, and two unused OmittedReason values. T11 doesn't reference the deleted fields; this keeps the shared types in sync across the wave-2 PRs so they don't conflict on merge regardless of order. Retained OmittedCacheCold + OmittedNotInstalled per the explicit TODO(T10) in internal/k8s/policy_reports.go that plans to populate these when the diagnostic-tier Kyverno consumer arrives.

Verification

go vet ./... + go test -count=1 -race ./internal/k8s/... + full go test -count=1 ./... all green. Tests pin: group propagation from scope.apiVersion AND per-result resources[].apiVersion, Issue.Group via fromKyverno, status enum lifecycle, gen monotonicity, stale-gen write skip, Once/gen atomic snapshot under race.


Note

Medium Risk
Adds a new Kyverno-backed issues source and modifies PolicyReport warmup/reset concurrency with new lifecycle state tracking; correctness depends on atomic/generation logic and could affect issue completeness or stale state across cluster switches.

Overview
Adds Kyverno PolicyReport/ClusterPolicyReport findings as an opt-in kyverno source for the unified issues surfaces (/api/issues and the MCP issues tool), mapping fail/errorcritical, warnwarning, and omitting pass/skip, while propagating the subject’s API group to avoid kind collisions.

Introduces Kyverno index lifecycle reporting (not_installed/deferred/warmup/ready) and returns it as meta.kyverno only when Kyverno is requested (include_kyverno=true or source=kyverno), plus clarifies/enforces that source= is a filter (not additive) and adds include_kyverno for “defaults + kyverno” behavior.

Hardens PolicyReport warmup vs context-switch reset with generation-guarded atomic writes and adds extensive unit tests (issues composition, group keying/index enumeration, lifecycle transitions, and race-protection), along with documentation updates noting Kyverno support and issues mapping.

Reviewed by Cursor Bugbot for commit e92001b. Bugbot is set up for automated code reviews on this repo. Configure here.

@nadaverell nadaverell requested a review from hisco as a code owner May 17, 2026 07:51
@nadaverell nadaverell force-pushed the feat/t11-kyverno-issues-source branch from 4f1f131 to bbea180 Compare May 17, 2026 09:03
nadaverell added a commit that referenced this pull request May 17, 2026
…e in /api/issues meta (T11 follow-up)

Two reviewer findings on PR #723:

1. PolicyReport Subject dropped apiVersion/group, so Kyverno-sourced
   Issues lost Issue.Group. This broke RBAC consistency (the cluster-
   scoped SAR check uses {kind, group}) and let two CRDs with the same
   Kind across different groups collide on a single index entry.

   - Add Group to policyreports.Subject.
   - Switch the index key from audit.ResourceKey ("Kind/ns/name") to
     a group-prefixed "group/Kind/ns/name" form so distinct CRDs
     sharing a Kind don't collide. Group first because both group and
     namespace can be empty independently; encoding group last would
     leave cluster-scoped CRD keys ambiguous with namespaced core-group
     keys under any 3-part parse.
   - Derive Group from scope.apiVersion (scope-only path) and from
     results[].resources[].apiVersion (per-resource path).
   - FindingsFor now takes (group, kind, ns, name); only test callers
     touched (no production consumers besides All()).
   - Propagate Subject.Group → Issue.Group in fromKyverno.

2. /api/issues returned nil when the PolicyReport index was nil,
   collapsing four distinct states (not_installed / deferred /
   warmup / ready-but-empty) into one indistinguishable empty list.

   Option A: add k8s.GetKyvernoStatus() backed by an atomic that
   WarmupKyvernoPolicyReports records its decision into. The /api/issues
   handler and MCP issues tool emit meta.kyverno when the caller opted
   into Kyverno (include_kyverno=true or source=kyverno), so the SPA
   and agents can render the right copy ("Kyverno not installed" vs
   "Indexing in progress" vs "No violations" vs "Cluster too large —
   findings deferred"). ResetPolicyReportIndex clears the decision so
   context switches start a fresh detection pass.

Tests pin:
- Subject.Group population from scope.apiVersion + per-result apiVersion
  (TestBuildIndex_GroupFromScopeAPIVersion, _GroupFromResourceAPIVersion)
- Group-keyed CRD collisions don't drop findings
  (TestBuildIndex_GroupCollisionAcrossCRDs)
- Updated TestParseSubjectKey for 4-part keys (group/Kind/ns/name)
- fromKyverno wires Subject.Group → Issue.Group
  (TestCompose_KyvernoGroupPropagated)
- GetKyvernoStatus state machine across all four lifecycle states
  (TestGetKyvernoStatus_LifecycleTransitions)
- ResetPolicyReportIndex clears the decision atomic
  (TestResetPolicyReportIndex_ClearsKyvernoDecision)
- /api/issues emits meta.kyverno on opt-in, omits it on default request
  (TestIssuesHandler_KyvernoMetaEmittedOnOptIn,
   TestIssuesHandler_KyvernoMetaOmittedWhenNotRequested)
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 37c40a2. Configure here.

Comment thread internal/mcp/tools.go Outdated
nadaverell added a commit that referenced this pull request May 18, 2026
…ption

Bugbot on PR #723: the MCP `issues` tool description told agents to "pass
include_audit=true / include_events=true / include_kyverno=true", but the
issuesInput struct has NO such fields — only namespace, severity, source,
kind, since, limit, filter. Agents following the description sent JSON
keys that Go's decoder silently dropped, and the include_X opt-in path
never actually worked over MCP.

The previous round's docs clarification overreached: it described REST
/api/issues query params (where include_X DOES exist as boolean shortcuts)
as if they applied to the MCP input shape. They don't.

Fix is to the description, not the input. Document the MCP-only path:
to add an excluded source to defaults, list everything explicitly
(source=problem,condition,kyverno). Note that REST has the shortcut
flags so cross-surface readers aren't confused. No behavior change.
Surfaces Kyverno PolicyReport findings (read via pkg/policyreports.Index)
through the unified issues envelope. Default-off — matches the audit /
events opt-in pattern — to keep the default operator view focused on
actionable problems rather than best-practice / policy noise. Lift via
source=kyverno (or include_kyverno=true on the HTTP handler).

Severity mapping is by Kyverno result: fail/error -> critical, warn ->
warning, pass/skip omitted. The reporting-API `severity` field is
intentionally ignored — it is policy-author-set, inconsistent across
policies, and not aligned with the operator-actionable axis the Issue
list represents.

Index gains an All() iterator returning each indexed subject + findings
as defensive copies in stable alphabetical key order, so consumers can
enumerate without per-subject lookups and downstream output stays
deterministic.

Graceful when Kyverno is not installed (GetPolicyReportIndex() returns
nil — emits zero issues, no error).
…e in /api/issues meta (T11 follow-up)

Two reviewer findings on PR #723:

1. PolicyReport Subject dropped apiVersion/group, so Kyverno-sourced
   Issues lost Issue.Group. This broke RBAC consistency (the cluster-
   scoped SAR check uses {kind, group}) and let two CRDs with the same
   Kind across different groups collide on a single index entry.

   - Add Group to policyreports.Subject.
   - Switch the index key from audit.ResourceKey ("Kind/ns/name") to
     a group-prefixed "group/Kind/ns/name" form so distinct CRDs
     sharing a Kind don't collide. Group first because both group and
     namespace can be empty independently; encoding group last would
     leave cluster-scoped CRD keys ambiguous with namespaced core-group
     keys under any 3-part parse.
   - Derive Group from scope.apiVersion (scope-only path) and from
     results[].resources[].apiVersion (per-resource path).
   - FindingsFor now takes (group, kind, ns, name); only test callers
     touched (no production consumers besides All()).
   - Propagate Subject.Group → Issue.Group in fromKyverno.

2. /api/issues returned nil when the PolicyReport index was nil,
   collapsing four distinct states (not_installed / deferred /
   warmup / ready-but-empty) into one indistinguishable empty list.

   Option A: add k8s.GetKyvernoStatus() backed by an atomic that
   WarmupKyvernoPolicyReports records its decision into. The /api/issues
   handler and MCP issues tool emit meta.kyverno when the caller opted
   into Kyverno (include_kyverno=true or source=kyverno), so the SPA
   and agents can render the right copy ("Kyverno not installed" vs
   "Indexing in progress" vs "No violations" vs "Cluster too large —
   findings deferred"). ResetPolicyReportIndex clears the decision so
   context switches start a fresh detection pass.

Tests pin:
- Subject.Group population from scope.apiVersion + per-result apiVersion
  (TestBuildIndex_GroupFromScopeAPIVersion, _GroupFromResourceAPIVersion)
- Group-keyed CRD collisions don't drop findings
  (TestBuildIndex_GroupCollisionAcrossCRDs)
- Updated TestParseSubjectKey for 4-part keys (group/Kind/ns/name)
- fromKyverno wires Subject.Group → Issue.Group
  (TestCompose_KyvernoGroupPropagated)
- GetKyvernoStatus state machine across all four lifecycle states
  (TestGetKyvernoStatus_LifecycleTransitions)
- ResetPolicyReportIndex clears the decision atomic
  (TestResetPolicyReportIndex_ClearsKyvernoDecision)
- /api/issues emits meta.kyverno on opt-in, omits it on default request
  (TestIssuesHandler_KyvernoMetaEmittedOnOptIn,
   TestIssuesHandler_KyvernoMetaOmittedWhenNotRequested)
Reviewer's critical: an in-flight WarmupKyvernoPolicyReports goroutine
could stamp KyvernoStatusReady (or an early not_installed/deferred
decision) onto the atomic AFTER ResetPolicyReportIndex had wiped it for
a new cluster context. The decision writes lived outside the mutex, so:

  1. Old-cluster Warmup releases mutex with index stored (L233).
  2. Reset (new cluster) takes mutex, nils index, clears decision (L366).
  3. Old Warmup's pending L253 store fires, writing "ready" to the
     atomic — but the index is empty and the new cluster is still
     warming up.
  4. GetKyvernoStatus: index nil, decision "ready" → returns "ready"
     for the new cluster while it's actually empty.

Fix is a generation counter (`kyvernoWarmupGen atomic.Int64`). Each
Warmup lambda captures the gen at entry; all decision/index writes go
through setDecision / publishReady helpers that re-check gen under the
mutex before storing. Reset bumps the gen inside its critical section
(line ~395), so any stale Warmup completion finds a mismatch and skips
its writes entirely. Closes both interleaving windows (Reset-during-
Warmup-publish AND Warmup-completes-after-Reset).

Also:
- gofmt: re-align Filters{} and anonymous-struct literals in
  internal/server/issues_handler.go + issues_handler_test.go (longest
  key changed in this branch).
- StoreKyvernoIndexForTest: panic on wrong type instead of silently
  no-op'ing. Test-only hook; a wrong type is a test bug, not a runtime
  condition to swallow.

Tests: TestResetPolicyReportIndex_BumpsWarmupGen pins that Reset
advances gen monotonically (the invariant the race protection rests
on). Existing TestResetPolicyReportIndex_ClearsKyvernoDecision still
passes — the visible "decision empty after Reset" contract is unchanged.
…) to close race

Snapshot both policyReportInit and kyvernoWarmupGen under
policyReportMu BEFORE entering Do(), rather than capturing myGen
inside the lambda. Without this, Reset could interleave between the
caller's read of policyReportInit and the lambda's Load of
kyvernoWarmupGen — the lambda would capture the post-Reset gen,
making subsequent gen checks falsely succeed and stamp the previous
cluster's outcome onto the new cluster's atomics. 213fc8a only
protected the publish-step gen check; this closes the earlier
interleaving.

Add structural-invariant coverage:
- TestWarmupKyvernoPolicyReports_StaleGenSkipsWrites replicates the
  exact write protocol used by setDecision/publishReady and verifies
  a manually-bumped gen causes writes to be skipped.
- TestWarmupKyvernoPolicyReports_SnapshotsOnceAndGenAtomically spins
  concurrent snapshotters and resetters under -race and asserts the
  (once, gen) pair stays coherent — a regression that drops either
  field from the snapshot-mutex would surface here.
The handler/tool docs previously implied `source=kyverno` was an
additive opt-in ("lifts include_kyverno"), but the implementation
treats Sources as a filter — source=kyverno returns ONLY kyverno
rows, not defaults+kyverno. Same for source=audit/event.

This commit aligns the docs with the implementation across all
three doc sites:

- internal/server/issues_handler.go: rewrite the source= /
  include_X param block to call out filter-vs-additive explicitly,
  and add kyverno to the "loud sources, opt-in" preamble.
- internal/mcp/tools.go: rewrite the `issues` tool description and
  the `source` jsonschema, and fix the local comment near
  IncludeAudit/Events/Kyverno hydration to explain that the MCP
  source list doubles as both filter and collection trigger.
- internal/issues/issues.go: add a contract doc comment on
  wantSource so the implementation site documents the contract.

Also renames TestCompose_KyvernoSourceListOptsIn (misleading — the
contract is "list narrows but does not opt in") and tightens its
docstring. No behavior changes; existing tests pass.
…ption

Bugbot on PR #723: the MCP `issues` tool description told agents to "pass
include_audit=true / include_events=true / include_kyverno=true", but the
issuesInput struct has NO such fields — only namespace, severity, source,
kind, since, limit, filter. Agents following the description sent JSON
keys that Go's decoder silently dropped, and the include_X opt-in path
never actually worked over MCP.

The previous round's docs clarification overreached: it described REST
/api/issues query params (where include_X DOES exist as boolean shortcuts)
as if they applied to the MCP input shape. They don't.

Fix is to the description, not the input. Document the MCP-only path:
to add an excluded source to defaults, list everything explicitly
(source=problem,condition,kyverno). Note that REST has the shortcut
flags so cross-surface readers aren't confused. No behavior change.
@nadaverell nadaverell force-pushed the feat/t11-kyverno-issues-source branch from 61ee926 to e92001b Compare May 18, 2026 15:23
@nadaverell nadaverell merged commit 79a273a into main May 18, 2026
8 checks passed
@nadaverell nadaverell deleted the feat/t11-kyverno-issues-source branch May 18, 2026 19:34
nadaverell added a commit that referenced this pull request May 18, 2026
Two T6 fixes required after rebasing onto post-#723 (Kyverno) and
post-#726 (RBAC reverse-lookup + Crossplane) main:

- T11 (#723) threaded API group through the PolicyReport lookup,
  changing policyreports.Index.FindingsFor from (kind, ns, name) to
  (group, kind, ns, name). Updated resourcecontext.PolicyReportLookup
  interface, the build.go call site (passing ident.Group), the
  policyReportLookupAdapter in ai_handlers.go, and the test mock.
  Group threading is a strict improvement — two CRDs sharing
  kind+ns+name across API groups now get disjoint findings instead
  of one inheriting the other's.

- internal/server/server_smoke_test.go acquired a `rbacv1` import on
  main (#726) at the same line our `networkingv1` import lands on
  this branch. Conflict resolution: keep both, sorted.
nadaverell added a commit that referenced this pull request May 18, 2026
…vider

T11 (#723) added KyvernoFindings + KyvernoStatus to the issues.Provider
interface so the composer can route PolicyReport findings into the
unified issue stream and surface index-lifecycle status. Our test
fake didn't implement either, so summarycontext tests stopped
compiling after the rebase onto post-T11 main. Returning nil/"" is
correct for these tests: BuildIssueIndex doesn't read Kyverno
findings (kindFilter was dropped in the prior commit; the index
buckets problem+condition only), and KyvernoStatus is consumed by
the issues meta block — not by the index path under test.
nadaverell added a commit that referenced this pull request May 21, 2026
…vider

T11 (#723) added KyvernoFindings + KyvernoStatus to the issues.Provider
interface so the composer can route PolicyReport findings into the
unified issue stream and surface index-lifecycle status. Our test
fake didn't implement either, so summarycontext tests stopped
compiling after the rebase onto post-T11 main. Returning nil/"" is
correct for these tests: BuildIssueIndex doesn't read Kyverno
findings (kindFilter was dropped in the prior commit; the index
buckets problem+condition only), and KyvernoStatus is consumed by
the issues meta block — not by the index path under test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant