Add HPA diagnosis insights #916
Merged
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 4801a0b.

Summary
HPAs can fail quietly: the target workload may still have healthy-looking pods while autoscaling is capped, unable to read metrics, pinned by configuration, or paused at zero replicas. This PR makes HPA diagnosis a first-class Radar insight so operators can understand autoscaling state directly from Radar instead of reconstructing it from raw HPA YAML and conditions.
The feature is deliberately conservative about where it creates noise. Broad scan surfaces only promote high-signal autoscaling failures, while detail drawers keep the richer context for states like partial metric gaps, min-bound scaling, stale status, pinned replica bounds, and stabilization windows.
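The noise policy above can be sketched as a small promotion filter. This is a minimal illustration, not the code in `pkg/hpadiag`: the `State` type, the constant names, and `PromoteToScan` are hypothetical, and treating every unlisted state as detail-only is an assumption.

```go
package main

import "fmt"

// State is a hypothetical label for a diagnosed HPA state.
// The string values mirror names used in this PR where available;
// the Go identifiers themselves are assumptions for illustration.
type State string

const (
	StateMaxed              State = "maxed"
	StateMetricsUnavailable State = "metrics_unavailable"
	StateUnableToScale      State = "unable_to_scale"
	StateMetricsIncomplete  State = "metrics_incomplete"
	StateLimitedMin         State = "limited_min"
	StateStale              State = "stale"
	StateStabilized         State = "stabilized"
	StatePinned             State = "pinned"
)

// PromoteToScan reports whether a state is high-signal enough to appear
// on broad scan surfaces. Everything else stays in the detail drawer.
// Assumption: any state not listed here is detail-only.
func PromoteToScan(s State) bool {
	switch s {
	case StateMaxed, StateMetricsUnavailable, StateUnableToScale:
		return true
	default:
		return false
	}
}

func main() {
	for _, s := range []State{StateMaxed, StatePinned, StateStale} {
		fmt.Printf("%s promoted=%v\n", s, PromoteToScan(s))
	}
}
```

Keeping the allow-list short means a newly added state is quiet by default and has to be promoted deliberately.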
What Changed
Shared HPA Diagnosis Engine
- `pkg/hpadiag`, a shared analyzer for autoscaling/v2 HPAs.
Signal Policy
- `Maxed` now requires controller evidence: `ScalingLimited=True` with `TooManyReplicas`.
- `current == desired == maxReplicas` is treated as normal unless Kubernetes says it wanted more replicas and was capped.
- `ScalingActive=False` is classified as metrics unavailable unless it is the intentional zero-replica `ScalingDisabled` case.
- `AbleToScale=False` is classified separately as unable to scale.
Backend Surfaces
- `hpaDiagnosis` for HPA resources.
Frontend / UX
- `hpa/name`, with wrapping only at the separator when space is tight.
Shared UI / Types
- `@skyhook-io/k8s-ui`.
- `resource-utils-hpa` for table-state classification, label/tone mapping, and status badge generation.
Reviewer Focus
- `pkg/hpadiag`: whether each condition/state maps to the right Radar severity and surface.
- `metrics_incomplete`, `limited_min`, `stale`, or `stabilized` into table warnings.
- `hpaDiagnosis` on resource detail responses is the right contract for Radar app consumers.
Testing
Automated:
- `go test ./hpadiag ./resourcecontext ./ai/context` from `pkg/`
- `go test ./...` from `pkg/`
- `go test ./internal/k8s ./internal/server`
- `make test`
- `make tsc`
- `npm --workspace @skyhook-io/k8s-ui run tsc`
- `npm --workspace @skyhook-io/k8s-ui test -- --run src/components/resources/renderers/HPARenderer.test.tsx src/components/resources/renderers/WorkloadRenderer.test.tsx src/components/resources/resource-utils-hpa.test.ts`
- `npm --workspace @skyhook-io/k8s-ui test -- --run src/components/resources/renderers/WorkloadRenderer.test.tsx`
- `make build`
Live visual test:
- `kind-radar-gitops-demo`
- `radar-hpa-visual-test`
- `hpa-vt-disabled`, `hpa-vt-maxed`, `hpa-vt-metrics-incomplete`, `hpa-vt-metrics-unavailable`, `hpa-vt-min-limited`, `hpa-vt-pinned`, `hpa-vt-scaling-up`, `hpa-vt-stabilized`, `hpa-vt-stable`, `hpa-vt-stale`, `hpa-vt-unable-to-scale`
Screens covered:
- `Maxed`, `Metrics unavailable`, `Unable to scale`, `Disabled`, and `Pinned` surfaced; min-bound, stale, stabilized, stable, and scaling-up fixtures stayed quiet unless scan-worthy.
- `ScalingLimited / TooManyReplicas` evidence, CPU metric row, and amber condition rendering.
- unknown `status_only` metric row, and only `ScalingActive` counted as failing.
- `Deployment/hpa-vt-maxed`: verified compact HPA autoscaler context and disabled manual Scale action.
- `Deployment/hpa-vt-metrics-incomplete`: verified compact missing-metrics copy, inline `hpa/name` controller badge, and word-boundary wrapping.
- `Deployment/hpa-vt-pinned`: verified compact `Fixed at 5 replicas` copy, inline `hpa/name` controller badge, and disabled manual Scale action.
- `Tooltip` rendering with `role="tooltip"`, explanatory `aria-label`, and no native `title` on the Scale button.
Not covered by live visual test:
Notes / Tradeoffs
Note
Medium Risk
Changes when HPAs are flagged as maxed in problem detection (fewer false positives) and adds new optional API fields consumed by UI; scope is autoscaling/observability rather than auth or data paths, with broad fixture and test coverage.
Overview
Introduces `pkg/hpadiag` as the single place that interprets autoscaling/v2 HPAs (state, summary, bounds, metrics, condition-backed reasons), and wires it through backend detection, resource/AI context, and the k8s-ui drawers.
Detection policy tightens: “maxed” problems now require controller evidence (`ScalingLimited=True` / `TooManyReplicas`), not merely `current == desired == maxReplicas`. Metrics and scale failures still surface as separate cannot-scale issues; min-bound, stale, stabilized, and pinned cases stay out of broad scan noise.
API & context: HPA GET responses optionally include `hpaDiagnosis` on `ResourceWithRelationships`; resource context gains `hpaSummary`; AI summary/minify uses the same analyzer for HPA issue text.
UI: HPA detail gets a Diagnosis section (replacing ad-hoc condition heuristics); HPA list status uses conservative table classification; workload views show compact inline autoscaler diagnosis when scale is HPA-blocked and fetch sibling HPAs for that context. Shared types, condition warning tones, and layout tweaks support the new surfaces.
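The tightened detection policy described above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the analyzer in `pkg/hpadiag`: `Condition` is a simplified stand-in for the autoscaling/v2 condition type, `Classify` and its returned labels are hypothetical, and the order in which conditions are checked is an assumption.

```go
package main

import "fmt"

// Condition is a simplified stand-in for an autoscaling/v2 HPA status
// condition. The real type lives in k8s.io/api; this one is hypothetical.
type Condition struct {
	Type   string // e.g. "ScalingLimited"
	Status string // "True", "False", or "Unknown"
	Reason string // e.g. "TooManyReplicas"
}

// find returns the first condition with the given type, if present.
func find(conds []Condition, typ string) (Condition, bool) {
	for _, c := range conds {
		if c.Type == typ {
			return c, true
		}
	}
	return Condition{}, false
}

// Classify maps controller conditions to a coarse state label.
// Replica counts are deliberately not an input: current == desired == max
// alone never yields "maxed". The precedence order is an assumption.
func Classify(conds []Condition) string {
	if c, ok := find(conds, "AbleToScale"); ok && c.Status == "False" {
		return "unable_to_scale"
	}
	if c, ok := find(conds, "ScalingActive"); ok && c.Status == "False" {
		if c.Reason == "ScalingDisabled" {
			return "disabled" // intentional zero-replica pause
		}
		return "metrics_unavailable"
	}
	if c, ok := find(conds, "ScalingLimited"); ok &&
		c.Status == "True" && c.Reason == "TooManyReplicas" {
		return "maxed"
	}
	return "ok"
}

func main() {
	capped := []Condition{{Type: "ScalingLimited", Status: "True", Reason: "TooManyReplicas"}}
	atMaxNoEvidence := []Condition{{Type: "ScalingLimited", Status: "False", Reason: "DesiredWithinRange"}}
	fmt.Println(Classify(capped), Classify(atMaxNoEvidence))
}
```

Because the function only reads conditions, an HPA that happens to sit at `maxReplicas` without a `TooManyReplicas` reason falls through to the quiet default.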
Reviewed by Cursor Bugbot for commit 2b353cf. Bugbot is set up for automated code reviews on this repo.