-
Notifications
You must be signed in to change notification settings - Fork 43
feat(crs): add tracking rules for new custom resources #359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Evgeniy Frolov <[email protected]>
Signed-off-by: Evgeniy Frolov <[email protected]>
@@ -48,6 +48,7 @@ rules: | |||
failed: | |||
- "False" | |||
# ArgoCD Application | |||
# https://github.com/argoproj/gitops-engine/blob/master/pkg/health/health.go#L12 | |||
- resourceGroup: "argoproj.io" | |||
resourceKind: "Application" | |||
jsonPath: "$.status.health.status" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You sure that's the best field? I also found this one, and these statuses do look much more clear and on point. But whatever you choose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion!
I looked into status.operationState.phase
, and while it indeed provides detailed insight into the current operation (e.g., sync or rollback), it’s not always present — it's only set during or immediately after an operation. Outside of that window, the field can be missing.
On the other hand, status.health.status
is always available and reflects the overall state of the application from Argo CD’s perspective — including conditions like Healthy
, Progressing
, Degraded
, and Missing
. It’s also the field used by the Argo CD UI to display application status.
So for general status tracking across all phases of the application lifecycle (even when no operation is ongoing), I think status.health.status
is the most reliable and consistent choice.
That said, using status.operationState.phase
as an additional rule could be a good enhancement in the future for more granular operation tracking.
Let me know what you think!
- "Suspended" | ||
failed: | ||
- "Degraded" | ||
- "Missing" | ||
- "Unknown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why Unknown
here? I mentioned, that there should be as little as possible false positives in failed
. You sure Unknown
really indicates an error? Same with Missing, and even Degraded to some extent... This is why I don't like this status.health.status
field, very poor indicators, no clear meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the follow-up and I appreciate your concern around false positives.
You're absolutely right that the semantics of status.health.status
are a bit broad and can be vague. Here's how I approached it:
-
Unknown
— According to the Argo CD health source, this indicates a failure to assess the health, not necessarily a failure of the resource itself. So technically, it's not always an error, but from an automation or alerting perspective, it means we can't reliably say things are working — which may justify treating it as failed, depending on the use case. Still, I'm open to moving it out offailed
if we aim for minimal false positives. -
Missing
— This one is more concrete: it means the resource doesn't exist in the cluster. Unless the resource was intentionally deleted or part of a partial sync, this is likely a real problem — which is why I classified it underfailed
. -
Degraded
— Per the docs and code, this is used when health checks failed or the resource didn’t become healthy within a timeout. Again, this may or may not be fatal, but in practice, it's often a symptom of underlying issues (e.g. CrashLoopBackOff, failed rollout), so treating it asfailed
seems appropriate unless we're tuning for very optimistic reporting.
That said, I completely agree the health field is not perfect. It's a coarse indicator, and there's no strong signal for whether an app is actually broken or just not healthy yet. But it’s still the most consistently available field, which is why I chose to use it for now.
Let me know what you think — I can tune the mapping if we want to reduce the risk of false alarms.
No description provided.