Epic: Worker Monitoring #608
Comments
This keeps coming up, so I think we want to spend some time on it. I think there are two separate but related big issues right now:
Some quick thoughts about possible performance bottlenecks:
What would be useful for debugging sometimes (i.e. right now) is to see (a) the JSON sent to the worker for each run, and (b) the execution plan compiled from that incoming run. Just logging the JSON of these data structures is expensive, annoying, and unreadable. Can we post them somewhere easily accessible? This would also help us reproduce runs that are lost or broken, because we would be able to get the exact input data.
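One way to picture "posting them somewhere" is a small archive keyed by run id. The sketch below is only an illustration: `RunArchive`, `RunRecord`, and the in-memory `Map` are hypothetical names and choices, not anything that exists in the worker today, and a real version would presumably write to a file store or a database instead.

```typescript
// Hypothetical record shape: pretty-printed JSON for the incoming run and
// for the execution plan compiled from it.
type RunRecord = { input: string; plan: string; savedAt: string };

// Hypothetical in-memory archive, keyed by run id (an assumption: a real
// implementation would persist this somewhere browsable).
class RunArchive {
  private records = new Map<string, RunRecord>();

  // Store the raw run payload and the compiled plan side by side.
  save(runId: string, input: unknown, plan: unknown): void {
    this.records.set(runId, {
      input: JSON.stringify(input, null, 2),
      plan: JSON.stringify(plan, null, 2),
      savedAt: new Date().toISOString(),
    });
  }

  // Look up a run later, e.g. to replay a lost or broken run.
  get(runId: string): RunRecord | undefined {
    return this.records.get(runId);
  }
}
```

Because the input is stored exactly as received, `JSON.parse(record.input)` gives back the same data the worker saw, which is what reproducing a run would need.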
An epic issue to have oversight over monitoring on the worker.
The high-level brief is: we need better visibility of what's going on inside the worker, especially when things go wrong.
We should consider metrics tracking, Sentry reporting, email notification, Grafana, etc.
Related:
#603
#402
Things we want
We need to figure out the best approach for integrating this into Prometheus. Do we expose an aggregate HTTP service (or use Lightning for that) that collects the metrics?
We probably don't want to use service discovery for monitoring, do we?
There is an advantage to workers exposing their own /metrics server: it makes the worker better for everyone.
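For reference, here is a rough sketch of what the body of a per-worker /metrics response could look like in the Prometheus text exposition format. The `Metric` type, `renderMetrics`, and the metric names are all made up for illustration; in practice an existing Prometheus client library would probably do this rendering for us.

```typescript
// Hypothetical metric shape; names and fields are illustrative only.
type Metric = {
  name: string;
  help: string;
  type: "counter" | "gauge";
  value: number;
  labels?: Record<string, string>;
};

// Escape backslashes, double quotes, and newlines in label values,
// as the text exposition format requires.
function escapeLabel(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render metrics as Prometheus text: a HELP line, a TYPE line,
// then one sample line per metric.
function renderMetrics(metrics: Metric[]): string {
  const lines: string[] = [];
  for (const m of metrics) {
    lines.push(`# HELP ${m.name} ${m.help}`);
    lines.push(`# TYPE ${m.name} ${m.type}`);
    const labels = Object.entries(m.labels ?? {})
      .map(([key, value]) => `${key}="${escapeLabel(value)}"`)
      .join(",");
    lines.push(labels ? `${m.name}{${labels}} ${m.value}` : `${m.name} ${m.value}`);
  }
  return lines.join("\n") + "\n";
}
```

The worker would return this string from a GET /metrics handler. Whether Prometheus scrapes each worker directly or an aggregate service collects these is the open question above.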