[GSoC Project Proposal]: Prometheus server- graphs and monitoring for ERDDAP™ #72
Comments
I'm running a Grafana monitoring stack for the MBON dashboard server. I've read about Prometheus but haven't tried it. It could be a great solution for improved ERDDAP monitoring and reporting.
Is there an architecture in mind for this? Will we have an erddap-monitoring module in the current repository, or something like a new repository for everything related to monitoring? And since this example server will be containerized and runnable through Docker, do we use the existing Prometheus image on Docker Hub, or do we take the entire Prometheus server binary and configure that to run through Docker?
@ayushsingh01042003 There are two main use cases I see for this. The more I think about it, the more I think they are distinct enough that they will require 2 separate configs (and ways of running).
Yes, sorry, my language was unspecific. We should be able to use the official Prometheus image and provide configuration for it. As for single or multiple ERDDAP™ instances, I think different data and graphs are going to be important for the different audiences. For example, for monitoring the fleet of ERDDAP™ servers, I'd love to be able to see what percent have new feature flags turned on. Admins monitoring a single ERDDAP™ server (most admins are in this situation) likely don't care at all about the feature flag metrics. They likely care about how much traffic each dataset id is getting, but for monitoring many ERDDAP™ servers I'd want to see how much traffic dataset types and protocols are getting, not individual dataset ids. There are different needs for monitoring ERDDAP™ overall vs a single ERDDAP™ instance, so we're going to need 2 configs for Prometheus.
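To make the difference between the two audiences concrete, here is a minimal Python sketch that totals the same request counter two ways: per dataset id (the single-server view) and per protocol (the fleet view). The metric name `erddap_requests_total`, the label names, and the sample values are hypothetical and are not taken from ERDDAP™'s Metrics.java. In a real deployment this grouping would more likely be done in PromQL (for example with `sum by (protocol) (...)`) than in a script.

```python
import re
from collections import defaultdict

# Hypothetical samples in the Prometheus text exposition format.
# The metric name, label names, and values are illustrative assumptions.
SAMPLES = """\
erddap_requests_total{dataset_id="sst_daily",protocol="griddap"} 120
erddap_requests_total{dataset_id="buoy_obs",protocol="tabledap"} 45
erddap_requests_total{dataset_id="chl_monthly",protocol="griddap"} 30
"""

LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+([0-9.eE+-]+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels, value) for each labeled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            yield name, dict(LABEL_RE.findall(raw_labels)), float(value)


def total_by_label(text, metric, label):
    """Sum one metric's sample values, grouped by a single label."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric and label in labels:
            totals[labels[label]] += value
    return dict(totals)


# Single-server view: traffic per dataset id.
per_dataset = total_by_label(SAMPLES, "erddap_requests_total", "dataset_id")
# Fleet view: traffic per protocol, ignoring individual dataset ids.
per_protocol = total_by_label(SAMPLES, "erddap_requests_total", "protocol")
print(per_dataset)
print(per_protocol)
```

The same raw samples feed both views; only the grouping label changes, which is one reason the two configurations can share metrics while presenting different dashboards.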
Understood, thanks for clearing that up.
Hi @ChrisJohnNOAA, I’m Jiahui Hu, a Master’s student in Computer Software Engineering at Northeastern University (Seattle). I’m very interested in contributing to this Prometheus Monitoring for ERDDAP™ project for GSoC.

I have experience with Java, Docker, and system monitoring. During my software engineering internship, I configured Prometheus and Grafana to track system performance, reducing anomaly detection time by 50%. I also optimized Kafka-based data pipelines, cutting latency by 35%, and worked on API traffic management using AWS API Gateway, improving request handling by 20%. These experiences align well with setting up a Prometheus server, defining useful metrics, and containerizing the deployment for ERDDAP™.

I reviewed ERDDAP’s existing metrics and saw that while many Prometheus metrics are available, areas like query response times, memory usage, and data request patterns could provide deeper insights. My plan is to integrate custom Prometheus metrics within ERDDAP™ to give admins better visibility into performance and bottlenecks.

Would it be beneficial to include automated alerts for key performance thresholds, or is the focus primarily on visualization? Also, are there specific performance bottlenecks in ERDDAP™ that need more attention?

Looking forward to your thoughts!

Best,
Hi @lareinahu-2023, thanks for your interest. Where did you do your internship? I recently added some additional Prometheus metrics, which include query response times and request patterns. The main file for the Prometheus metrics is here: https://github.com/ERDDAP/erddap/blob/main/WEB-INF/classes/gov/noaa/pfel/erddap/util/Metrics.java I agree there are more metrics that would be beneficial to add, even after those changes. As for automated alerts, they may be useful, in particular for individual server admins. Nobody has requested alerts (there have been many requests for visualization, particularly of which datasets are popular and how much traffic they get), but I do imagine some admins would appreciate having alerts.
Hi @ChrisJohnNOAA, After reviewing the Prometheus metrics you've already implemented, I have a few clarification questions:
Thanks again for your guidance. I'm excited about the possibility of contributing to this project!
@ChrisJohnNOAA I had a similar question. For the Prometheus config used by administrators, I understand that metrics like JVM and dataset traffic will be implemented, and my guess is that a few, but not all, of the metrics from the status page will be too. For the Prometheus config used by the core team, are there other metrics that you or other members of the core team have in mind, apart from the percentage of servers across the fleet using certain feature flags?
@ayushsingh01042003 My thoughts on metrics for the core team are in point 3 above. Several of the metrics from the status page are in the new Metrics. We could add others, but we need to be careful about cardinality explosion for some of them.
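As a rough illustration of the cardinality concern: the worst-case number of time series for one metric is the product of the number of distinct values of each of its labels. The sketch below uses made-up label names and counts (they are assumptions, not measurements from any ERDDAP™ server) to show how a single high-cardinality label such as a dataset id multiplies the total.

```python
from math import prod


def series_count(label_value_counts):
    """Worst-case series count for one metric: the product of the
    number of distinct values each label can take."""
    return prod(label_value_counts.values())


# Hypothetical label sizes, chosen only for illustration.
fleet_labels = {"protocol": 3, "file_type": 40}
per_dataset_labels = {"protocol": 3, "file_type": 40, "dataset_id": 5000}

print(series_count(fleet_labels))        # 3 * 40
print(series_count(per_dataset_labels))  # 3 * 40 * 5000
```

With these assumed numbers, adding the dataset id label raises the worst case from 120 series to 600,000 for a single metric, which is why per-dataset labels may suit a single server's config better than a fleet-wide one.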
Hi @ChrisJohnNOAA, I'm eager to refine this proposal based on your expertise and project vision. I'm flexible and open to adjusting the scope and focus based on your guidance.
@lareinahu-2023 Why do you recommend using Micrometer? What benefit would it provide ERDDAP™? Many of the metrics you mention adding are already collected in Metrics.java. For example, JVM metrics are included through JvmMetrics. In early February I also added a number of additional metrics, which include feature flag state and request information (including file format, protocol, response times, and more - you can see the metrics for our canary server here). There may be other metrics that would be useful to collect, but I don't want to duplicate metrics we already have. Something I don't see called out in the proposal is that we likely need two different configurations for dashboards: one for the ERDDAP™ team to better understand usage across servers from different organizations, and the other for administrators running one to a small handful of servers to monitor their server(s).
Dear Chris John, I hope you’re doing well. My name is Mohamed Shehata, and I am from Egypt. As a student at the Faculty of Science, Menoufia University, I have developed a strong foundation in Java, which is the primary language used in my studies. My passion for Java, along with the certification I obtained and my experience with Docker, has driven me to explore Prometheus in depth. After conducting thorough research on Prometheus and its integration with monitoring systems, I believe I am well-suited for this project. I am eager to contribute and further enhance my skills while making a meaningful impact. This opportunity aligns perfectly with both my academic and professional aspirations, and I am excited about the possibility of joining the project. I would love to discuss how I can contribute effectively. Looking forward to your thoughts! Best regards,
Project Description
We recently started adding Prometheus metrics to ERDDAP™. The main goal of this project is to build an example Prometheus Server which can monitor one or more ERDDAP™ instances. This may involve adding new metrics to the ERDDAP™ project which is where Java would be used.
Expected Outcomes
A Prometheus Server configuration that is runnable through Docker and can be used to monitor one or more ERDDAP™ instances. This will help ERDDAP™ admins monitor their servers and provide usage insight that can help guide future ERDDAP™ development.
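As a rough idea of what such a configuration might contain, the Python sketch below renders a minimal `prometheus.yml` for a list of ERDDAP™ targets. The keys used (`global`, `scrape_interval`, `scrape_configs`, `job_name`, `metrics_path`, `static_configs`, `targets`) are standard Prometheus configuration fields. The function name, the host names, the 60s interval, and the `/erddap/metrics` path are assumptions made for illustration and should be checked against the actual server.

```python
def render_prometheus_config(targets, job_name="erddap",
                             metrics_path="/erddap/metrics",
                             scrape_interval="60s"):
    """Render a minimal prometheus.yml as text.

    metrics_path and scrape_interval are assumed defaults, not values
    confirmed by the ERDDAP project.
    """
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        "scrape_configs:",
        f"  - job_name: {job_name}",
        f"    metrics_path: {metrics_path}",
        "    static_configs:",
        "      - targets:",
    ]
    # One quoted host:port entry per monitored ERDDAP instance.
    lines += [f'          - "{target}"' for target in targets]
    return "\n".join(lines) + "\n"


# Hypothetical hosts: one target for a single-server admin,
# several targets for a fleet-wide view.
single = render_prometheus_config(["erddap.example.org:8080"])
fleet = render_prometheus_config(
    ["erddap-a.example.org:8080", "erddap-b.example.org:8080"],
    job_name="erddap-fleet",
)
print(single)
```

The official `prom/prometheus` image reads its configuration from `/etc/prometheus/prometheus.yml` by default, so a file generated this way could be mounted at that path when running the container.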
Skills Required
Java, Prometheus, YAML, Docker
Additional Background/Issues
The main ERDDAP™ repo is here.
Mentor(s)
Chris John (@ChrisJohnNOAA)
Expected Project Size
175 hours
Project Difficulty
Intermediate