Setup monitoring of ECS nodes #104

hellais · 2024-09-25T12:35:28Z

Currently we don't have observability into the container hosts of the ECS cluster. Moreover, we are only able to scrape aggregate metrics from the services that sit behind the load balancer, which means the metrics end up "flapping", since each scrape may be answered by a different container.

Ideally we would have a way of scraping metrics for the container hosts, as well as for the individual per-service Docker containers.

In summary we would like to collect two classes of metrics:

- Host container metrics (the EC2 nodes that run Docker and to which we deploy Docker containers), collected using node_exporter
- Docker container application metrics, which are exposed using the instrumentator and which we would like to scrape independently from each container host
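To make the two classes concrete, here is a minimal Python sketch that builds, for each container host, one target group for node_exporter and one for the application containers, in the JSON layout used by Prometheus file-based service discovery. It assumes node_exporter listens on its default port 9100 and that each container's metrics are reachable on a known host port; the input shape, IPs, ports and the `metrics_class`/`host` label names are hypothetical.

```python
import json
from typing import Dict, List

NODE_EXPORTER_PORT = 9100  # node_exporter's default listen port


def build_target_groups(hosts: Dict[str, List[int]]) -> List[dict]:
    """Build Prometheus file_sd target groups for both metric classes.

    `hosts` maps a container host IP to the host ports on which its
    application containers expose metrics (hypothetical input shape).
    """
    groups = []
    for host_ip, app_ports in sorted(hosts.items()):
        # Class 1: host metrics from node_exporter, one target per EC2 node.
        groups.append({
            "targets": [f"{host_ip}:{NODE_EXPORTER_PORT}"],
            "labels": {"metrics_class": "host", "host": host_ip},
        })
        # Class 2: application metrics, one target per container, so each
        # container is scraped directly instead of through the load balancer.
        if app_ports:
            groups.append({
                "targets": [f"{host_ip}:{port}" for port in sorted(app_ports)],
                "labels": {"metrics_class": "app", "host": host_ip},
            })
    return groups


if __name__ == "__main__":
    example = {"10.0.1.10": [32768, 32769], "10.0.1.11": [32770]}
    print(json.dumps(build_target_groups(example), indent=2))
```

Because every container gets its own target, a series can no longer jump between containers the way it does when scraping through the load balancer.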


hellais · 2024-09-25T12:36:08Z

Leaving here some links for things to look at:

LDiazN · 2025-01-23T12:38:20Z

I think this might be the way: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config
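ec2_sd_config discovers targets by calling the EC2 API, which is why it needs IAM credentials. As a rough illustration of what that discovery amounts to (not Prometheus's actual implementation), this Python sketch uses boto3's `describe_instances` paginator to collect the private IPs of running instances. The tag key/value used to select the cluster nodes are hypothetical parameters, and the credentials would need at least the `ec2:DescribeInstances` permission.

```python
from typing import Iterable, List


def private_ips(pages: Iterable[dict]) -> List[str]:
    """Collect private IPs of running instances from DescribeInstances pages."""
    ips = []
    for page in pages:
        for reservation in page.get("Reservations", []):
            for instance in reservation.get("Instances", []):
                if instance.get("State", {}).get("Name") != "running":
                    continue
                ip = instance.get("PrivateIpAddress")
                if ip:
                    ips.append(ip)
    return ips


def discover_cluster_ips(region: str, tag_key: str, tag_value: str) -> List[str]:
    """Query AWS for running instances carrying a given (hypothetical) tag."""
    import boto3  # imported lazily so private_ips() works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_instances")
    pages = paginator.paginate(Filters=[
        {"Name": "instance-state-name", "Values": ["running"]},
        {"Name": f"tag:{tag_key}", "Values": [tag_value]},
    ])
    return private_ips(pages)
```

The addresses returned are private VPC addresses, which is why reaching them from a monitoring server outside AWS is a separate problem.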

The problem I'm seeing right now is that the monitoring server is not in AWS, so there are two issues with the connection between that server and the EC2 instances:

1) We have to set up IAM credentials for the server, as mentioned in the link above.
2) The Prometheus server needs a way to reach the EC2 instances directly (not the load balancer), but they're probably not open to internet traffic (and I don't think they should be). What can we do about this?

LDiazN · 2025-01-29T11:24:58Z

To solve 2), I talked with @hellais, and we came up with the idea of using nginx as a proxy. You send scrape requests to the nginx proxy, and it knows which host to forward them to based on the path of the request.

For example, if you want to request metrics from the node foo:

- The Prometheus server would send the request: GET nginx.proxy.com/foo/metrics/
- nginx would then perform GET foo/metrics using proxy_pass and forward the response back to the Prometheus server
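On the Prometheus side, one way to express this (an assumption, not something settled in this thread) is to use the proxy as the scrape address for every target and put the node into the per-target `__metrics_path__` label, which Prometheus uses as the scrape path. This Python sketch builds such entries in the file_sd JSON layout; the proxy address and node names are placeholders.

```python
from typing import Iterable, List


def proxied_target_groups(proxy: str, nodes: Iterable[str]) -> List[dict]:
    """One file_sd group per node: Prometheus always connects to the proxy,
    and the node to reach is encoded in the metrics path (/<node>/metrics)."""
    return [
        {
            "targets": [proxy],
            "labels": {
                # Overrides the scrape path for this target only.
                "__metrics_path__": f"/{node}/metrics",
                # Without this, every target's instance label would be the proxy.
                "instance": node,
            },
        }
        for node in nodes
    ]
```

Setting `instance` explicitly keeps series from different nodes apart, since by default it is derived from the target address, which here is the same proxy for all of them.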

After testing this with a Docker Compose setup containing nginx and two Go servers, I think it's possible using the following nginx configuration:

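events { }

http {
    server {
        listen 80;

        # Requests for /<host>/<path> are proxied to http://<host>:8080/<path>,
        # e.g. /10.0.1.10/metrics -> http://10.0.1.10:8080/metrics
        location ~ ^/([a-zA-Z0-9_\.]+)/([a-zA-Z0-9_]*) {
            # Pass the captured path explicitly: without a URI part, proxy_pass
            # would forward the original URI (/<host>/<path>) unchanged.
            # Since the upstream comes from a variable, a hostname in $1 is
            # resolved at request time and needs a `resolver` directive.
            proxy_pass http://$1:8080/$2;
        }
    }
}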

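As a side note, you can't use the hostname of the Docker service you want to reach; you have to use its local IP address. This is likely because, when proxy_pass contains variables, nginx resolves hostnames at request time and needs a `resolver` directive to do so (inside Docker Compose, Docker's embedded DNS server is at 127.0.0.11).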
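A quick way to check which upstream a given request path maps to is to mirror the `location` regex in Python. This sketch assumes the intended mapping in which the second capture group becomes the upstream path (i.e. `proxy_pass http://$1:8080/$2;`); the function name and the fixed port 8080 are taken as illustrative.

```python
import re
from typing import Optional

# Same pattern as the nginx `location`, anchored at the start of the path.
LOCATION = re.compile(r"^/([a-zA-Z0-9_.]+)/([a-zA-Z0-9_]*)")


def upstream_for(request_path: str, port: int = 8080) -> Optional[str]:
    """Return the upstream URL a request path would be proxied to, or None."""
    match = LOCATION.match(request_path)
    if match is None:
        return None
    host, path = match.group(1), match.group(2)
    return f"http://{host}:{port}/{path}"
```

Note that the second group stops at the first character outside `[a-zA-Z0-9_]`, so a trailing slash (as in `/foo/metrics/`) is dropped and a path like `/foo/api/metrics` would only forward `api`.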

This PR adds the node exporter service to the ECS cluster machines. This is necessary for #178 (and therefore for #104). Node exporter is installed by the user-data script that initializes the cluster machines.

This PR adds support for scraping application-level metrics directly from the ECS nodes, without going through the load balancer, solving the flapping behavior described in #104. To achieve this we had to solve the following problems:

- Reach the cluster nodes in AWS; this was solved by #182
- Discover ECS tasks with their corresponding port and IP address. This was a bit trickier: we solved it by adding a cronjob that runs a Python script, which requests ECS data from AWS using the `boto3` client and then stores that information in a [Prometheus file based discovery](https://prometheus.io/docs/guides/file-sd/) compatible file

So, this PR will add:

- A Python script that collects ECS task information from AWS and writes it into a file
- An Ansible configuration for deploying this script and running it with a cronjob
- An update to the Nginx configuration used to proxy metrics requests from the monitoring host to the EC2 instances in AWS
- Security group configuration to allow traffic from the proxy host to the ECS cluster nodes
- IAM credentials used for requesting task information from AWS

This PR solves ooni/backend#937 and ooni/backend#938 and is related to #104.
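The discovery step can be sketched in Python as a pure function that turns ECS `describe_tasks` output into file_sd target groups, plus an atomic file writer. This is not the PR's script: it assumes bridge networking with host port bindings (`networkBindings[].hostPort`) and a ready-made map from `containerInstanceArn` to the host's private IP (in practice built with boto3's ECS `describe_container_instances` and EC2 `describe_instances`). Choosing which binding actually serves metrics is left out, and the label names are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, List


def tasks_to_file_sd(tasks: Iterable[dict], instance_ips: Dict[str, str]) -> List[dict]:
    """Turn ECS DescribeTasks entries into Prometheus file_sd target groups.

    `instance_ips` maps a containerInstanceArn to its EC2 host's private IP.
    Tasks on unknown container instances are skipped.
    """
    groups = []
    for task in tasks:
        host_ip = instance_ips.get(task.get("containerInstanceArn", ""))
        if host_ip is None:
            continue
        for container in task.get("containers", []):
            for binding in container.get("networkBindings", []):
                groups.append({
                    "targets": [f"{host_ip}:{binding['hostPort']}"],
                    "labels": {
                        "task_definition": task.get("taskDefinitionArn", ""),
                        "container": container.get("name", ""),
                    },
                })
    return groups


def write_file_sd(groups: List[dict], path: str) -> None:
    """Write to a temp file in the same directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(groups, tmp, indent=2)
    os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
```

Renaming a fully written temporary file means Prometheus, which watches file_sd files for changes, should only ever see a complete JSON document, even if the cronjob runs while a reload is in progress.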

hellais self-assigned this Sep 25, 2024

hellais added the priority/medium Normal priority issue label Sep 25, 2024

DecFox self-assigned this Oct 7, 2024

hellais added the epic A large user story that needs to be broken down label Dec 9, 2024

hellais added this to Roadmap Jan 7, 2025

hellais moved this to Backlog in Roadmap Jan 13, 2025

hellais assigned LDiazN and unassigned hellais and DecFox Jan 22, 2025

hellais mentioned this issue Jan 22, 2025

Improve monitoring of ooniservices #157

Open

3 tasks

hellais moved this from Backlog to Sprint Backlog in Roadmap Jan 23, 2025

This was referenced Jan 27, 2025

Investigate quicker ways to deploy monitoring scripts ooni/backend#271

Closed

api: implement measurement collection endpoint in ooniprobe services ooni/backend#931

Open

hellais moved this from Sprint Backlog to Epic in Roadmap Jan 29, 2025

hellais added the funder/otffoss2025 label Jan 29, 2025

hellais moved this from 🏗 Planned to 🚀 In Progress in Roadmap Jan 31, 2025

LDiazN mentioned this issue Feb 6, 2025

Node exporter ecs monitoring #179

Merged

LDiazN mentioned this issue Feb 17, 2025

App level metrics monitoring with Prometheus #186

Merged
