This Terraform script creates the following CloudWatch alarms for tagged EC2 instances:
- CPUUtilization
- mem_used_percent
- disk_used_percent
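For orientation, each generated alarm is roughly equivalent to a plain `aws_cloudwatch_metric_alarm` resource. The sketch below is illustrative only: the instance ID, alarm name, threshold, and SNS ARN are placeholder assumptions, and the real resources are generated per instance from the tag data.

```hcl
# Illustrative sketch of one generated alarm; names and values are placeholders.
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "app-server-1-CPUUtilization"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 300
  statistic           = "Average"
  threshold           = 70

  dimensions = {
    InstanceId = "i-0123456789abcdef0"
  }

  alarm_actions = ["arn:aws:sns:us-east-1:000000000000:test"]
}
```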
- Proper AWS CLI permissions to run Terraform (EC2, CloudWatch, SNS)
- A bunch of running EC2 instances with at least the following tags (see the tagging sketch after this list):
  - Name:Value (used to name the CloudWatch metrics)
  - Stack:Value (used to fetch the instance list)
- CloudWatch Agent must be running on all machines to publish metrics such as mem_used_percent and the disk dimensions.
- Besides Terraform, the `jq` and `aws` CLI utilities.
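As a hedged illustration of the tagging requirement above, an instance the script can pick up might be defined like this (the resource name, AMI, and tag values are assumptions, not part of this repo):

```hcl
# Illustrative instance showing the two tags the script expects.
resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0"
  instance_type = "t3.micro"

  tags = {
    Name  = "app-server-1" # used to name the CloudWatch metrics
    Stack = "Test"         # matched against tag_name/tag_value to build the instance list
  }
}
```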
JSON data is regenerated by the null_resource block on every run of terraform apply or terraform plan if the depends_on line in each module block is uncommented (see the sketch below). When running Terraform manually there is normally no need to regenerate the data on every run, so the depends_on line can be commented out to drastically cut run time. If this code runs in a pipeline on a schedule (say once or twice a week), leave it uncommented so the pipeline always works with current instance data. Regenerating the data too often can run into AWS API rate limits.
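A minimal sketch of that arrangement, assuming a null_resource that shells out to instances.sh and a module block whose depends_on can be toggled (the resource, module, and path names here are illustrative, not the exact ones in this repo):

```hcl
# Illustrative names; adjust to the actual module layout.
resource "null_resource" "instance_data" {
  # timestamp() forces the resource to be replaced on every apply,
  # which re-runs instances.sh and refreshes the JSON files.
  triggers = {
    always_run = timestamp()
  }

  provisioner "local-exec" {
    command = "./instances.sh"
  }
}

module "cpu_alarms" {
  source = "./modules/cpu"

  # Uncomment for scheduled pipeline runs; comment out for manual runs
  # to avoid regenerating the JSON (and hitting AWS API limits) each time.
  # depends_on = [null_resource.instance_data]
}
```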
There is a high likelihood that your environment is organized differently, so edit the following lines in instances.sh with the correct tags for your machines.
```sh
--query 'Reservations[*].Instances[?Tags[?Key==`Environment` && (Value!=`Production` && Value!=`DR` && Value!=`Prod`)]].[InstanceId]' \
> nonprod_instance-ids.json

--query 'Reservations[*].Instances[*].[InstanceId]' \
--filters "Name=tag:Environment,Values=DR,Production,Prod" \
> prod_instance-ids.json
```
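The resulting JSON files can then be read back into Terraform. One plausible pattern is shown below; this is an assumption about how the module might consume the files, not necessarily what this repo does:

```hcl
locals {
  # The aws cli --query above emits nested arrays of [InstanceId] entries,
  # so flatten() collapses them into a simple list of instance IDs.
  prod_instance_ids    = flatten(jsondecode(file("${path.module}/prod_instance-ids.json")))
  nonprod_instance_ids = flatten(jsondecode(file("${path.module}/nonprod_instance-ids.json")))
}
```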
```sh
terraform init

terraform plan -var "profile=default" -var "region=us-east-1" -var "tag_name=Stack" -var "tag_value=Test" -var "threshold_ec2_cpu=70" -var "threshold_ec2_mem=90" -var "threshold_ec2_disk=90" -var 'sns_arn=["arn:aws:sns:us-east-1:000000000000:test"]'

terraform apply -var "profile=default" -var "region=us-east-1" -var "tag_name=Stack" -var "tag_value=Test" -var "threshold_ec2_cpu=70" -var "threshold_ec2_mem=90" -var "threshold_ec2_disk=90" -var 'sns_arn=["arn:aws:sns:us-east-1:000000000000:test"]'
```
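The -var flags above imply variable declarations roughly like the following sketch; the types are assumptions based on the values passed, so check the repo's variables.tf for the actual definitions:

```hcl
# Sketch of the variables the commands above pass in; actual declarations may differ.
variable "profile"            { type = string }
variable "region"             { type = string }
variable "tag_name"           { type = string }
variable "tag_value"          { type = string }
variable "threshold_ec2_cpu"  { type = number }
variable "threshold_ec2_mem"  { type = number }
variable "threshold_ec2_disk" { type = number }
variable "sns_arn"            { type = list(string) }
```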