Description
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Terraform CLI and Terraform AWS Provider Version
2.66.0
Affected Resource(s)
- aws_iam_role
Terraform Configuration Files
Any iam role resource
resource "aws_iam_role" "ssm_role" {}
Debug Output
[DEBUG] [aws-sdk-go] DEBUG: Request iam/CreateRole Details:
[DEBUG] [aws-sdk-go] DEBUG: Send Request iam/CreateRole failed, attempt 0/25, error RequestError: send request failed
[DEBUG] [aws-sdk-go] DEBUG: Retrying Request iam/CreateRole, attempt 1
[DEBUG] [aws-sdk-go] DEBUG: Request iam/CreateRole Details:
[WARN] WaitForState timeout after 30s
[WARN] WaitForState starting 30s refresh grace period
[DEBUG] [aws-sdk-go] DEBUG: Send Request iam/CreateRole failed, attempt 1/25, error RequestError: send request failed
[DEBUG] [aws-sdk-go] DEBUG: Retrying Request iam/CreateRole, attempt 2
[DEBUG] [aws-sdk-go] DEBUG: Request iam/CreateRole Details:
[ERROR] WaitForState exceeded refresh grace period
[DEBUG] [aws-sdk-go] DEBUG: Request iam/CreateRole Details:
Panic Output
Expected Behavior
IAM role creation succeeds in cases of temporary IAM timeouts
Actual Behavior
The previous iamconn.CreateRole() call is still running when the resource.Retry() timeout fires. In many cases this results in a second creation attempt and, eventually, a failure in the plugin:
Error: Error creating IAM Role hello-world-ssm_role: EntityAlreadyExists: Role with name hello-world-ssm_role already exists.
status code: 409, request id: removed
on main.tf line 18, in resource "aws_iam_role" "ssm_role":
18: resource "aws_iam_role" "ssm_role"
Steps to Reproduce
terraform apply
Important Factoids
var createResp *iam.CreateRoleOutput
err := resource.Retry(30*time.Second, func() *resource.RetryError {
	var err error
	// CreateRole has its own internal retry loop and can block for more than 30 seconds.
	createResp, err = iamconn.CreateRole(request)
	// IAM users (referenced in Principal field of assume policy)
	// can take ~30 seconds to propagate in AWS
	if isAWSErr(err, "MalformedPolicyDocument", "Invalid principal in policy") {
		return resource.RetryableError(err)
	}
	return resource.NonRetryableError(err)
})
if isResourceTimeoutError(err) {
	// The goroutine started in Retry (WaitForState) can still be running here,
	// so this issues another blocking CreateRole alongside the first one.
	createResp, err = iamconn.CreateRole(request)
}
An issue has already been opened against terraform-plugin-sdk asking for better timeout handling, but it is not getting any attention.
We have been running a patched version of terraform-plugin-sdk in production for several months with great success. However, the patch might be too crude to upstream, as it simply removes the parts of the timeout handling that we found to behave oddly.
References
terraform-plugin-sdk issue: hashicorp/terraform-plugin-sdk#530
terraform-plugin-sdk patch: hashicorp/terraform-plugin-sdk#529