Save EC2 cloud when error code is ExpiredToken #1084

Open
salvomarino wants to merge 3 commits into jenkinsci:master from salvomarino:bugfix/aws-exception-expired-token

@salvomarino salvomarino commented May 2, 2025

This PR improves how the provision method of EC2Cloud handles an AwsServiceException whose error code is ExpiredToken. The previous PR #1008 catches the error code correctly, but its reconnectToEc2() call does not resolve the issue: the AWS STS token remains expired. Our investigation showed that saving the EC2 Cloud configuration refreshes the AWS STS token and resolves the problem. We currently mitigate this either by saving the configurations manually [screenshot: 2025-05-02 at 14 45 03] or by running the Groovy script below via the Script Console.

import hudson.plugins.ec2.*

def newSessionNamesMap = [:]
def j = Jenkins.get()
def ec2Clouds = j.clouds.findAll { it instanceof EC2Cloud }
if (ec2Clouds) {
    ec2Clouds.each { oldCloud ->
        println "[info] Handling EC2 Cloud: " + oldCloud.name
        String oldSessionName = oldCloud.roleSessionName
        def newSessionName
        String currentTimestamp = System.currentTimeMillis().toString()
        def lastHyphenIndex = oldSessionName.lastIndexOf('-')
        if (lastHyphenIndex != -1 && oldSessionName.substring(lastHyphenIndex + 1).matches("\\d+")) {
            newSessionName = oldSessionName.substring(0, lastHyphenIndex) + '-' + currentTimestamp
        } else {
            newSessionName = oldSessionName + '-' + currentTimestamp
        }
        newSessionNamesMap[oldCloud.name] = newSessionName
        println "[info] Current session name : " + oldSessionName + ", new session name : " + newSessionName
        def newCloud = new EC2Cloud(
            oldCloud.name,
            oldCloud.useInstanceProfileForCredentials,
            oldCloud.credentialsId,
            oldCloud.region,
            oldCloud.privateKey,
            oldCloud.sshKeysCredentialsId,
            oldCloud.instanceCapStr,
            oldCloud.templates,
            oldCloud.roleArn,
            newSessionName
        )
        j.clouds.replace(oldCloud, newCloud)
    }
    println "[info] Saving Jenkins instance configuration..."
    j.save()
    ec2Clouds = j.clouds.findAll { it instanceof EC2Cloud }
    ec2Clouds.each { cloud ->
        println "[info] Checking EC2 Cloud: '" + cloud.name + "'..."
        if (cloud.roleSessionName == newSessionNamesMap[cloud.name]) {
            println "[info] Session name has been updated as expected: " + cloud.roleSessionName
        } else {
            def errorMessage = "[error] Session name has not been updated as expected.\n" +
                "Current roleSessionName: " + cloud.roleSessionName + "\n" +
                "Expected roleSessionName: " + newSessionNamesMap[cloud.name] + "\n"
            println errorMessage
            throw new Exception(errorMessage)
        }
    }
} else {
    println "[info] No EC2 Cloud found."
}
return null
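
The session-name rotation used by the script can be looked at in isolation. The following is a minimal Java sketch of that one step, assuming a hypothetical helper class `SessionNames` that is not part of the EC2 plugin: a trailing all-digit suffix after the last hyphen is replaced by the new timestamp, otherwise the timestamp is appended.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the EC2 plugin. */
final class SessionNames {
    // Matches a trailing "-<digits>" suffix such as "-1746190000000".
    private static final Pattern NUMERIC_SUFFIX = Pattern.compile("-\\d+$");

    private SessionNames() {}

    /**
     * Replaces a trailing numeric suffix with the given timestamp, or appends
     * the timestamp when no such suffix exists. Like the script, this expects
     * a non-null session name.
     */
    static String rotate(String oldName, long timestampMillis) {
        String base = NUMERIC_SUFFIX.matcher(oldName).replaceFirst("");
        return base + "-" + timestampMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // No numeric suffix yet: the timestamp is appended.
        System.out.println(rotate("jenkins-ec2", now));
        // Existing numeric suffix: it is replaced rather than stacked.
        System.out.println(rotate("jenkins-ec2-1746190000000", now));
    }
}
```

Replacing the suffix instead of always appending keeps the session name from growing each time the script runs.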

The script above re-saves every EC2 Cloud and changes each session name so that we can verify that a change was actually saved on each cloud. However, saving the cloud without changing any field is enough to resolve the issue. This PR automates that save, so no external intervention is needed and the EC2 Cloud configuration is saved only when an ExpiredToken error actually occurs.
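
The general shape of "save only on ExpiredToken" can be sketched in plain Java. This is not the code in this PR: `ServiceError`, `SavableCloud` and `ExpiredTokenHandler` are hypothetical stand-ins for the AWS SDK exception, the EC2Cloud object and the handling logic, and the single retry after the save is an assumption made for illustration.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for an AWS service exception that carries an error code. */
class ServiceError extends RuntimeException {
    final String errorCode;

    ServiceError(String errorCode) {
        super(errorCode);
        this.errorCode = errorCode;
    }
}

/** Hypothetical stand-in for a cloud whose configuration can be persisted. */
interface SavableCloud {
    void save();
}

final class ExpiredTokenHandler {
    static final String EXPIRED_TOKEN = "ExpiredToken";

    private ExpiredTokenHandler() {}

    /**
     * Runs the provisioning call. If it fails with ExpiredToken, the cloud is
     * saved once and the call is retried once (the retry is an assumption).
     * Any other error code is rethrown without saving.
     */
    static <T> T provisionWithSave(SavableCloud cloud, Supplier<T> provisionCall) {
        try {
            return provisionCall.get();
        } catch (ServiceError e) {
            if (!EXPIRED_TOKEN.equals(e.errorCode)) {
                throw e;
            }
            cloud.save();
            return provisionCall.get();
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        SavableCloud cloud = () -> System.out.println("saving cloud configuration");
        // The first call simulates an expired token; the second one succeeds.
        String result = provisionWithSave(cloud, () -> {
            if (calls[0]++ == 0) {
                throw new ServiceError(EXPIRED_TOKEN);
            }
            return "provisioned";
        });
        System.out.println(result);
    }
}
```

Checking the error code before saving keeps unrelated failures, such as permission errors, from triggering configuration writes.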

Testing done

We've been experiencing this issue in our production and testing environments, but couldn't reliably reproduce it.

To validate the solution:

  1. We monitored instances where token expiration occurred and confirmed that saving the EC2 Cloud configuration resolves the issue
  2. We implemented a scheduled job to save the EC2 Cloud configuration periodically, and this mitigated the ExpiredToken issue
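
The periodic-save mitigation from step 2 can be illustrated with a generic Java scheduler. This is only a sketch of the idea, not the job we actually ran: `PeriodicSaver` is a hypothetical class, and `saveAction` stands in for whatever saves the EC2 Cloud configuration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of a scheduled "save the cloud" job. */
final class PeriodicSaver {
    private PeriodicSaver() {}

    /**
     * Runs saveAction repeatedly at a fixed rate and returns the scheduler so
     * that the caller can shut it down. Runtime exceptions are caught because
     * scheduleAtFixedRate suppresses all later runs once a run throws.
     */
    static ScheduledExecutorService start(Runnable saveAction, long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                saveAction.run();
            } catch (RuntimeException e) {
                System.err.println("periodic save failed: " + e.getMessage());
            }
        }, period, period, unit);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler =
                start(() -> System.out.println("saving cloud configuration"), 100, TimeUnit.MILLISECONDS);
        Thread.sleep(350); // let a few saves happen
        scheduler.shutdownNow();
    }
}
```

A fixed schedule saves even when no token has expired, which is the overhead the on-demand approach in this PR avoids.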

The fix handles the token refresh transparently to users, maintaining the same behaviour but eliminating the need for external configuration saves.

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

@salvomarino salvomarino marked this pull request as draft May 2, 2025 10:23
@salvomarino salvomarino marked this pull request as ready for review May 19, 2025 10:33
@salvomarino salvomarino changed the title Save EC2 cloud when error code is ExpiredToken Save EC2 cloud when error code is ExpiredToken May 22, 2025