@claykirk claykirk commented Apr 26, 2020

The "clusterSpecificAgentInstances" in the "DockerPlugin" class appears to become stale and inconsistent over time. When the server sends a SERVER_PING request, the request includes a list of agents, which can be read with the "pluginRequest.listAgents()" method. The plugin iterates through this list to clean up the agents. During cleanup, when the "ServerPingRequestExecutor" calls the "performCleanupForACluster" method, it queries "clusterSpecificAgentInstances" through the "instancesCreatedAfterTimeout" method of the "DockerContainers" class. The agents in question never get cleaned up or disabled, because the "DockerContainers" instance being queried is stale and hasn't been refreshed since the plugin was instantiated. The code that adds these agents to the list of agents to disable already exists; it just never sees them.

To address the inconsistent view of "clusterSpecificAgentInstances", I've introduced a periodic forced refresh: a task scheduled each hour resets the "refreshed" variable to false, so the instances are reloaded on the next refresh. This seems to help, but I'm not sure whether there is a better solution for the root cause.
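
For readers who want to see the shape of the change, here is a minimal sketch of the periodic reset described above. It is not the code in this PR: `RefreshableInstances` and `PeriodicRefresher` are hypothetical stand-ins, and only the idea of a `refreshed` flag that a scheduled task resets to false comes from the description. The real plugin classes have different names and signatures.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a cached view of agent instances (not the plugin's real class). */
class RefreshableInstances {
    private boolean refreshed = false;
    private int refreshCount = 0;

    /** Reloads the cached view only if it has been marked stale (or never loaded). */
    synchronized void refreshIfNeeded() {
        if (!refreshed) {
            // A real implementation would re-list the containers here.
            refreshCount++;
            refreshed = true;
        }
    }

    /** Marks the cached view stale so the next refreshIfNeeded() reloads it. */
    synchronized void markStale() {
        refreshed = false;
    }

    synchronized int refreshCount() {
        return refreshCount;
    }
}

/** Hypothetical helper that resets the "refreshed" flag at a fixed interval. */
class PeriodicRefresher implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "instances-refresh-reset");
                thread.setDaemon(true); // do not keep the JVM alive for this task
                return thread;
            });

    PeriodicRefresher(RefreshableInstances instances, long period, TimeUnit unit) {
        // First reset happens one full period after start, then repeats.
        scheduler.scheduleAtFixedRate(instances::markStale, period, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Under these assumptions, the hourly schedule from the description would be `new PeriodicRefresher(instances, 1, TimeUnit.HOURS)`. The scheduled task only flips the flag; the actual reload still happens lazily on the next `refreshIfNeeded()` call, so the reset itself stays cheap.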

ghost commented Apr 26, 2020

CLA assistant check
All committers have signed the CLA.

@claykirk claykirk force-pushed the refresh-cluster-agents-periodically branch from 4944858 to d301b21 Compare April 26, 2020 21:13
@arvindsv arvindsv requested a review from GaneshSPatil April 27, 2020 09:36