Automated periodic purging of stale agents #5938

Open
sorindumitru opened this issue Mar 10, 2025 · 5 comments
Labels
priority/backlog Issue is approved and in the backlog

Comments

@sorindumitru
Collaborator

In #1836 we added the capability to purge attested agents using the CLI, and deferred both running the purge periodically and purging TOFU nodes to a later date (see #1836 (comment)). Opening this issue so we don't forget about it.

@sorindumitru sorindumitru added the triage/in-progress Issue triage is in progress label Mar 10, 2025
@nweisenauer-sap
Contributor

nweisenauer-sap commented Mar 10, 2025

Thanks, I was just gonna open this topic myself 👍

@sorindumitru sorindumitru added priority/backlog Issue is approved and in the backlog and removed triage/in-progress Issue triage is in progress labels Mar 11, 2025
@sorindumitru
Collaborator Author

This sounds like a worthwhile improvement. We can:

  • Enable periodic purging of agents
  • Add an option to also allow purging non-reattestable nodes. The leftover entry in the database effectively bans the agent from re-attesting. Some users might want this. The configuration option could take the form of a time.Duration that sets how long to keep the expired agent around.
  • The purging of agents should not purge banned agents.
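
A minimal sketch of how the three points above could fit together as a selection rule. Every name here (`Agent`, `PurgeConfig`, `shouldPurge`, `selectForPurge`, and the field names) is hypothetical and chosen for illustration; none of it is SPIRE's actual datastore type or configuration.

```go
package main

import (
	"fmt"
	"time"
)

// Agent is a hypothetical, simplified view of an attested agent record;
// it is not SPIRE's real datastore type.
type Agent struct {
	ID          string
	ExpiresAt   time.Time
	CanReattest bool
	Banned      bool
}

// PurgeConfig is a hypothetical set of options for the periodic purge.
type PurgeConfig struct {
	// ExpiredFor is how long a re-attestable agent must have been expired.
	ExpiredFor time.Duration
	// PurgeNonReattestable opts in to purging non-reattestable (TOFU) agents.
	PurgeNonReattestable bool
	// KeepNonReattestableFor is how long an expired non-reattestable agent
	// is kept around (and therefore still blocked from re-attesting).
	KeepNonReattestableFor time.Duration
}

// shouldPurge reports whether agent a qualifies for purging at time now.
func shouldPurge(a Agent, cfg PurgeConfig, now time.Time) bool {
	if a.Banned {
		return false // banned agents are never purged
	}
	grace := cfg.ExpiredFor
	if !a.CanReattest {
		if !cfg.PurgeNonReattestable {
			return false
		}
		grace = cfg.KeepNonReattestableFor
	}
	// Purge only once the agent has been expired for at least the grace period.
	return !now.Before(a.ExpiresAt.Add(grace))
}

// selectForPurge returns the IDs of all agents that qualify for purging.
func selectForPurge(agents []Agent, cfg PurgeConfig, now time.Time) []string {
	var ids []string
	for _, a := range agents {
		if shouldPurge(a, cfg, now) {
			ids = append(ids, a.ID)
		}
	}
	return ids
}

// demo runs the selection against a small, fixed, made-up data set.
func demo() []string {
	now := time.Date(2025, 3, 12, 0, 0, 0, 0, time.UTC)
	day := 24 * time.Hour
	agents := []Agent{
		{ID: "reattestable-old", ExpiresAt: now.Add(-10 * day), CanReattest: true},
		{ID: "reattestable-recent", ExpiresAt: now.Add(-1 * day), CanReattest: true},
		{ID: "tofu-old", ExpiresAt: now.Add(-10 * day)},
		{ID: "tofu-recent", ExpiresAt: now.Add(-3 * day)},
		{ID: "banned-old", ExpiresAt: now.Add(-30 * day), CanReattest: true, Banned: true},
	}
	cfg := PurgeConfig{
		ExpiredFor:             2 * day,
		PurgeNonReattestable:   true,
		KeepNonReattestableFor: 7 * day,
	}
	return selectForPurge(agents, cfg, now)
}

func main() {
	fmt.Println(demo()) // prints [reattestable-old tofu-old]
}
```

With `PurgeNonReattestable` left at its zero value, the sketch never selects a TOFU agent, so the leftover entry keeps blocking re-attestation as it does today.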

@nweisenauer-sap
Contributor

nweisenauer-sap commented Mar 12, 2025

It would solve one of our problems if we could add an option to also purge agents that are non-reattestable (CanReattest set to false) if they have been expired for a configurable duration.
Rationale:
We are trying to get away from TOFU and only use node attestors that have CanReattest set to true. However, we might be stuck with one node attestation plugin (developed in-house) that requires TOFU for as long as the node and agent are up and running (we don't want another process to start up an agent on this node while it is active). Once this node is shut down and the agent has expired, we can pretty much guarantee that this node and agent will not come back.
Currently we are left with lots and lots of database entries that cannot be easily cleaned up, because those agents used a TOFU-based node attestor and have CanReattest set to false. So we need to come up with creative ways to clean up the corresponding database tables of agents that have been expired for at least a few days.

@sorindumitru
Collaborator Author

If it's a one-off thing, you can also do it manually:

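# Sketch: evict every agent that expired before the given time (timestamp left elided; fill it in).
# Assumes the default text output prints each agent on a "SPIFFE ID : spiffe://..." line.
spire-server agent list -expiresBefore '2025-03-01 ...' |
  awk '/^SPIFFE ID/ { print $NF }' |
  while read -r agent; do
    spire-server agent evict -spiffeID "$agent"
  done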

It could be slow, but it should do the trick.

@nweisenauer-sap
Contributor

Yes, it is very slow once they have accumulated, so an automatic, periodic cleanup would be preferable.
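
For illustration only, a periodic cleanup of this kind could be driven by a ticker loop like the sketch below. The `runPeriodically` helper and the `purge` callback are hypothetical names, and the callback stands in for whatever the server would actually do to delete stale agents; this is not how SPIRE implements it.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runPeriodically calls purge once per interval until ctx is cancelled and
// returns the number of completed runs. purge is a hypothetical callback
// standing in for the real cleanup of stale agents.
func runPeriodically(ctx context.Context, interval time.Duration, purge func(context.Context) error) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	runs := 0
	for {
		select {
		case <-ctx.Done():
			return runs
		case <-ticker.C:
			if err := purge(ctx); err != nil {
				// A failed run is reported and retried on the next tick.
				fmt.Println("purge failed:", err)
			}
			runs++
		}
	}
}

// demo runs the loop briefly with a no-op purge and reports how often it fired.
func demo() int {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()

	calls := 0
	runPeriodically(ctx, 10*time.Millisecond, func(context.Context) error {
		calls++ // a real callback would select and delete stale agents here
		return nil
	})
	return calls
}

func main() {
	fmt.Println("purge runs:", demo())
}
```

Running the purge inside the server on a timer avoids the per-agent CLI round trips that make the manual loop slow once entries have piled up.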
