Description
Collators may stop producing blocks indefinitely if they are restarted during a session in which they have been benched by the new collator rotation mechanism.
With the new collator rotation feature, one collator is temporarily removed from the active collator set during odd sessions. I observed that if the benched collator node is restarted while it is not part of the active session set, it does not resume block production in the following session, even though it should become active again.
The collator remains synced and running, but it no longer authors blocks, and this persists indefinitely across subsequent sessions. Restarting the same collator again during a session in which it is expected to be active fixes the issue, and block production resumes.
This looks like a bug in how the collator node handles session/authority changes when it starts while it is currently benched.
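To make the rotation schedule concrete, here is a toy model of "one collator is benched during odd sessions". The function name `active_set` and the round-robin choice of which collator to bench are assumptions made only for illustration; the actual selection rule in pallet-collator-rotation may differ.

```rust
/// Toy model of the rotation described above: every collator is active in
/// even sessions, and exactly one collator is benched in odd sessions.
///
/// ASSUMPTION: the benched collator is chosen round-robin from the session
/// index. This rule is hypothetical and is not taken from the pallet.
fn active_set<'a>(collators: &[&'a str], session: u32) -> Vec<&'a str> {
    // Even sessions (or a set too small to rotate): nobody is benched.
    if session % 2 == 0 || collators.len() < 2 {
        return collators.to_vec();
    }
    // Hypothetical rule: odd session 1 benches index 0, session 3 index 1, ...
    let benched = (session as usize / 2) % collators.len();
    collators
        .iter()
        .enumerate()
        .filter(|(i, _)| *i != benched)
        .map(|(_, c)| *c)
        .collect()
}
```

Under this model, a collator at index 0 is missing from the set in session 1 and present again in session 2, which is the transition where the restarted node fails to resume authoring.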
Expected Behavior
A collator restarted during a session in which it is benched should automatically resume block production once it becomes part of the active collator set again in the next applicable session.
The node should correctly detect the session transition and start authoring blocks again without requiring an additional manual restart.
Actual Behavior
If a collator is restarted while it is benched, it does not resume block production when it becomes eligible again.
Instead, it continues running but does not author blocks, even across later sessions where it should be active. The issue appears to persist indefinitely until the collator is manually restarted again during a session in which it is expected to produce blocks.
Possible Fix
The issue may be related to the node not properly updating or reinitializing its authoring role after a session transition when it was started while not included in the active collator set.
A possible area to investigate is the interaction between the new pallet-collator-rotation session manager wrapper and the collator authoring/session key logic on the node side.
The node should probably re-check whether it is part of the active authority/collator set at every session transition and enable block production accordingly, even if it was started during a session where it was temporarily benched.
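The difference between the suspected behavior and the suggested fix can be sketched as a small simulation. This models the observed symptom only and is not based on the actual node code: `Strategy`, `authoring_timeline`, and the idea that the authoring role is latched at startup are all hypothetical.

```rust
use std::collections::HashSet;

/// Two hypothetical ways a node could decide whether to author blocks.
#[derive(Clone, Copy)]
enum Strategy {
    /// Decide once at startup and never re-evaluate. This reproduces the
    /// reported symptom; it is an assumption about the cause, not a finding.
    LatchAtStartup,
    /// Re-check membership at every session transition (the suggested fix).
    RecheckEachSession,
}

/// For each session from `start_session` onward, returns whether `me`
/// authors blocks. `sessions[i]` is the active collator set of session `i`.
fn authoring_timeline(
    me: &str,
    sessions: &[HashSet<&str>],
    start_session: usize,
    strategy: Strategy,
) -> Vec<bool> {
    // Was the node part of the active set in the session it started in?
    let active_at_startup = sessions
        .get(start_session)
        .map_or(false, |set| set.contains(me));

    sessions
        .iter()
        .skip(start_session)
        .map(|active| match strategy {
            // A node that started while benched never authors again.
            Strategy::LatchAtStartup => active_at_startup && active.contains(me),
            // Eligibility follows the current session's active set.
            Strategy::RecheckEachSession => active.contains(me),
        })
        .collect()
}
```

With the latched strategy, a node started in a benched session stays silent in every later session, while a node started in an active session behaves normally. That matches the report that a second restart "at the right time" restores block production. The re-check strategy resumes authoring in the first session where the node is back in the active set, regardless of when it was started.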
Steps to Reproduce
1. Run a Basilisk collator that is part of the configured collator set.
2. Wait for a session in which this collator is benched by the collator rotation mechanism.
3. Restart the collator node during that benched session.
4. Wait until the following session, where the collator should become active again.
5. Observe that the collator does not resume block production.
6. Wait for additional sessions and observe that the collator still does not produce blocks.
7. Restart the collator again during a session in which it is expected to be active.
8. Observe that block production resumes after the restart.
Context
This issue affects collator reliability after the introduction of the collator rotation feature.
A routine restart, upgrade, crash recovery, or infrastructure maintenance operation may permanently stop a collator from producing blocks if it happens during the session where the collator is benched.
This is particularly problematic because the node appears to remain online and synced, but silently stops authoring blocks until another manual restart is performed at the right time.
Your Environment
Version used: Basilisk runtime including the new collator rotation feature introduced in PR #690 (feat: collator rotation)