Stop enqueueing messages for disconnected peers #4094
If a channel operation occurs while a peer is disconnected, we'll still have a `PeerState` for that peer. Historically, we haven't bothered to check if a peer is actually connected before we push new messages onto the `PeerState::pending_msg_events` field, leaving us sending messages into the void.

This is generally not an issue as `ChannelManager::get_and_clear_pending_msg_events` should be called very regularly, removing these messages and then dropping them as `PeerManager` won't have anything to do with them.

Still, there is a race condition here - if a peer manages to connect between the generation of one such message and when `get_and_clear_pending_msg_events` is called, we may enqueue a message to the peer which makes no sense and could lead to a spurious channel closure (especially in the case of an async `ChannelMonitorUpdate` completion or async signing operation, which often lead to normal channel message generation).

Further, if a peer is slow to send their `channel_reestablish` message after connection, this race could be substantially more likely, as such normal channel messages may be nonsense until we've completed the reestablish dance (i.e. the later reestablish dance may lead us to re-send the same messages again immediately).

Here we remove most of the cases where we enqueue messages for disconnected peers.

Note that we differentiate between two different checks for connected-ness - for cases where we're sending `error` or gossip messages, we allow the messages to be enqueued if the peer is connected at all. For most other cases, we only allow messages to be enqueued if the peer is connected *and* the channel has completed its reestablish dance (if required, i.e. the channel is "connected").
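
To make the two connected-ness checks concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `Msg`, the simplified `PeerState` struct, its `peer_connected` and `channel_reestablished` fields, and the `enqueue` helper are all hypothetical stand-ins, and the sketch assumes a single channel per peer for brevity.

```rust
use std::collections::VecDeque;

/// Hypothetical message categories (not LDK's real message event enum).
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    Error(String),
    Gossip(String),
    Channel(String),
}

/// Hypothetical, simplified per-peer state with a single channel.
#[derive(Default)]
struct PeerState {
    /// True once the peer has an active connection.
    peer_connected: bool,
    /// True once the channel has completed its reestablish dance.
    channel_reestablished: bool,
    pending_msg_events: VecDeque<Msg>,
}

impl PeerState {
    /// Queues `msg` only if the relevant connected-ness check passes.
    /// Returns whether the message was queued.
    fn enqueue(&mut self, msg: Msg) -> bool {
        let allowed = match &msg {
            // `error` and gossip only need the peer to be connected at all.
            Msg::Error(_) | Msg::Gossip(_) => self.peer_connected,
            // Normal channel messages also need the channel to be "connected".
            Msg::Channel(_) => self.peer_connected && self.channel_reestablished,
        };
        if allowed {
            self.pending_msg_events.push_back(msg);
        }
        allowed
    }
}
```

With this shape, a channel message generated before the reestablish dance completes is simply dropped rather than left in the queue for a peer that may connect before the queue is next drained.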
If a channel is failed while a peer is disconnected, we'll still have a `PeerState` for that peer. Historically, we haven't bothered to check if a peer is actually connected before we push the `error` message onto the `PeerState::pending_msg_events` queue, leaving us sending messages into the void. This is generally not an issue as `ChannelManager::get_and_clear_pending_msg_events` should be called very regularly, removing these messages and then dropping them as `PeerManager` won't have anything to do with them. Further, when the message is an `error`, if a peer happens to connect between when we push the message and when `get_and_clear_pending_msg_events` is called, the worst that happens is they get the `error` message we'd end up sending them when they try to reestablish the channel anyway. Still, it's awkward to leave the `error`s lying around in a message queue for a disconnected peer, so we remove them here.
In the previous two commits we stopped enqueueing messages for disconnected peers. Here we add some basic test coverage of these changes by asserting that there are no queued messages for a peer when they (re-)connect. We also add a few assertions that a peer is connected when we push a message onto the queue, in cases where we expect a peer to be connected.
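
The shape of that test coverage can be sketched roughly as follows. This is an illustration under stated assumptions, not the tests added in this PR: `MockPeer` and all of its methods are hypothetical, and the real tests run against the project's own test utilities.

```rust
/// Hypothetical test double for a peer's message queue.
#[derive(Default)]
struct MockPeer {
    connected: bool,
    queue: Vec<&'static str>,
}

impl MockPeer {
    /// Queues a message only while the peer is connected.
    fn push_msg(&mut self, msg: &'static str) {
        if self.connected {
            self.queue.push(msg);
        }
    }

    /// Queues a message on a path where the peer is expected to be connected.
    fn push_msg_expect_connected(&mut self, msg: &'static str) {
        debug_assert!(self.connected, "pushed a message for a disconnected peer");
        self.queue.push(msg);
    }

    /// Drains the queue, as a regular polling call would.
    fn get_and_clear(&mut self) -> Vec<&'static str> {
        std::mem::take(&mut self.queue)
    }

    fn disconnect(&mut self) {
        self.connected = false;
    }

    /// Asserts that nothing was queued while the peer was away.
    fn reconnect(&mut self) {
        assert!(self.queue.is_empty(), "messages queued for a disconnected peer");
        self.connected = true;
    }
}
```

The assertion in `reconnect` is the key check: if any code path still queued a message while the peer was disconnected, the next (re-)connection would fail the test.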
👋 Thanks for assigning @tnull as a reviewer!
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4094 +/- ##
==========================================
- Coverage 88.61% 88.60% -0.01%
==========================================
Files 176 176
Lines 132099 132161 +62
Branches 132099 132161 +62
==========================================
+ Hits 117060 117103 +43
- Misses 12365 12386 +21
+ Partials 2674 2672 -2
Fixes #4036