-
Notifications
You must be signed in to change notification settings - Fork 315
fix(multipath): fix remote state actor termination #3676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: feat-multipath
Are you sure you want to change the base?
Conversation
|
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3676/docs/iroh/ Last updated: 2025-11-18T13:53:21Z |
6ffcbb0 to
7c11665
Compare
| _ = self.local_addrs.updated() => { | ||
| _ = self.local_addrs.updated(), if self.local_addrs.is_connected() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned this briefly while on the call, but I think we got distracted by something else before the answer:
Wouldn't it make sense to match on the Result you get back?
- In case it's
Err(Disconnected)you know that you can probably close the actor. This should only happen whenmagicsock::DiscoveredDirectAddrsis dropped, which should only happen whenMagicSockis dropped. Soo.. I think dropping out of the loop at this point is totally fine, even desired. - IIUC, the
, if self.local_addrs.is_connected()is a precondition, so it's possible that you start pollingupdated()while it's connected, but then it disconnects and resolves toErr(Disconnected). I don't think that was the intended change here. The fix for this would be to just match theResultfromupdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this. Though hope that one day we have structured concurrency and disconnected just can't happen.
This... might be another thing to look into as a fix for this. I went with spawning these actors detached because it was the fasted thing to do and get on with thinking about the logic I had to solve.
But we should totally improve this. They should be spawned in a JoinSet, but that might need an RemoteMap actor or something to manage these tasks and that itself be a child of the magicsock::Actor. Or manage the JoinSet immediately in the magicsock::Actor, but I didn't know if that violated too many boundaries.
Though having said all that, I think we should handle this disconnect stuff here correctly and do the structured-concurrency fix some later time.
| if self.connections.is_empty() && inbox.is_idle() && idle_timeout.is_none() { | ||
| let is_idle = self.connections.is_empty() && inbox.is_idle(); | ||
| if idle_timeout.is_none() && is_idle { | ||
| trace!("start idle timeout"); | ||
| idle_timeout | ||
| .as_mut() | ||
| .set_future(time::sleep(ACTOR_MAX_IDLE_TIMEOUT)); | ||
| } else if idle_timeout.is_some() { | ||
| } else if idle_timeout.is_some() && !is_idle { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I commented, I think I'd prefer the other version from my PR to this fix. There's lots of states that idle_timeout can be in, and it's probably easier to default it to sth that continuously gets reset further and further into the future.
| self.handle_msg_send_datagram(transmit).await?; | ||
| if let Err(err) = self.handle_msg_send_datagram(transmit).await { | ||
| warn!("failed to send datagram: {err:#}"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bug:
We should avoid short-circuiting not on the handle_msg_send_datagram level, but on the send_datagram level. Otherwise a single path that fails will make us stop sending datagrams to any other paths that might still work.
So in essence that just means this if let Err(err) = ... should be moved inside the handle_msg_send_datagram call.
| let task = task::spawn( | ||
| async move { | ||
| if let Err(err) = self.run(rx).await { | ||
| error!("actor failed: {err:#}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this was a poor-man's attempt at being able to have "fatal" errors. The right thing of course is to panic and spawn the tasks in a JoinSet and making sure the panic bubbles up to the endpoint. But that's for another day I guess. For now making this infallible is the safer approach I think. While if forces you to handle the fatal error everywhere it will probably result in better code.
| if any_match { | ||
| Err(io::Error::other("all available transports failed")) | ||
| Err(io::Error::other(format!( | ||
| "all available transports failed for destination {dst:?}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, convention in the stdlib is that the caller should provide the context. Don't we have somewhere else to add context?
Description
Breaking Changes
Notes & open questions
Change checklist
quic-rpciroh-gossipiroh-blobsdumbpipesendme