Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote T's can get removed from the remoteTranscoders list if any T reconnects #2706

Closed
stronk-dev opened this issue Jan 3, 2023 · 4 comments · Fixed by #2707
Closed

Remote T's can get removed from the remoteTranscoders list if any T reconnects #2706

stronk-dev opened this issue Jan 3, 2023 · 4 comments · Fixed by #2707

Comments

@stronk-dev
Copy link
Contributor

Describe the bug

Whenever a T reconnects, they get a new entry in the remoteTranscoders list of transcoder objects held by the RemoteTranscoderManager
Their 'expired' entry in the remoteTranscoders list only gets removed when they later get selected for a session:

if _, ok := rtm.liveTranscoders[currentTranscoder.stream]; !ok {
// Remove the stream session because the transcoder is no longer live
if sessionExists {
rtm.completeStreamSession(sessionId)
}
// transcoder does not exist in table; remove and retry
rtm.remoteTranscoders = removeFromRemoteTranscoders(currentTranscoder, rtm.remoteTranscoders)
continue
}

Up to this point everything looks alright (although I wonder why does the Transcoder get removed from liveTranscoders immediately, but only gets removed from remoteTranscoders on selection)

Anyway, the error happens in the removeFromRemoteTranscoders function, where it splices the expired entry out of the list

func removeFromRemoteTranscoders(rt *RemoteTranscoder, remoteTranscoders []*RemoteTranscoder) []*RemoteTranscoder {
if len(remoteTranscoders) == 0 {
// No transocerds to remove, return
return remoteTranscoders
}
lastIndex := len(remoteTranscoders) - 1
last := remoteTranscoders[lastIndex]
if rt == last {
return remoteTranscoders[:lastIndex]
}
newRemoteTs := make([]*RemoteTranscoder, 0)
for i, t := range remoteTranscoders {
if t == rt {
if i == 0 {
return remoteTranscoders[1:]
}
newRemoteTs = remoteTranscoders[i-1 : i]
newRemoteTs = append(newRemoteTs, remoteTranscoders[i+1:]...)
break
}
}
return newRemoteTs
}

It creates the new list as

newRemoteTs = remoteTranscoders[i-1 : i] 
newRemoteTs = append(newRemoteTs, remoteTranscoders[i+1:]...) 

So if i is higher than 1, any transcoder in this list with an index lower than i-1 is going to get dropped from the list of remote transcoders!

In all regards the T is still considered a live transcoder connected to the Orchestrator, so no error pops up on the O or T side. They just won't receive any more streams

To Reproduce

Have multiple remote T's connected in a split O/T setup. Have one (or more) T's reconnect and wait for a few streams to come in. At some point you will see one T not receiving any more streams

@eliteprox
Copy link
Collaborator

Appears related to #2605

@thomshutt
Copy link
Contributor

Thanks @eliteprox - would you be able to work with @cyberj0g please @stronk-dev to help him reproduce the issue?

@stronk-dev
Copy link
Contributor Author

stronk-dev commented Jan 10, 2023

I haven't encountered #2605 yet so I can't speak as to reproducing that. but it does seem like that issue is caused by the bug as described in #2706.

Reproducing #2706 is as simple as having more than 2 remote transcoders and disconnecting the T with and index in rtm.remoteTranscoders higher than 1. All T's with an index lower than i-1 will get dropped when the disconnected T gets selected so I can imagine this can cause the set to get emptied under specific conditions
Might just be easiest to merge #2707 which fixes #2706 first to see if that also fixes #2605 as a side effect

@eliteprox
Copy link
Collaborator

eliteprox commented Feb 7, 2023

Thanks @eliteprox - would you be able to work with @cyberj0g please @stronk-dev to help him reproduce the issue?

I've opened #2747 for this issue. I did some troubleshooting with @0xb79 and @Titan-Node this evening and reproduced it in 0.5.37 under scenarios involving multiple streams. This function cleaned up quite a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants