-
I have implemented a basic SFU based on the str0m library, with 12 people in the SFU room. However, there is a problem: most client videos (although not all) freeze for a fraction of a second every second or two and then continue to play. What could be causing this? My assumptions:
-
If this is a network environment with 0 loss you shouldn't be seeing any freezing like that. You should start by determining the cause of the freezing. You can run Chrome with debug flags and it might tell you, but also check |
-
No. str0m holds a buffer for egress video. It responds to NACK automatically.
BWE includes pacing.
No. The loss based bwe controller is there to make existing BWE better. It's not required.
Unless you're testing your code in bad network conditions, no. If you're on localhost or over a local WIFI you should not see the problem discussed in #744.
We are using str0m in production code, with paying customers, and do not see this problem. We are not running anything special in terms of BWE etc. It's very likely these kinds of hiccups have something to do with how you forward the data between nodes. For example, are you scheduling some cleanup job every now and then? Are you doing too much in one thread? Etc…
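One way to check the "too much in one thread" hypothesis is to time every iteration of the forwarding loop and count the outliers, since an average can hide an occasional long pause. The following is only a sketch using the standard library; `StallDetector` and its threshold are hypothetical names for illustration and are not part of str0m.

```rust
use std::time::Duration;

/// Hypothetical helper: tracks loop iterations that took longer than a threshold.
pub struct StallDetector {
    threshold: Duration,
    stalls: u32,
    worst: Duration,
}

impl StallDetector {
    pub fn new(threshold: Duration) -> Self {
        Self { threshold, stalls: 0, worst: Duration::ZERO }
    }

    /// Record the wall-clock time of one loop iteration.
    /// Returns true when the iteration exceeded the threshold.
    pub fn record(&mut self, elapsed: Duration) -> bool {
        if elapsed > self.worst {
            self.worst = elapsed;
        }
        let stalled = elapsed > self.threshold;
        if stalled {
            self.stalls += 1;
        }
        stalled
    }

    /// Number of iterations that exceeded the threshold so far.
    pub fn stalls(&self) -> u32 {
        self.stalls
    }

    /// Longest single iteration seen so far.
    pub fn worst(&self) -> Duration {
        self.worst
    }
}
```

In a real loop you would take `Instant::now()` at the top of each iteration and pass `start.elapsed()` to `record` at the bottom, logging whenever it returns true. If the stalls line up with the visible freezes, the cause is likely in the forwarding code rather than in the network.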
-
@Razzwan did you run the |
-
Many of my potential clients have poor network connectivity, mostly relying on LTE. I witnessed the issues this can cause yesterday.
Yes, I did. It works smoothly locally, but I haven't tested it in production. The SFU I created is based on the chat example, and the only things I added were:
What do you think is the most likely cause of problems among the things I've added? |
-
You need to look at the ingress track metrics to understand what's going on. There are too many things that could be going wrong for us to say without more information. Some general thoughts:
-
@k0nserv thank you very much!
What will I see if this is an issue? Is it really possible? How can I send more than BWE allows if I follow these steps?
Yes, that's definitely my case. Considering my tests, I don't think the following issues are relevant:
-
As long as you maintain
Right, so high loss and webrtc-rs performs better than str0m? Did you verify NACK is working at all i.e. do you see |
-
No. I took a different approach in the webrtc-rs implementation: I added a fallback mechanism that triggers a PLI request when a high number of NACKs is detected, which seems to be an improvement in some cases. The video issue in webrtc-rs was not freezing but a gradual degradation of quality, which seems related to the Transport-Wide Congestion Control (TWCC) implementation that I also added. So, in general, str0m is better than webrtc-rs in most cases. For example, right now even a weak Android smartphone works acceptably (with webrtc-rs it was not working at all, I don't know why). Regarding server performance: now I use 2 layers (
Didn't notice that. Will test it soon and let you know.
-
I would expect the receiving peer (assuming it's a browser) to generate a PLI when it detects a frozen stream. If you forward this to the other side things should work out. You might need to throttle this to avoid asking for too many keyframes.
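The throttling mentioned here could look roughly like the sketch below: forward the first PLI immediately and drop any further ones that arrive within a minimum interval. `KeyframeThrottle` and the interval value are hypothetical, chosen for illustration; this is not an API that str0m provides.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: lets at most one keyframe request through per `min_interval`.
pub struct KeyframeThrottle {
    min_interval: Duration,
    last_forwarded: Option<Instant>,
}

impl KeyframeThrottle {
    pub fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_forwarded: None }
    }

    /// Returns true if a PLI received at `now` should be forwarded to the publisher.
    pub fn should_forward(&mut self, now: Instant) -> bool {
        match self.last_forwarded {
            // A request was forwarded too recently: drop this one.
            Some(last) if now.saturating_duration_since(last) < self.min_interval => false,
            // First request, or enough time has passed: forward and remember when.
            _ => {
                self.last_forwarded = Some(now);
                true
            }
        }
    }
}
```

You would keep one throttle per published video track (and per simulcast layer, if used), so that many subscribers asking for a keyframe at the same moment result in a single request to the publisher.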
This should happen with str0m too, but is probably due to not using simulcast before. The browser side uses TWCC to do BWE and then reduces the target bitrate of its encoder if it determines the bitrate of the link is too low. With simulcast you will in addition see it turn off layers i.e. in your case it would turn off |
-
The primary issue is video freezing with 3+ users, causing lip sync errors and a poor viewing experience. (Two-user calls work smoothly.) Are there any recommendations on how to fix this? Profiling shows that the main loop executes once per ~50-100 µs, which doesn't seem too bad, but with 10 clients watching each other there are small regular freezes.
-
We can't really help you without knowing why the freezing is happening as I outlined here. Presumably when you say |
-
50-100 microseconds. Let me understand how the settings work...

```rust
let mut rtc = Rtc::builder()
    // other settings here
    .enable_bwe(Some(Bitrate::kbps(40)))
    .build();
rtc.bwe().set_desired_bitrate(Bitrate::kbps(150));
rtc.bwe().set_current_bitrate(Bitrate::kbps(40));
```

And what should I specify in:

```rust
Event::EgressBitrateEstimate(bitrate_kind) => {
    match bitrate_kind {
        BweKind::Remb(mid, bitrate) => {}
        BweKind::Twcc(bitrate) => {
            if (bitrate > Bitrate::kbps(150)) {
                self.waiting_rid = Rid(mid);
                self.rtc.bwe().set_current_bitrate(bitrate);
            } else {
                self.waiting_rid = Rid(low);
                self.rtc.bwe().set_current_bitrate(bitrate);
            }
        }
    }
    Propagated::Noop
}
```

Is this right?
-
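50-100 microseconds seems a bit low. For comparison our SFU does about 3% CPU usage on t3.medium on average per peer.
For BWE you should set the desired bitrate to the bitrate you need to send all the traffic you want and the current rate to the observed bitrate of what you are currently sending. BWE does not account for audio atm so you should not include it. As an example:
- low: 250kbit/s
- mid: 750kbit/s
If you have a peer receiving 4 video tracks and 4 audio tracks you should set desired bitrate to 4 * 750kbit/s i.e. 3Mbit/s, this will allow you to send the mid layer for all 4 tracks. If you are currently sending 2 tracks at low and 2 at mid you should set current bitrate to 2 * 250kbit/…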
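The arithmetic in this reply can be sketched as two small functions, using the layer figures it gives (low at 250 kbit/s, mid at 750 kbit/s). The `Layer` enum and function names are hypothetical, and the nominal layer rates stand in for what should really be the observed bitrate of the forwarded streams. Audio is left out because, per the reply, BWE does not account for it.

```rust
/// Simulcast layers with the illustrative bitrates from this reply.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Layer {
    Low,
    Mid,
}

impl Layer {
    /// Nominal bitrate of the layer in kbit/s (assumed values: low 250, mid 750).
    pub fn kbps(self) -> u64 {
        match self {
            Layer::Low => 250,
            Layer::Mid => 750,
        }
    }
}

/// Desired bitrate: every video track at the highest layer we would like to send.
pub fn desired_kbps(video_tracks: usize, top: Layer) -> u64 {
    video_tracks as u64 * top.kbps()
}

/// Current bitrate: sum over the layers actually being forwarded right now.
/// Audio tracks are deliberately not included.
pub fn current_kbps(sending: &[Layer]) -> u64 {
    sending.iter().map(|layer| layer.kbps()).sum()
}
```

For a peer receiving 4 video tracks, `desired_kbps(4, Layer::Mid)` gives 3000 kbit/s, matching the 3Mbit/s above. The two results are the values you would wrap in `Bitrate::kbps(..)` and pass to `set_desired_bitrate` and `set_current_bitrate`, updating the current value whenever the set of forwarded layers changes.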
-
I would start by looking at the stats on the clients: do you have loss or extreme jitter that causes the freeze?
-
My main guess is that there's a bug somewhere in my code, not something in the statistics or objective data. I don't see anything in the statistics that could cause freezes. The same number of NACKs, PLIs and FIRs shouldn't lead to freezes.
-
Sorry, I still could not find where I can see that? Stats not containing |
-
I'm not talking about str0m's stats, I'm talking about Chrome. You can see the stats in |
-
Oh, it works. Yes, retransmittedPacketsReceived is not empty. Thank you!
-
I think I discovered two problems in my code:
It's still not perfect, but much better. @algesten perhaps it could be moved to Discussions? If not, it can be closed.
50-100 microseconds seems a bit low. For comparison our SFU does about 3% CPU usage on t3.medium on average per peer.
For BWE you should set the desired bitrate to the bitrate you need to send all the traffic you want and the current rate to the observed bitrate of what you are currently sending. BWE does not account for audio atm so you should not include it.
As an example:
- low: 250kbit/s
- mid: 750kbit/s
If you have a peer receiving 4 video tracks and 4 audio tracks you should set desired bitrate to 4 * 750kbit/s i.e. 3Mbit/s, this will allow you to send the mid layer for all 4 tracks. If you are currently sending 2 tracks at low and 2 at mid you should set current bitrate to 2 * 250kbit/…