Short regular video freezes #762

Razzwan · 2025-11-24T07:49:22Z

Razzwan
Nov 24, 2025

At the moment, I have implemented a basic version of an SFU based on the str0m library, with 12 people in the SFU room. However, there are issues with the SFU:

Most client videos (although not all) regularly (every second or two) freeze for a moment (a fraction of a second) and then continue to display. What could be causing this problem?

My assumptions:

1. Is it necessary to manually implement NACK report forwarding?
2. Is it necessary to enable pacing separately from BWE (discussed here), or does BWE also include pacing?
3. Is it necessary to enable_loss_controller using the enable_experimental_loss_based_bwe method?
4. Could this be related to issue 744?

Answered by k0nserv

Nov 27, 2025


k0nserv · 2025-11-24T10:06:23Z

k0nserv
Nov 24, 2025
Collaborator

1. No. However, you do need to forward PLI/FIRs.
2. Pacing is included.
3. Not really, no.
4. If you have a lot of loss, yes.

If this is a network environment with 0 loss you shouldn't be seeing any freezing like that. You should start by determining the cause of the freezing. You can run Chrome with debug flags and it might tell you, but also check chrome://webrtc-internals for packet loss. Also, check if unfreezing correlates with incoming keyframes

0 replies

algesten · 2025-11-24T10:07:55Z

algesten
Nov 24, 2025
Maintainer

Is it necessary to manually implement NACK report forwarding?

No. str0m holds a buffer for egress video. It responds to NACK automatically.

Is it necessary to enable pacing separately from BWE discussed here (How to start use BWE and Simulcast? #752), or does BWE also include pacing?

BWE includes pacing.

Is it necessary to enable_loss_controller using enable_experimental_loss_based_bwe method?

No. The loss based bwe controller is there to make existing BWE better. It's not required.

Could this be related to the issue 744?

Unless you're testing your code in bad network conditions, no. If you're on localhost or over a local WIFI you should not see the problem discussed in #744.

Most client videos (although not all) regularly (every second or two) freeze for a moment (a fraction of a second) and then continue to display. What could be causing this problem?

We are using str0m in production code, with paying customers, and do not see this problem. We are not running anything special in terms of BWE etc. It's very likely that these kinds of hiccups have something to do with how you forward the data between nodes. For example, are you scheduling some cleanup job every now and then? Are you doing too much in one thread? Etc.

0 replies

algesten · 2025-11-24T10:10:13Z

algesten
Nov 24, 2025
Maintainer

@Razzwan did you run the chat example in str0m? That should produce smooth video between two video tabs. No freezes.

0 replies

Razzwan · 2025-11-24T12:52:09Z

Razzwan
Nov 24, 2025
Author

Unless you're testing your code in bad network conditions, no. If you're on localhost or over a local WIFI you should not see the problem discussed in #744.

Many of my potential clients have poor network connectivity, mostly relying on LTE. I witnessed the issues this can cause yesterday.

@Razzwan did you run the chat example in str0m? That should produce smooth video between two video tabs. No freezes.

Yes, I did. It works smoothly locally, but obviously I haven't tested it in production (only locally). The SFU I created is based on the chat example, and the only things I added were:

Enabled BWE: on Event::EgressBitrateEstimate, I send a keyframe request to the publisher. For that, I pass around a handle to an mpsc::Sender<SfuRoomEvent> (seems like zero cost).
I send similar events to stop/resume video/audio streams. For that, I created a tokio::sync::mpsc channel with buffer size 32, and I read this buffer in a loop with rx.try_recv(). Perhaps that's the reason...

What do you think is the most likely cause of problems among the things I've added?

0 replies

k0nserv · 2025-11-24T13:59:14Z

k0nserv
Nov 24, 2025
Collaborator

You need to look at the ingress track metrics to understand what's going on. There are too many things that could be going wrong for us to say without more information.

Some general thoughts:

You might be over-saturating the link by sending more than BWE indicates the link can handle.
You might not be sending a steady packet stream due to implementation errors.
You might not be forwarding PLI/FIRs.
You might have a lot of loss, artificial or otherwise.
You might be making errors when sending (mixing up streams or simulcast layers)

0 replies

Razzwan · 2025-11-24T14:14:40Z

Razzwan
Nov 24, 2025
Author

@k0nserv thank you very much!

You might be over-saturating the link by sending more than BWE allows.

What will I see if this is the issue? Is it really possible? How can I send more than BWE allows if I follow these steps?

You might have a lot of loss

Yes, that's definitely my case.

Considering my tests, I don't think the following issues are relevant:

You might not be sending a steady packet stream due to implementation errors.

You might not be forwarding PLI/FIRs.

You might be making errors when sending (mixing up streams or simulcast layers)

0 replies

k0nserv · 2025-11-24T14:18:41Z

k0nserv
Nov 24, 2025
Collaborator

What will I see, if this is an issue? Is it really possible? How can I send more than BWE allows if I follow #752?

As long as you maintain media_bandwidth < bwe_estimate you should be okay. We apply a moving average to the BWE because it's a bit too flip-floppy.
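A minimal sketch of that check, assuming an exponentially weighted moving average as the smoothing method (the thread does not say which kind of moving average is used). `SmoothedEstimate`, `alpha`, and `fits` are hypothetical names for illustration, not part of str0m:

```rust
/// Exponentially weighted moving average over raw BWE samples (bits per second).
/// `alpha` is a hypothetical smoothing factor, not a str0m setting.
pub struct SmoothedEstimate {
    alpha: f64,
    value_bps: Option<f64>,
}

impl SmoothedEstimate {
    pub fn new(alpha: f64) -> Self {
        assert!(alpha > 0.0 && alpha <= 1.0, "alpha must be in (0, 1]");
        Self { alpha, value_bps: None }
    }

    /// Feed one raw estimate and return the smoothed value.
    /// The first sample is taken as-is; later samples are blended in.
    pub fn update(&mut self, sample_bps: f64) -> f64 {
        let next = match self.value_bps {
            None => sample_bps,
            Some(prev) => self.alpha * sample_bps + (1.0 - self.alpha) * prev,
        };
        self.value_bps = Some(next);
        next
    }
}

/// True when the planned media bitrate stays strictly below the smoothed estimate.
pub fn fits(media_bps: f64, smoothed_bps: f64) -> bool {
    media_bps < smoothed_bps
}
```

The idea would be to call `update` with each new estimate and only pick a layer allocation for which `fits` holds, so that a single noisy estimate does not flip the allocation back and forth.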

Yes, that's definitely my case.

Right, so high loss and webrtc-rs performs better than str0m? Did you verify NACK is working at all i.e. do you see retransmittedPacketsReceived on the ingress RTP stream?

0 replies

Razzwan · 2025-11-24T14:43:21Z

Razzwan
Nov 24, 2025
Author

Right, so high loss and webrtc-rs performs better than str0m? Did you verify NACK is working at all i.e. do you see retransmittedPacketsReceived on the ingress RTP stream?

No. I took a different approach in the webrtc-rs implementation. I added a fallback mechanism that triggers a PLI request when a high number of NACKs is detected, which seems to be an improvement in some cases.

The video issue in webrtc-rs was not freezing but a gradual degradation of quality, which seems related to the Transport-Wide Congestion Control (TWCC) implementation, which I also added myself.

So, in general, str0m is better than webrtc-rs in most cases. For example, right now even a weak Android smartphone works acceptably (with webrtc-rs it did not work at all; I don't know why).

Regarding server performance: I now use 2 layers (low and mid), and performance is the same as it was when I used only one layer (mid only) on webrtc-rs, and the TWCC implementation seems much better!

Did you verify NACK is working at all i.e. do you see retransmittedPacketsReceived on the ingress RTP stream?

I didn't notice that. I will test it soon and let you know.

0 replies

k0nserv · 2025-11-24T15:09:37Z

k0nserv
Nov 24, 2025
Collaborator

No. I took a different approach in the webrtc-rs implementation. I added a fallback mechanism that triggers a PLI request when a high number of NACKs is detected, which seems to be an improvement in some cases.

I would expect the receiving peer (assuming it's a browser) to generate a PLI when it detects a frozen stream. If you forward this to the other side things should work out. You might need to throttle this to avoid asking for too many keyframes.
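A minimal sketch of such throttling, assuming one minimum interval per publisher stream. `KeyframeThrottle` and the `u64` stream key are hypothetical and only illustrate the idea; the actual call that forwards the PLI/FIR to the publisher is left out:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Rate limiter for keyframe requests, keyed by a caller-chosen stream id.
pub struct KeyframeThrottle {
    min_interval: Duration,
    last_sent: HashMap<u64, Instant>,
}

impl KeyframeThrottle {
    pub fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_sent: HashMap::new() }
    }

    /// Returns true if a keyframe request for `stream` may be forwarded at `now`.
    /// A denied request does not reset the timer.
    pub fn allow(&mut self, stream: u64, now: Instant) -> bool {
        let too_soon = self
            .last_sent
            .get(&stream)
            .map_or(false, |&last| now.duration_since(last) < self.min_interval);
        if too_soon {
            return false;
        }
        self.last_sent.insert(stream, now);
        true
    }
}
```

For the throttle to be effective, every code path that can request a keyframe would have to go through the same `allow` call; a path that skips it can still flood the publisher.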

The video issue in webrtc-rs was not freezing but a gradual degradation of quality, which seems related to the Transport-Wide Congestion Control (TWCC) implementation, which I also added myself.

This should happen with str0m too, but is probably due to not using simulcast before. The browser side uses TWCC to do BWE and then reduces the target bitrate of its encoder if it determines the bitrate of the link is too low. With simulcast you will in addition see it turn off layers i.e. in your case it would turn off mid entirely. In this case you must detect this and change your allocation on the egress side of the SFU. In str0m this will be a pause event at which point you need to switch all egress streams to low and request a keyframe.
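The fallback described above could be sketched as a small state machine, assuming two layers (low and mid) as in this thread. `Layer`, `Action`, and `PublisherLayers` are hypothetical types for illustration and are not str0m's event types; wiring `on_paused` to the actual pause event and performing the returned action is up to the SFU:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Layer {
    Low,
    Mid,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Switch all egress streams of this publisher to the layer and request a keyframe for it.
    SwitchAndRequestKeyframe(Layer),
    Nothing,
}

/// Tracks which simulcast layer of one publisher is being forwarded.
pub struct PublisherLayers {
    mid_paused: bool,
    forwarding: Layer,
}

impl PublisherLayers {
    pub fn new() -> Self {
        Self { mid_paused: false, forwarding: Layer::Mid }
    }

    /// Call when the publisher's layer is reported as paused or resumed.
    pub fn on_paused(&mut self, layer: Layer, paused: bool) -> Action {
        if layer != Layer::Mid {
            return Action::Nothing;
        }
        self.mid_paused = paused;
        if paused && self.forwarding == Layer::Mid {
            // Mid went away while we were forwarding it: fall back to low.
            self.forwarding = Layer::Low;
            return Action::SwitchAndRequestKeyframe(Layer::Low);
        }
        Action::Nothing
    }

    pub fn forwarding(&self) -> Layer {
        self.forwarding
    }

    pub fn mid_available(&self) -> bool {
        !self.mid_paused
    }
}
```

This sketch deliberately does not switch back to mid when the layer resumes; that decision is left to the bitrate allocation logic.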

0 replies

Razzwan · 2025-11-27T14:32:50Z

Razzwan
Nov 27, 2025
Author

The primary issue is video freezing with 3+ users, causing lip sync errors and a poor viewing experience. (Two-user calls work smoothly.)

Are there any recommendations on how to fix this?

Profiling shows that the main loop is executed once per ~50-100 mksec, which doesn't seem too bad, but with 10 clients watching each other there are small regular freezes.

0 replies

k0nserv · 2025-11-27T14:38:58Z

k0nserv
Nov 27, 2025
Collaborator

We can't really help you without knowing why the freezing is happening, as I outlined here. Presumably when you say mksec you meant ms (milliseconds), not us (microseconds); the latter would be quite bad.

0 replies

Razzwan · 2025-11-27T15:00:55Z

Razzwan
Nov 27, 2025
Author

50-100 microseconds.

Let me understand how the settings work...
Suppose I'm going to receive video from 10 users and send 2 video streams to the SFU with Rid(low)=40kbps and Rid(mid)=150kbps. I would like to start from Rid(low). What should I specify in the settings?

let mut rtc = Rtc::builder()
    // other settings here
    .enable_bwe(Some(Bitrate::kbps(40)))
    .build();

rtc.bwe().set_desired_bitrate(Bitrate::kbps(150));
rtc.bwe().set_current_bitrate(Bitrate::kbps(40));

And what should I specify in Event::EgressBitrateEstimate after that?

Event::EgressBitrateEstimate(bitrate_kind) => {
    match bitrate_kind {
        BweKind::Remb(mid, bitrate) => {}
        BweKind::Twcc(bitrate) => {
            if (bitrate > Bitrate::kbps(150)) {
                self.waiting_rid = Rid(mid);
                self.rtc.bwe().set_current_bitrate(bitrate);
            } else {
                self.waiting_rid = Rid(low);
                self.rtc.bwe().set_current_bitrate(bitrate);
            }
        }
    }

    Propagated::Noop
}

Is this right?

0 replies

k0nserv · 2025-11-27T15:11:45Z

k0nserv
Nov 27, 2025
Collaborator

50-100 microseconds seems a bit low. For comparison our SFU does about 3% CPU usage on t3.medium on average per peer.

For BWE you should set the desired bitrate to the bitrate you need to send all the traffic you want and the current rate to the observed bitrate of what you are currently sending. BWE does not account for audio atm so you should not include it.

As an example:

low 250kbit/s
mid 750kbit/s

If you have a peer receiving 4 video tracks and 4 audio tracks you should set desired bitrate to 4 * 750kbit/s i.e. 3Mbit/s, this will allow you to send the mid layer for all 4 tracks. If you are currently sending 2 tracks at low and 2 at mid you should set current bitrate to 2 * 250kbit/s + 2 * 750kbit/s or look at the actual bitrate over the last N seconds and use that.
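The arithmetic in this example can be written out as a short sketch. The layer bitrates are the example values above; `Layer`, `desired_kbps`, and `current_kbps` are hypothetical helpers, and audio is left out because BWE does not account for it:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Layer {
    Low,
    Mid,
}

impl Layer {
    /// Example per-layer video bitrates in kbit/s.
    pub fn kbps(self) -> u64 {
        match self {
            Layer::Low => 250,
            Layer::Mid => 750,
        }
    }
}

/// Desired bitrate: every video track at the highest layer you want to send.
pub fn desired_kbps(video_tracks: u64, top: Layer) -> u64 {
    video_tracks * top.kbps()
}

/// Current bitrate: the sum of the layers actually being sent right now.
pub fn current_kbps(sending: &[Layer]) -> u64 {
    sending.iter().map(|layer| layer.kbps()).sum()
}
```

With 4 video tracks, `desired_kbps(4, Layer::Mid)` gives 3000 kbit/s, and sending two tracks at low and two at mid gives `current_kbps` of 2 * 250 + 2 * 750 = 2000 kbit/s. These would be the values passed to set_desired_bitrate and set_current_bitrate respectively.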

0 replies

xnorpx · 2025-11-27T15:21:30Z

xnorpx
Nov 27, 2025
Collaborator

I would start by looking at the stats on the clients: do you have loss or extreme jitter that causes the freeze?

0 replies

Razzwan · 2025-11-27T21:40:53Z

Razzwan
Nov 27, 2025
Author

I would start by looking at the stats on the clients: do you have loss or extreme jitter that causes the freeze?

My main guess is that there's a bug somewhere in my code rather than something visible in the statistics or objective data. I don't see anything in the statistics that could cause freezes. The same number of NACKs, PLIs, and FIRs shouldn't lead to freezes.

0 replies

Razzwan · 2025-11-28T03:35:25Z

Razzwan
Nov 28, 2025
Author

@k0nserv

Did you verify NACK is working at all i.e. do you see retransmittedPacketsReceived on the ingress RTP stream?

Sorry, I still could not find where I can see that. The stats do not contain a retransmittedPacketsReceived field.

0 replies

k0nserv · 2025-11-28T09:30:55Z

k0nserv
Nov 28, 2025
Collaborator

Sorry, I still could not find where I can see that. The stats do not contain a retransmittedPacketsReceived field.

I'm not talking about str0m's stats, I'm talking about Chrome. You can see the stats in chrome://webrtc-internals. Here's a good blog post about it and how to debug in other browsers.

0 replies

Razzwan · 2025-11-28T19:42:38Z

Razzwan
Nov 28, 2025
Author

I'm not talking about str0m's stats, I'm talking about Chrome. You can see the stats in chrome://webrtc-internals. Here's a good blog post about it and how to debug in other browsers.

Oh, it works. Yes, retransmittedPacketsReceived is not empty. Thank you!

0 replies

Razzwan · 2025-11-28T19:50:30Z

Razzwan
Nov 28, 2025
Author

I think I discovered two problems in my code:

Fixed a hidden PLI issue: A component was generating a flood of PLI requests that circumvented the throttling system. This wasn't logged but was detectable via the statistics system.
Improved bitrate calculation: With help from @k0nserv, I identified and fixed an error in the desired bitrate calculation. The fix leads to much better performance

It's still not perfect, but much better.

@algesten perhaps this could be moved to Discussions? If not, it can be closed.

1 reply

algesten Nov 28, 2025
Maintainer

Done
