
Conversation

@MotorBottle MotorBottle commented Oct 16, 2025

Long-Audio Slowdown in FunASR GPU Inferencing (Root cause: kwargs state leaks)

What I Observed

  • The first pass on a 30 min+ recording finishes quickly, but running the same clip again takes almost twice as long (sometimes even longer).
  • The GPU stays on cuda:0 throughout (so it was not a backend issue); the slowdown persists until the process is restarted.

Root Cause

  • FunASR's AutoModel keeps runtime configuration (kwargs, vad_kwargs, punc_kwargs, spk_kwargs, etc.) in mutable dictionaries.
  • Long inferences mutate those dicts (e.g., torch_threads grows from the default 4 to the host's 72 threads on my server, slowing down inference). FunASR never resets them, so the next request inherits the "dirty" state and runs slower.

Fix

  • Snapshot every *_kwargs right after AutoModel builds its modules and restore that baseline before each inference (including VAD, punctuation, and diarization); see the sketch after this list.
  • Reapply the intended values such as ncpu, and only call torch.set_num_threads() when needed, preventing thread drift.
  • Result: even long recordings can be processed repeatedly without the default params getting contaminated.
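The PR implements this with _store_base_configs and _reset_runtime_configs (named in the review summary below); the bodies here are only a minimal sketch of the snapshot/restore idea, not the actual diff:

    import copy

    def _store_base_configs(self):
        # Snapshot every *_kwargs dict once, right after AutoModel builds its modules.
        self._base_configs = {
            name: copy.deepcopy(getattr(self, name))
            for name in ("kwargs", "vad_kwargs", "punc_kwargs", "spk_kwargs")
            if isinstance(getattr(self, name, None), dict)
        }

    def _reset_runtime_configs(self):
        # Restore the clean baseline in place, so references to these dicts
        # held elsewhere (e.g. by submodules) stay valid.
        for name, base in self._base_configs.items():
            config = getattr(self, name, None)
            if isinstance(config, dict):
                config.clear()
                config.update(copy.deepcopy(base))

Restoring in place is deliberate: replacing the attribute with a fresh dict would leave any module that captured the old reference reading stale state.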

FunASR Long-Audio GPU Inference Slowdown (Root cause: at initialization, AutoModel puts all runtime configuration into a single shared kwargs dictionary; during multi-model inference, internal logic modifies this dictionary on the fly, e.g. adjusting batch_size, ncpu, and other parameters, and never restores the original values after inference)

What I Observed

  • For recordings longer than 30 minutes, the first inference is fast, but a second inference on the same audio takes almost twice as long, or even longer.
  • The GPU runs on cuda:0 throughout, so it is not a device problem, yet the degradation persists until the process is restarted.

Root Cause

  • FunASR's AutoModel stores its runtime configuration (kwargs, vad_kwargs, punc_kwargs, spk_kwargs, etc.) as mutable dictionaries.
  • During long-audio inference these dictionaries get modified (e.g. ncpu defaults to 4, but internal logic running at the same time modifies torch_threads, which ends up at 72 after inference). Since FunASR never restores the defaults, the next request inherits the contaminated state and slows down.

Fix

  • Right after AutoModel finishes building all of its modules, snapshot every *_kwargs, and restore that baseline before each inference (covering the VAD, punctuation, and speaker-diarization modules).
  • Rewrite the intended parameters such as ncpu, and call torch.set_num_threads() only when the thread setting actually changes, preventing thread-count drift.
  • Result: long audio can be processed repeatedly without contaminating the default parameters, and performance stays stable.

@gemini-code-assist
Contributor

Summary of Changes

Hello @MotorBottle, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a performance degradation issue in FunASR's AutoModel during long audio GPU inferencing. The problem stemmed from mutable runtime configuration dictionaries (kwargs) that were not reset between inference calls, leading to "state leaks" where parameters like the number of CPU threads (ncpu) would drift and negatively impact subsequent runs. The solution involves snapshotting the initial clean configuration of these dictionaries and restoring them before each inference, along with robust management of CPU thread settings, to ensure consistent and stable performance.

Highlights

  • State Leak Prevention: Implemented a mechanism to snapshot and restore kwargs configurations for AutoModel and its submodules (VAD, Punctuation, Speaker Diarization) before each inference, preventing runtime state modifications from affecting subsequent runs.
  • CPU Thread Management: Introduced a helper function _resolve_ncpu (sketched below) and logic to ensure ncpu (the number of CPU threads) is consistently applied and reset, calling torch.set_num_threads() only when necessary to prevent thread count drift.
  • Performance Stability: Addresses a reported issue where long audio inferencing performance degraded significantly after the initial run due to ncpu state leaks, ensuring stable and consistent performance across multiple inferences.
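
The helper's body is not quoted in this thread; one plausible shape, assuming it simply validates the configured value and falls back to the historical default of 4, is:

    def _resolve_ncpu(kwargs, default=4):
        # Coerce the configured ncpu to a positive int; fall back to the
        # default when the value is missing or malformed.
        ncpu = kwargs.get("ncpu", default)
        try:
            ncpu = int(ncpu)
        except (TypeError, ValueError):
            return default
        return ncpu if ncpu > 0 else default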

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a performance degradation issue in FunASR's GPU inferencing for long audio files. The root cause was identified as state leaks in the kwargs of the AutoModel: runtime configurations were mutated during long inferences and never reset, slowing subsequent requests. The fix snapshots the *_kwargs after the AutoModel is built and restores them before each inference; additionally, the intended ncpu value is reapplied, and torch.set_num_threads() is called only when necessary to prevent thread drift. The changes include:

  • adding a _resolve_ncpu function;
  • modifying the __init__ method to set default ncpu values for vad_kwargs, punc_kwargs, and spk_kwargs;
  • updating the build_model method to use _resolve_ncpu and conditionally set the number of threads;
  • adding _store_base_configs and _reset_runtime_configs methods to handle snapshotting and restoring the configurations.

These methods are called in __init__, generate, inference, and inference_with_vad to ensure consistent performance across multiple inferences.
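
Based on that summary, the call sites sit roughly as follows (a simplified sketch, not the merged code):

    class AutoModel:
        def __init__(self, **kwargs):
            ...  # build the main, VAD, punctuation, and speaker modules
            self._store_base_configs()     # snapshot the clean baseline once

        def generate(self, input, **cfg):
            self._reset_runtime_configs()  # restore the baseline before inference
            ...  # proceed with the usual inference path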

Comment on lines +343 to +344
if kwargs is None:
    self._reset_runtime_configs()

high

It seems like the condition if kwargs is None: is redundant here, as kwargs is assigned self.kwargs if it's initially None on line 345. This check will always be false. Removing this check will simplify the code.

Also, the call to self._reset_runtime_configs() is only done when kwargs is None. This means that if the user provides a kwargs argument, the runtime configs will not be reset, which might lead to unexpected behavior. Consider always calling self._reset_runtime_configs() at the beginning of the function to ensure consistent behavior.

Suggested change
-    if kwargs is None:
-        self._reset_runtime_configs()
+    self._reset_runtime_configs()
+    kwargs = self.kwargs if kwargs is None else kwargs

Comment on lines +741 to +746
for name, value in base_map.items():
    if name == "kwargs":
        continue
    config = getattr(self, name, None)
    if isinstance(config, dict):
        config.setdefault("ncpu", ncpu)

high

The loop iterates through base_map.items(), but only uses the name in the loop body. The value is not used. This might indicate a potential misunderstanding of the intended logic. If the intention is to iterate through the keys of base_map and use them to access attributes of self, then the loop should be changed to iterate through base_map.keys().

Also, the condition if name == "kwargs": continue skips the primary kwargs from having setdefault called on it. This seems inconsistent with the handling of the other kwargs and might lead to unexpected behavior. Consider whether the primary kwargs should also have setdefault called on it.
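
Taking the first point, iterating over the keys directly makes the intent explicit (a sketch of the reviewer's suggestion, not the merged code):

    for name in base_map:
        if name == "kwargs":
            continue
        config = getattr(self, name, None)
        if isinstance(config, dict):
            # Only fill in ncpu when the submodule config doesn't set it.
            config.setdefault("ncpu", ncpu)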

kwargs["batch_size"] = 1
kwargs["device"] = device

torch.set_num_threads(kwargs.get("ncpu", 4))

medium

Setting the number of threads every time build_model is called might be excessive. Consider setting it only once at the beginning of the program or when the ncpu value actually changes to avoid unnecessary overhead.

Also, consider logging when the number of threads is actually changed for debugging purposes.
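
Both suggestions combined might look like this, with apply_ncpu as a hypothetical helper name for illustration:

    import logging

    import torch

    def apply_ncpu(ncpu: int) -> None:
        # Touch the global thread count only when it actually differs,
        # and log the change so thread drift stays visible in debugging.
        if torch.get_num_threads() != ncpu:
            logging.info(f"Setting number of threads to {ncpu}")
            torch.set_num_threads(ncpu)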

Comment on lines +209 to +210
if torch.get_num_threads() != ncpu:
    torch.set_num_threads(ncpu)

medium

This check is good to prevent unnecessary calls to torch.set_num_threads(). However, it might be beneficial to add a log message here to indicate when the number of threads is being changed, which can help in debugging and monitoring.

Consider adding a log message like logging.info(f"Setting number of threads to {ncpu}").

Comment on lines +747 to +748
if torch.get_num_threads() != ncpu:
    torch.set_num_threads(ncpu)

medium

Consider adding a log message here to indicate when the number of threads is being reset, which can help in debugging and monitoring. For example: logging.info(f"Resetting number of threads to {ncpu}").

MotorBottle commented Oct 29, 2025

My test with my fork was successful: pip install --no-cache-dir git+https://github.com/MotorBottle/FunASR.git@main

Before processing a long audio:
[screenshot]

After processing, an unexpected change happened to the torch_threads param:
[screenshot]

Re-running the processing, the arg got reapplied from the stored value (avoiding the contamination):
[screenshot]
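
Without the screenshots, the same check can be reproduced from Python; the model names and audio path below are placeholders, not the exact setup from the test:

    import torch
    from funasr import AutoModel  # assumes the patched fork is installed

    model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad")  # placeholder models
    print("threads before:", torch.get_num_threads())
    model.generate(input="long_audio.wav")  # placeholder path to a 30 min+ recording
    print("threads after:", torch.get_num_threads())  # matches the baseline with the fix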
