Work around FunASR kwargs state leaks #2698
base: main

Conversation
Summary of Changes: Hello @MotorBottle, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a performance degradation issue in FunASR's GPU inferencing for long audio files.
Code Review
This pull request addresses a performance degradation issue in FunASR's GPU inferencing for long audio files. The root cause was identified as state leaks in the `kwargs` of the `AutoModel`, where runtime configurations were being mutated during long inferences and not reset, leading to slower subsequent requests. The fix snapshots the `*_kwargs` after the `AutoModel` is built and restores them before each inference. Additionally, the intended `ncpu` value is reapplied, and `torch.set_num_threads()` is called only when necessary to prevent thread drift.

The changes include:
- adding a `_resolve_ncpu` function,
- modifying the `__init__` method to set default `ncpu` values for `vad_kwargs`, `punc_kwargs`, and `spk_kwargs`,
- updating the `build_model` method to use `_resolve_ncpu` and conditionally set the number of threads, and
- adding `_store_base_configs` and `_reset_runtime_configs` methods to handle snapshotting and restoring the configurations.

These methods are called in `__init__`, `generate`, `inference`, and `inference_with_vad` to ensure consistent performance across multiple inferences.
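To make the summary above easier to follow, here is a minimal sketch of the snapshot/restore pattern it describes. The method names `_store_base_configs` and `_reset_runtime_configs` come from the PR; the deep-copy baseline and the exact attribute list are illustrative assumptions, not the actual implementation.

```python
import copy


class AutoModel:
    # Construction of the underlying ASR, VAD, punctuation and speaker
    # modules is elided; only the snapshot/restore pattern is shown.

    def _store_base_configs(self):
        # Snapshot every runtime config dict right after the modules are built.
        self._base_configs = {
            name: copy.deepcopy(getattr(self, name))
            for name in ("kwargs", "vad_kwargs", "punc_kwargs", "spk_kwargs")
            if isinstance(getattr(self, name, None), dict)
        }

    def _reset_runtime_configs(self):
        # Restore the baseline before each inference so mutations made during
        # a previous (long) request cannot leak into the next one.
        for name, baseline in self._base_configs.items():
            config = getattr(self, name)
            config.clear()
            config.update(copy.deepcopy(baseline))
```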
```python
if kwargs is None:
    self._reset_runtime_configs()
```
It seems like the condition `if kwargs is None:` is redundant here, as `kwargs` is assigned `self.kwargs` if it's initially `None` on line 345. This check will always be false. Removing this check will simplify the code.

Also, the call to `self._reset_runtime_configs()` is only done when `kwargs` is `None`. This means that if the user provides a `kwargs` argument, the runtime configs will not be reset, which might lead to unexpected behavior. Consider always calling `self._reset_runtime_configs()` at the beginning of the function to ensure consistent behavior.
Suggested change:
```diff
-if kwargs is None:
-    self._reset_runtime_configs()
+self._reset_runtime_configs()
+kwargs = self.kwargs if kwargs is None else kwargs
```
```python
for name, value in base_map.items():
    if name == "kwargs":
        continue
    config = getattr(self, name, None)
    if isinstance(config, dict):
        config.setdefault("ncpu", ncpu)
```
The loop iterates through `base_map.items()`, but only uses `name` in the loop body; `value` is not used. This might indicate a potential misunderstanding of the intended logic. If the intention is to iterate through the keys of `base_map` and use them to access attributes of `self`, then the loop should be changed to iterate through `base_map.keys()`.

Also, the condition `if name == "kwargs": continue` skips the primary `kwargs` from having `setdefault` called on it. This seems inconsistent with the handling of the other kwargs and might lead to unexpected behavior. Consider whether the primary `kwargs` should also have `setdefault` called on it.
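A sketch of the first suggestion, mirroring the fragment in the hunk above (it assumes `base_map` maps attribute names such as `"vad_kwargs"` to their snapshots; whether the primary `kwargs` should keep being skipped is the open question raised above):

```python
# Iterate over the keys only, since the snapshot values are unused here.
for name in base_map.keys():
    if name == "kwargs":
        continue
    config = getattr(self, name, None)
    if isinstance(config, dict):
        config.setdefault("ncpu", ncpu)
```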
```python
kwargs["batch_size"] = 1
kwargs["device"] = device

torch.set_num_threads(kwargs.get("ncpu", 4))
```
Setting the number of threads every time `build_model` is called might be excessive. Consider setting it only once at the beginning of the program, or only when the `ncpu` value actually changes, to avoid unnecessary overhead.

Also, consider logging when the number of threads is actually changed, for debugging purposes.
```python
if torch.get_num_threads() != ncpu:
    torch.set_num_threads(ncpu)
```
This check is good to prevent unnecessary calls to `torch.set_num_threads()`. However, it might be beneficial to add a log message here to indicate when the number of threads is being changed, which can help in debugging and monitoring.

Consider adding a log message like `logging.info(f"Setting number of threads to {ncpu}")`.
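Putting the two thread-related suggestions together, a small sketch (the helper name `apply_ncpu` is hypothetical; in the PR the guard lives inline in `build_model`):

```python
import logging

import torch


def apply_ncpu(ncpu: int) -> None:
    # Only touch the global thread setting when it actually differs,
    # and log the change so thread drift is visible in service logs.
    if torch.get_num_threads() != ncpu:
        logging.info(f"Setting number of threads to {ncpu}")
        torch.set_num_threads(ncpu)
```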
Long-Audio Slowdown in FunASR GPU Inferencing (Root cause: kwargs state leaks)

What I Observed
- After a long audio file is transcribed on GPU, subsequent requests become noticeably slower.
- The model stays on `cuda:0` throughout (so it was not a backend/device issue); the slowdown persists until the process is restarted.

Root Cause
- `AutoModel` keeps all runtime configuration (`kwargs`, `vad_kwargs`, `punc_kwargs`, `spk_kwargs`, etc.) in mutable dictionaries that are shared across requests.
- During a long, multi-module inference these dictionaries are modified in place by internal logic (e.g. `batch_size` and `ncpu` are adjusted; on my server the torch thread count grows from the default 4 to the host's 72, slowing down inference). FunASR never resets them, so the next request inherits the "dirty" state and slows down.

Fix
- Snapshot every `*_kwargs` right after `AutoModel` builds its modules and restore that baseline before each inference (including the VAD, punctuation, and speaker-diarization modules).
- Reapply the intended `ncpu` and call `torch.set_num_threads()` only when it actually needs to change, preventing thread drift.
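To illustrate the effect from a caller's point of view, a hedged usage sketch: the model names, file paths, and `ncpu=4` argument are examples only, while `AutoModel(...)` and `generate(input=...)` follow FunASR's documented usage.

```python
from funasr import AutoModel  # assumes FunASR is installed

# Illustrative configuration; adjust model names and ncpu to your deployment.
model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    ncpu=4,
)

# First request: a long recording. Internally FunASR may mutate the shared
# kwargs dicts (batch sizes, thread counts) while chunking the audio.
long_result = model.generate(input="long_meeting.wav")

# Second request: with this patch, generate() first restores the snapshotted
# *_kwargs baseline, so a short file is transcribed at the original speed
# instead of inheriting the "dirty" state left by the long one.
short_result = model.generate(input="short_clip.wav")
```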