Owner: Which model are you using? I fixed this before; with Volcano Engine (火山) I can now basically hit stop the moment I finish speaking.
Author: I'm using Volcano Engine, and I keep losing the last two or three characters. The experience is pretty poor.
Author: Maybe it's because I press stop the instant I finish speaking.
Owner: Strange, I'll look into it tomorrow. I also press stop right after I finish speaking, and it works great.
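Owner: Do you see this with both streaming recognition and LLM processing? Going by the current logic, only LLM processing should be affected; streaming recognition waits for the complete request to return.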
Author: I'm using the mode that calls an LLM to polish the text after speech recognition, because I like having the LLM strip out my "um"s and "uh"s.
Owner: That explains it. It's fixed now.
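Background
Currently, if the user presses the shortcut to stop voice input immediately after saying the last word, the last one or two characters are often lost.
In practice, that content has usually already been spoken and has often already entered the audio pipeline, but because stop is triggered too aggressively and the ASR/LLM wrap-up path cuts off too early, the trailing text never makes it into the final result.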
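Steps to reproduce
Press the shortcut to start dictating, then press the shortcut again immediately after you finish speaking: the last two characters are cut off and never get entered.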
Changes
This PR fixes trailing-character truncation at the end of a recording at three levels:
1. Fix loss of trailing audio packets on stop
In AudioCaptureEngine.stop(), synchronously drain the AVCapture delegate queue first, then run flushRemaining(). This avoids:
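A minimal Swift sketch of the drain-then-flush ordering in item 1, assuming capture callbacks are delivered on a serial DispatchQueue (as with the queue passed to AVCaptureAudioDataOutput's setSampleBufferDelegate(_:queue:)). CaptureEngineSketch, deliver(_:), chunkSize and sentChunks are hypothetical stand-ins, not the PR's AudioCaptureEngine, and the callbacks are simulated rather than coming from AVFoundation.

```swift
import Dispatch

// Hypothetical stand-in for AudioCaptureEngine. @unchecked Sendable because all
// mutable state is only touched on `delegateQueue` (or after stop() has drained it).
final class CaptureEngineSketch: @unchecked Sendable {
    // Serial queue on which (simulated) capture callbacks are delivered.
    private let delegateQueue = DispatchQueue(label: "capture.delegate")
    // Samples received but not yet packed into a full chunk.
    private var pending: [Int16] = []
    // Chunks handed downstream (e.g. to a hypothetical ASR uploader).
    private(set) var sentChunks: [[Int16]] = []
    private let chunkSize: Int

    init(chunkSize: Int) { self.chunkSize = chunkSize }

    // Simulates a capture callback arriving asynchronously on the delegate queue.
    func deliver(_ samples: [Int16]) {
        delegateQueue.async {
            self.pending.append(contentsOf: samples)
            // Emit every full chunk; keep the remainder in `pending`.
            while self.pending.count >= self.chunkSize {
                self.sentChunks.append(Array(self.pending.prefix(self.chunkSize)))
                self.pending.removeFirst(self.chunkSize)
            }
        }
    }

    // Sends whatever partial chunk is left over.
    private func flushRemaining() {
        if !pending.isEmpty {
            sentChunks.append(pending)
            pending.removeAll()
        }
    }

    func stop() {
        // A sync block on a serial queue runs only after every previously
        // enqueued callback has finished, so this drains the queue first and
        // then flushes, with `pending` never read concurrently.
        delegateQueue.sync {
            self.flushRemaining()
        }
    }
}
```

Flushing on the caller's thread without the drain could run before the last queued callback and miss its samples; doing both inside one sync block keeps the ordering and the state access on the same queue.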
2. Adjust stop semantics to leave a settling window for trailing audio and the final recognition result
In RecognitionSession.stopRecording(): … 350ms)
This avoids:
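One possible reading of item 2, sketched under assumptions: stopRecording() waits up to a bounded grace window for the final ASR result before tearing down, and returns whatever text it has if the window expires. The 350 ms default echoes the figure in the PR text, but its exact role there is an assumption; StopWindowSketch, receive(text:isFinal:) and the returned tuple are hypothetical names.

```swift
import Dispatch

// Hypothetical session sketch; not the PR's RecognitionSession.
final class StopWindowSketch: @unchecked Sendable {
    // Signalled once when the final ASR result arrives.
    private let finalResultArrived = DispatchSemaphore(value: 0)
    // Serial queue guarding `transcript`.
    private let stateQueue = DispatchQueue(label: "session.state")
    private var transcript = ""

    // Called by the (simulated) ASR client whenever text arrives.
    func receive(text: String, isFinal: Bool) {
        stateQueue.sync { transcript = text }
        if isFinal { finalResultArrived.signal() }
    }

    // Waits at most `grace` for the final result (assumed 350 ms default),
    // then returns the latest text and whether the final result made it in time.
    func stopRecording(grace: DispatchTimeInterval = .milliseconds(350)) -> (text: String, gotFinal: Bool) {
        let gotFinal = finalResultArrived.wait(timeout: .now() + grace) == .success
        let text = stateQueue.sync { transcript }
        return (text, gotFinal)
    }
}
```

If the final result lands inside the window, the tail is kept; if it never arrives, the stop path still finishes after a bounded delay instead of hanging.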
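3. Remove the "fast disconnect" shortcut in streaming ASR + LLM mode
Previously, in streaming engine + LLM mode, once an early version of the text was available, the session skipped the final ASR finalize step and disconnected immediately.
As a result, the last few characters, which only arrive after stop, never had a chance to reach the final transcript.
This PR changes this to:
endAudio/finalize
This puts correctness first: trailing characters are no longer sacrificed for the sake of a fast path.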
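A sketch contrasting the removed fast path with the new stop order in item 3. The StreamingASR protocol, its methods and MockASR are hypothetical stand-ins named after endAudio/finalize in the PR text; the real signatures are not shown in the PR, and the mock runs synchronously.

```swift
// Hypothetical client interface; real method names and signatures are assumptions.
protocol StreamingASR {
    func endAudio()
    func awaitFinalTranscript() -> String
    func disconnect()
}

// Records call order so the two stop paths can be compared.
final class MockASR: StreamingASR {
    private(set) var calls: [String] = []
    private let finalText: String
    init(finalText: String) { self.finalText = finalText }
    func endAudio() { calls.append("endAudio") }
    func awaitFinalTranscript() -> String {
        calls.append("finalize")
        return finalText
    }
    func disconnect() { calls.append("disconnect") }
}

// Old behaviour (removed): with early text in hand, skip finalize and disconnect.
func finishOld(asr: any StreamingASR, earlyText: String?) -> String {
    if let early = earlyText {
        asr.disconnect()
        return early
    }
    asr.endAudio()
    let text = asr.awaitFinalTranscript()
    asr.disconnect()
    return text
}

// New behaviour: always close the audio stream and wait for the final
// transcript, even when early text exists; only then disconnect.
func finishNew(asr: any StreamingASR, earlyText: String?) -> String {
    asr.endAudio()
    let finalText = asr.awaitFinalTranscript()
    asr.disconnect()
    // Assumption: fall back to the early text only if finalize returned nothing.
    return finalText.isEmpty ? (earlyText ?? "") : finalText
}
```

With an early partial already available, the old path returns it as-is and drops whatever the server produces after stop; the new path pays for one finalize round trip but returns the complete transcript.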
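Testing
Added a regression test verifying that during stop:
endAudio() must not be called before the last audio chunk has finished sending
Verified:
swift test --filter RecognitionSessionTests
Impact
Expected to fix the following scenarios: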
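Notes
This change prioritizes recognition completeness on stop, accepting a very small wrap-up delay in exchange for noticeably more reliable retention of trailing characters.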
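The regression check listed under Testing (endAudio() must not be called before the last audio chunk has been sent) can be phrased as an ordering assertion over an event log. This is a hypothetical sketch, not the PR's RecognitionSessionTests: TransportEvent, FakeTransport, stopSequence(transport:chunks:) and endAudioCameAfterLastChunk(_:lastChunkIndex:) are illustrative names, and uploads are simulated on a serial queue.

```swift
import Dispatch

// Events a fake ASR transport might record.
enum TransportEvent: Equatable {
    case sendChunk(index: Int)
    case endAudio
}

// Thread-safe event log; state is confined to `logQueue`.
final class FakeTransport: @unchecked Sendable {
    private let logQueue = DispatchQueue(label: "transport.log")
    private var events: [TransportEvent] = []
    func record(_ event: TransportEvent) { logQueue.sync { events.append(event) } }
    var log: [TransportEvent] { logQueue.sync { events } }
}

// Simulated stop: chunks are uploaded asynchronously on `sendQueue`, and
// endAudio is issued only after that queue has drained.
func stopSequence(transport: FakeTransport, chunks: Int) {
    let sendQueue = DispatchQueue(label: "upload")
    for i in 0..<chunks {
        sendQueue.async { transport.record(.sendChunk(index: i)) }
    }
    sendQueue.sync {}  // wait for every pending upload
    transport.record(.endAudio)
}

// True only if endAudio appears after the send of the last chunk.
func endAudioCameAfterLastChunk(_ log: [TransportEvent], lastChunkIndex: Int) -> Bool {
    guard let endPos = log.firstIndex(of: .endAudio),
          let lastSendPos = log.firstIndex(of: .sendChunk(index: lastChunkIndex)) else {
        return false
    }
    return lastSendPos < endPos
}
```

Removing the sendQueue.sync {} line lets endAudio race ahead of queued uploads, which is the ordering this kind of assertion is meant to catch.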