langgz committed on
Commit db69bd7 · 1 Parent(s): eeb66f4

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -11,7 +11,7 @@ tags:
 
  Voice activity detection (VAD) plays an important role in speech recognition systems by detecting the beginning and end of effective speech. FunASR provides an efficient VAD model based on the [FSMN structure](https://arxiv.org/abs/1803.05030). To improve model discrimination, we use monophones as modeling units, given the relatively rich speech information. During inference, the VAD system requires post-processing for improved robustness, including operations such as threshold settings and sliding windows.
 
- This repository demonstrates how to leverage FSMN-VAD in conjunction with the funasr_onnx runtime. The underlying model is derived from [FunASR](https://github.com/alibaba-damo-academy/FunASR), which was trained on a massive 60,000-hour Mandarin dataset. Notably, Paraformer's performance secured the top spot on the [SpeechIO leaderboard](https://github.com/SpeechColab/Leaderboard), highlighting its exceptional capabilities in speech recognition.
+ This repository demonstrates how to leverage FSMN-VAD in conjunction with the funasr_onnx runtime. The underlying model is derived from [FunASR](https://github.com/alibaba-damo-academy/FunASR), which was trained on a massive 5,000-hour dataset.
 
  We have released numerous industrial-grade models, including speech recognition, voice activity detection, punctuation restoration, speaker verification, speaker diarization, and timestamp prediction (force alignment). To learn more about these models, kindly refer to the [documentation](https://alibaba-damo-academy.github.io/FunASR/en/index.html) available on FunASR. If you are interested in leveraging advanced AI technology for your speech-related projects, we invite you to explore the possibilities offered by [FunASR](https://github.com/alibaba-damo-academy/FunASR).
 
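
The README text in this diff mentions that VAD inference needs post-processing such as threshold settings and sliding windows. As a rough illustration of that idea only, the sketch below smooths per-frame speech probabilities with a sliding-window average and then thresholds them into segments. It is not FunASR's or funasr_onnx's actual post-processing: the function name `smooth_and_segment`, the default threshold of 0.5, the window of 5 frames, and the 10 ms frame shift are all assumptions chosen for the example.

```python
from typing import List, Tuple


def smooth_and_segment(
    probs: List[float],
    threshold: float = 0.5,  # assumed decision threshold, not a FunASR default
    window: int = 5,         # assumed smoothing window, in frames
    frame_ms: int = 10,      # assumed frame shift, in milliseconds
) -> List[Tuple[int, int]]:
    """Turn per-frame speech probabilities into (start_ms, end_ms) segments."""
    n = len(probs)
    half = window // 2

    # Sliding-window average: each frame becomes the mean of its neighbours,
    # which suppresses isolated spikes and short dropouts.
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append(sum(probs[lo:hi]) / (hi - lo))

    # Thresholding: consecutive frames at or above the threshold form a segment.
    segments: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(smoothed):
        is_speech = p >= threshold
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start * frame_ms, i * frame_ms))
            start = None
    if start is not None:
        # Speech continues to the end of the input.
        segments.append((start * frame_ms, n * frame_ms))
    return segments
```

With `window=3`, a single high-probability frame surrounded by silence averages to about 0.33 and is dropped, while a sustained run of speech frames is kept as one segment.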