If possible, I would like to know what procedures were used to filter the original data, for example the audio from YouTube.
Is there any script you would recommend for filtering and cleanup?
I have tried the Kaldi cleanup scripts in /egs/wsj/s5/steps/cleanup/:
A) GMM (clean_and_segment_data.sh, find_bad_utts.sh): this did not work well for me, especially when there are systematic errors in the dataset.
B) NNET (clean_and_segment_data_nnet3.sh, find_bad_utts_nnet3.sh): this depends on the pretrained model, which is not good in my case.
Section 3 of the paper (GigaSpeech creation pipeline) describes these steps in subsections 3.2, 3.3, and 3.4. However, I would like to know whether you used scripts other than Kaldi's, or what you modified in the original Kaldi cleanup scripts.
Thanks in advance, I really appreciate any support.
The pipeline was developed on top of the existing Kaldi scripts you mentioned above, but with a lot of bug fixes and ad hoc modifications. However, we have no near-term plans to open-source these tools, because cleaning up and generalizing the code would take non-trivial effort.