gigaspeech.json does not contain audio/podcast/P0081-P0084 #126
Comments
We provide tools (
I re-ran the command, and those four files are still missing.
Which download host was it (you should be able to see it in the logs)? Another person asked me about a similar issue. @wwfcnu
When you run the command
Specifying speechocean as the host produces an error: bash utils/download_gigaspeech.sh --host speechocean
In principle, the gigaspeech.json metadata file should contain the information for these four files, audio/podcast/P0081-P0084. I also verified the JSON file with MD5, and it is fine.
There could be issues with the MagicData server. I'm downloading from tsinghua to see if we have the same issue. In the meantime, could you try
Please provide more info, such as the MD5 of your local gigaspeech.json.
19c777dc296ff3eb714bc677a80620a3 GigaSpeech.json
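For anyone who wants to reproduce the checksum above without a command-line tool, here is a minimal sketch using Python's standard hashlib module. The function name md5_of_file and the chunk size are illustrative choices, not part of the GigaSpeech tooling.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so that a large metadata file does not
    have to fit in memory at once. `chunk_size` is an arbitrary choice.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file("GigaSpeech.json") on the local copy gives a string that can be compared with a reference checksum.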
And I just confirmed that the resources on the MagicData host are fine. You should always be able to re-run the download script to fix a corrupted download session, and remember to use the provided tools to verify the correctness of your local copy.
You are right, I was looking at the wrong category. It's possible that we removed all the segments of those few files from a certain version of the metadata due to quality issues, but still kept the files because we wanted to keep the raw data as well. @wgb14 @chaisz19 do you still remember the details?
The audio in podcast P0081-P0084 belongs to RADIO. The original RADIO transcripts had problems during processing, so some of the text is missing.
gigaspeech.json does not contain the four files audio/podcast/P0081-P0084, but files.yaml does list them. After the download finishes, these four files are still missing.
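The mismatch described in this thread can be checked mechanically. The sketch below is only an illustration: it assumes the parsed metadata is a dictionary with a top-level "audios" list whose entries carry "path" and "segments" keys, which should be confirmed against the actual gigaspeech.json. Both function names are hypothetical, and the list of expected paths would have to be extracted from files.yaml separately.

```python
def paths_missing_from_metadata(expected_paths, metadata):
    """Return expected audio paths that have no entry in the metadata.

    `metadata` is assumed (not confirmed) to look like
    {"audios": [{"path": ..., "segments": [...]}, ...]}.
    """
    listed = {audio.get("path") for audio in metadata.get("audios", [])}
    return sorted(path for path in expected_paths if path not in listed)


def audios_without_segments(metadata):
    """Return paths of audio entries that are listed but have no segments."""
    return sorted(
        audio["path"]
        for audio in metadata.get("audios", [])
        if not audio.get("segments")
    )


if __name__ == "__main__":
    # Toy data for illustration only; these are not real GigaSpeech entries.
    toy_metadata = {
        "audios": [
            {"path": "audio/example/a.opus", "segments": [{"sid": "a_0"}]},
            {"path": "audio/example/b.opus", "segments": []},
        ]
    }
    toy_expected = ["audio/example/a.opus", "audio/example/c.opus"]
    print(paths_missing_from_metadata(toy_expected, toy_metadata))
    print(audios_without_segments(toy_metadata))
```

The first function covers files that are absent from the metadata entirely; the second covers files that are kept as raw data but have had all their segments removed.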