Reproducing results for multispeaker model #40
Hi!
I am trying to reproduce the results of the paper.
For data preparation I use the following commands:
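For reference, here is a rough sketch of what such a preparation step amounts to, assuming the goal is an HDF5 file of (input, target) patch pairs matching the filename pattern used below. The `data`/`label` dataset keys, the interpolation scheme, and the VCTK directory layout are assumptions inferred from the `.4.16000.8192.8192.h5` naming (scale 4, 16 kHz, patch length 8192, stride 8192), not the repo's actual prep code:

```python
# Hypothetical sketch of the data preparation step, NOT the repo's actual
# prep script. Builds an HDF5 file of (low-res input, high-res target)
# patch pairs matching the vctk-multispeaker-train.4.16000.8192.8192.h5
# naming: scale=4, sr=16000, patch length=8192, stride=8192.
import glob
import h5py
import librosa
import numpy as np
from scipy.signal import decimate

SCALE, SR, DIM, STRIDE = 4, 16000, 8192, 8192

def make_patches(wav_paths, out_path):
    lo, hi = [], []
    for path in wav_paths:
        x, _ = librosa.load(path, sr=SR)      # high-res reference at 16 kHz
        x = x[: len(x) - (len(x) % SCALE)]    # trim so the length divides by SCALE
        x_lr = decimate(x, SCALE)             # low-pass filter + subsample by 4
        # Interpolate back to the original length, since the model expects
        # an input of the same length as the target (assumed here).
        t_hi = np.arange(len(x))
        t_lr = np.arange(len(x_lr)) * SCALE
        x_in = np.interp(t_hi, t_lr, x_lr)
        for i in range(0, len(x) - DIM + 1, STRIDE):
            lo.append(x_in[i : i + DIM])
            hi.append(x[i : i + DIM])
    with h5py.File(out_path, "w") as f:
        f.create_dataset("data", data=np.array(lo)[..., None])
        f.create_dataset("label", data=np.array(hi)[..., None])

# Placeholder VCTK layout; adjust to wherever the corpus actually lives.
make_patches(sorted(glob.glob("VCTK-Corpus/wav48/*/*.wav")),
             "vctk-multispeaker-train.4.16000.8192.8192.h5")
```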
For training I use the following command:
python3 run.py train --train ../data/vctk/multispeaker/vctk-multispeaker-train.4.16000.8192.8192.h5 --val ../data/vctk/multispeaker/vctk-multispeaker-val.4.16000.8192.8192.h5.tmp -e 50 --batch-size 64 --lr 3e-4 --logname multispeaker --model audiotfilm --r 4 --layers 4 --piano false --speaker multi --pool_size 2 --strides 2 --full true
After training, I use the following command to run inference:
python run.py eval --logname ./model.ckpt-53351 --out-label mul-out --wav-file-list ./test_files.txt --r 4 --pool_size 2 --strides 2 --model audiotfilm --speaker multi
I am getting poor performance from your model; the spectrograms look like this (predicted above, ground truth below):
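For reference, a two-panel comparison like the one described can be plotted with librosa and matplotlib. A minimal sketch follows, where pred.wav and gt.wav are placeholder file names:

```python
# Minimal sketch: plot predicted vs. ground-truth spectrograms, one above
# the other. 'pred.wav' and 'gt.wav' are placeholder names, not repo files.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 1, figsize=(10, 6), sharex=True)
for ax, (name, path) in zip(axes, [("predicted", "pred.wav"),
                                   ("ground truth", "gt.wav")]):
    y, sr = librosa.load(path, sr=16000)
    # Log-magnitude STFT, shown on a linear frequency axis.
    S = librosa.amplitude_to_db(np.abs(librosa.stft(y, n_fft=2048)), ref=np.max)
    librosa.display.specshow(S, sr=sr, x_axis="time", y_axis="linear", ax=ax)
    ax.set_title(name)
plt.tight_layout()
plt.savefig("comparison.png")
```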
I doubt that this behavior is expected. Could you please give me a hint about where I might have gone wrong, or better yet, provide the checkpoint for the multi-speaker model?
Thank you in advance!
Comments
In the article, it is stated that you set the pool_size and stride parameters to 2. Is that a mistake? Should they be equal to 8?
What SNR and LSD are you seeing? And the pool_size and stride should be 2. Where do you see 8?
Here are the logs of the last few epochs: Epoch 46 of 50 took 509437.803s (1067 minibatches)
I saw it in this issue: #27 (comment).
Looks like you are reproducing the paper's results. We reported an SNR of 15.0 and an LSD of 2.7 on the multispeaker task with an upsampling ratio of 4. Multispeaker with r = 4 is a hard task. And that's a typo on my part; thanks for flagging it. I've updated my answer.
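For context, SNR and LSD here are the standard signal-to-noise ratio and log-spectral distance metrics from the audio super-resolution literature. A minimal sketch of both, assuming a 2048-point STFT (the paper's exact framing may differ):

```python
# Standard SNR and log-spectral distance (LSD) definitions; the STFT size
# here is an assumption, not necessarily the repo's exact evaluation setup.
import numpy as np
import librosa

def snr(ref, est):
    # 10 * log10(signal power / error power), in dB
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))

def lsd(ref, est, n_fft=2048):
    # Log-power spectra, shape (freq bins, frames)
    S_ref = np.log10(np.abs(librosa.stft(ref, n_fft=n_fft)) ** 2 + 1e-10)
    S_est = np.log10(np.abs(librosa.stft(est, n_fft=n_fft)) ** 2 + 1e-10)
    # RMS over frequency bins, then mean over frames
    return np.mean(np.sqrt(np.mean((S_ref - S_est) ** 2, axis=0)))
```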
I also thought there might be a problem with my inference command; could you check it? I am afraid of reporting lower results for your method than the actual ones, because the samples presented on this page, https://anonymousqwerty.github.io/audio-sr/, seem much better than what I get. In your experience, is it possible for your model to behave as I showed above on some samples?
I don't see anything wrong with your inference command. And performance can vary a lot depending on the sample.
Ok, thank you very much for the help!