Range value of arousal, valence, dominance #22
Comments
Yes, databases tend to use different scales for arousal/valence/dominance like 0..5.
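To illustrate what mapping between such scales can look like, here is a minimal sketch of a linear (min-max) rescaling. It assumes the source range is known (for example 0..5, or -100..100 as quoted from the paper) and that the target range is 0..1. The function name `rescale` and its parameters are made up for this example; this is not necessarily how the model's training labels were actually prepared.

```python
def rescale(value, src_min, src_max, dst_min=0.0, dst_max=1.0):
    """Linearly map value from [src_min, src_max] to [dst_min, dst_max].

    Hypothetical helper for illustration only; the actual label
    preprocessing used for the model is not described in this thread.
    """
    if src_max == src_min:
        raise ValueError("source range must have non-zero width")
    # Position of the value inside the source range, as a fraction 0..1.
    ratio = (value - src_min) / (src_max - src_min)
    return dst_min + ratio * (dst_max - dst_min)


# The midpoint of the source scale lands on the midpoint of the target scale.
print(rescale(0, -100, 100))   # 0.5
print(rescale(2.5, 0, 5))      # 0.5
```

With this kind of mapping, the midpoint of any source scale ends up at 0.5, and the extremes end up at 0 and 1.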
I just created a 10-second, completely silent .wav file to test the model. If I understood correctly, the values should all be 0.5, i.e. completely neutral, right?
The model was not trained on non-speech input (like silence) and hence might not generalize to inputs such as silence or the sounds of objects. It is assumed that you always use voice activity detection (VAD) and pass only speech to the model.
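As a rough sketch of the idea of filtering input before it reaches the model, the snippet below drops frames whose RMS energy is under a threshold. This is only a crude energy gate standing in for a real VAD: it would reject a fully silent file, but it cannot tell speech from other loud sounds. The function names, the frame length of 400 samples, and the threshold of 0.01 are all assumptions chosen for the example, with samples assumed to be floats in the range -1..1.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def energetic_frames(samples, frame_len=400, threshold=0.01):
    """Split samples into frames and keep those above the RMS threshold.

    Not a real voice activity detector: it only removes (near-)silent
    frames. frame_len and threshold are illustrative assumptions.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples), frame_len)]
    return [f for f in frames if rms(f) > threshold]


# A fully silent signal yields no frames, so nothing would be sent to the model.
silent = [0.0] * 16000
print(len(energetic_frames(silent)))  # 0
```

In practice a dedicated VAD would replace `energetic_frames`; the point is only that an all-silent recording should be filtered out rather than scored.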
I wonder what the value range of arousal, valence, and dominance is. As far as I know, the model output is a logit vector of size 3 representing those dimensions, and its values appear to lie in the range [0, 1]. I see that you use the MSP-Conversation Corpus for fine-tuning. But when I looked at the MSP-Conversation Corpus paper, they mentioned that
"Notice that the values of the traces are in the range between -100 and 100. The figure shows that extreme values are uncommon. Most of the annotations are concentrated between -40 to 40 for valence, -20 to 50 for arousal, and -20 to 40 for dominance"
Do you guys normalize that feature, or do something related?