Awesome Singing Voice Synthesis and Singing Voice Conversion

A list of papers and projects on cutting-edge Speech Synthesis, Text-to-Speech (TTS), Singing Voice Synthesis (SVS), Voice Conversion (VC), Singing Voice Conversion (SVC), and related work.

Pull requests are welcome. You can also contact me via email ([email protected]) to suggest new papers and projects.

Journals

IEEE/ACM TASLP, IEEE JSTSP, JSLHR, IEEE TPAMI

Conferences

NeurIPS, ICLR, ICML, IJCAI, AAAI, ACL, NAACL, EMNLP, ISMIR, ICASSP, INTERSPEECH, ACM MM, ICME

Workshops

ASRU, SLT

Singing Voice Conversion (Other Key Words: SVC, Singing Style Transfer)

Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher | INTERSPEECH 2022 | ✔️Code | 🎧Demo
A Hierarchical Speaker Representation Framework for One-shot Singing Voice Conversion | INTERSPEECH 2022 | 🎧Demo
Controllable and Interpretable Singing Voice Decomposition via Assem-VC | NeurIPS 2021 Workshop | 🎧Demo
DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion | ASRU 2021 | 🎧Demo
FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation | ICME 2021 | 🎧Demo
Unsupervised WaveNet-based Singing Voice Conversion Using Pitch Augmentation and Two-phase Approach | 2021 | ✔️Code | 🎧Demo
Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding | 2021 | 🎧Demo
Zero-shot Singing Voice Conversion | ISMIR 2020 | 🎧Demo
PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network | ICASSP 2020 | 🎧Demo
DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System | INTERSPEECH 2020 | 🎧Demo
Unsupervised Cross-Domain Singing Voice Conversion | INTERSPEECH 2020 | 🎧Demo
VAW-GAN for Singing Voice Conversion with Non-parallel Training Data | APSIPA 2020 | ✔️Code | 🎧Demo
Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training | 2020 | 🎧Demo | Unofficial Code

Dataset

Singing Technique Conversion

Zero-shot Singing Technique Conversion | CMMR 2021

Voice Conversion (Other Key Words: VC, Voice Cloning, Voice Style Transfer)

End-to-End Zero-Shot Voice Style Transfer with Location-Variable Convolutions | 2022 | 🎧Demo
A Comparative Study of Self-supervised Speech Representation Based Voice Conversion | IEEE JSTSP 2022
Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme | ICLR 2022 | ✔️Code | 🎧Demo
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone | ICML 2022 | ✔️Code | 🎧Demo | 🎧Demo | 📝Blog
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations | ICASSP 2022 | ✔️Code
Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques | ICASSP 2022 | ✔️Code | 🎧Demo
NVC-Net: End-to-End Adversarial Voice Conversion | ICASSP 2022 | ✔️Code | 🎧Demo
Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features | ICASSP 2022 | 🎧Demo
Toward Degradation-Robust Voice Conversion | ICASSP 2022
DGC-vector: A new speaker embedding for zero-shot voice conversion | ICASSP 2022 | 🎧Demo
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers | INTERSPEECH 2022 | 🎧Demo
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion | INTERSPEECH 2022 | 🎧Demo
Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling | IEEE/ACM TASLP 2021 | ✔️Code | 🎧Demo
Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations | NeurIPS 2021 | 🎧Demo | Unofficial Code
Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning | ICLR 2021
Global Rhythm Style Transfer Without Text Transcriptions | ICML 2021 | ✔️Code
AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization | ICASSP 2021 | ✔️Code | 🎧Demo
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion | INTERSPEECH 2021 Best Paper Award | ✔️Code | 🎧Demo
S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations | INTERSPEECH 2021 | ✔️Code | 🎧Demo
Many-to-Many Voice Conversion based Feature Disentanglement using Variational Autoencoder | INTERSPEECH 2021 | ✔️Code | 🎧Demo
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations | INTERSPEECH 2021 | 🎧Demo
On Prosody Modeling for ASR+TTS based Voice Conversion | ASRU 2021 | 🎧Demo
MediumVC: Any-to-any voice conversion using synthetic specific-speaker speeches as intermedium features | 2021 | ✔️Code | 🎧Demo
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning | IEEE/ACM TASLP 2020
Unsupervised Speech Decomposition via Triple Information Bottleneck | ICML 2020 | ✔️Code
AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss | ICML 2019 | ✔️Code | 🎧Demo
One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization | INTERSPEECH 2019 | ✔️Code

Emotional Voice Conversion

A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion | ICASSP 2022 | ✔️Code | 🎧Demo
Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion | INTERSPEECH 2022 | 🎧Demo
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis | INTERSPEECH 2022 | 🎧Demo
Emotion Intensity and its Control for Emotional Voice Conversion | IEEE Transactions on Affective Computing | ✔️Code | 🎧Demo
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training | INTERSPEECH 2021 | ✔️Code | 🎧Demo
Textless Speech Emotion Conversion using Discrete and Decomposed Representations | 2021 | 🎧Demo
Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion | INTERSPEECH 2020 | ✔️Code | 🎧Demo
Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data | Odyssey 2020 | ✔️Code | 🎧Demo

Singing Voice Synthesis (Other Key Words: SVS)

WeSinger 2: Fully Parallel Singing Voice Synthesis via Multi-Singer Conditional Adversarial Training | 2022 | 🎧Demo
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism | AAAI 2022 | ✔️Code | 🎧Demo
Learning the Beauty in Songs: Neural Singing Voice Beautifier | ACL 2022 | ✔️Code | 🎧Demo
Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis | INTERSPEECH 2022 | ✔️Code
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy | INTERSPEECH 2022 | ✔️Code
WeSinger: Data-augmented Singing Voice Synthesis with Auxiliary Losses | INTERSPEECH 2022 | 🎧Demo
Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System | IEEE/ACM TASLP 2021 | ✔️Code

Dataset

M4Singer: a Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus | NeurIPS 2022 | 🔽Apply&Download | 🎧Demo
PopCS | AAAI 2022 | 🔽Apply&Download
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis | INTERSPEECH 2022 | 🔽Apply&Download

High-Quality Speech Synthesis (Other Key Words: Text-to-Speech, TTS)

BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis | ICLR 2022 | ✔️Code | 🎧Demo
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis | IJCAI 2022 | ✔️Code | 🎧Demo

Prosody-Aware

Text-Free Prosody-Aware Generative Spoken Language Modeling | ACL 2022 | ✔️Code | 🎧Demo

Vocoder

Towards achieving robust universal neural vocoding | INTERSPEECH 2019 | ✔️Code | 🎧Demo | Unofficial Code

Speech Insertion

RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion | INTERSPEECH 2022 | 🎧Demo

Adversarial Attack

Defending Your Voice: Adversarial Attack on Voice Conversion | SLT 2021 | ✔️Code | 🎧Demo

Speech Data Augmentation

Data Augmenting Contrastive Learning of Speech Representations in the Time Domain | SLT 2021 | ✔️Code

mengjie-du/Awesome-Singing-Voice-Synthesis-and-Singing-Voice-Conversion

Folders and files

Latest commit

History

Repository files navigation

Awesome Singing Voice Synthesis and Singing Voice Conversion

Journals

Conferences

Workshops

Singing Voice Conversion (Other Key Words: SVC, Singing Style Transfer)

Dateset

Singing Technique Conversion

Voice Conversion (Other Key Words: VC, Voice Cloning, Voice Style Transfer)

Emotional Voice Conversion

Singing Voice Synthesis (Other Key Words: SVS)

Dateset

High-Quality Speech Synthesis (Other Key Words: Text-to-Speech, TTS)

Prosody-Aware

Vocoder

Speech Insertion

Adversarial Attack

Speech Data Augmentation

Self-supervised/Unsupervised ASR

ASR Toolkits

TTS Toolkits

Other Frameworks and Toolkits

Competitions

References

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages