A list of papers and projects on cutting-edge Speech Synthesis, Text-to-Speech (TTS), Singing Voice Synthesis (SVS), Voice Conversion (VC), Singing Voice Conversion (SVC), and related work.
Pull requests are welcome. You can also contact me via email ([email protected]) to add or update papers and projects.
IEEE/ACM TASLP, IEEE JSTSP, JSLHR, IEEE TPAMI
NeurIPS, ICLR, ICML, IJCAI, AAAI, ACL, NAACL, EMNLP, ISMIR, ICASSP, INTERSPEECH, ACM MM, ICME
ASRU, SLT
- Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher | INTERSPEECH 2022 | ✔️Code | 🎧Demo
- A Hierarchical Speaker Representation Framework for One-shot Singing Voice Conversion | INTERSPEECH 2022 | 🎧Demo
- Controllable and Interpretable Singing Voice Decomposition via Assem-VC | NeurIPS 2021 Workshop | 🎧Demo
- DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion | ASRU 2021 | 🎧Demo
- FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation | ICME 2021 | 🎧Demo
- Unsupervised WaveNet-based Singing Voice Conversion Using Pitch Augmentation and Two-phase Approach | 2021 | ✔️Code | 🎧Demo
- Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding | 2021 | 🎧Demo
- Zero-shot Singing Voice Conversion | ISMIR 2020 | 🎧Demo
- PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network | ICASSP 2020 | 🎧Demo
- DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System | INTERSPEECH 2020 | 🎧Demo
- Unsupervised Cross-Domain Singing Voice Conversion | INTERSPEECH 2020 | 🎧Demo
- VAW-GAN for Singing Voice Conversion with Non-parallel Training Data | APSIPA 2020 | ✔️Code | 🎧Demo
- Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training | 2020 | 🎧Demo | Unofficial Code
- Zero-shot Singing Technique Conversion | CMMR 2021
- End-to-End Zero-Shot Voice Style Transfer with Location-Variable Convolutions | 2022 | 🎧Demo
- A Comparative Study of Self-supervised Speech Representation Based Voice Conversion | IEEE JSTSP 2022
- Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme | ICLR 2022 | ✔️Code | 🎧Demo
- YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone | ICML 2022 | ✔️Code | 🎧Demo | 🎧Demo | 📝Blog
- S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations | ICASSP 2022 | ✔️Code
- Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques | ICASSP 2022 | ✔️Code | 🎧Demo
- NVC-Net: End-to-End Adversarial Voice Conversion | ICASSP 2022 | ✔️Code | 🎧Demo
- Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features | ICASSP 2022 | 🎧Demo
- Toward Degradation-Robust Voice Conversion | ICASSP 2022
- DGC-vector: A new speaker embedding for zero-shot voice conversion | ICASSP 2022 | 🎧Demo
- Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers | INTERSPEECH 2022 | 🎧Demo
- Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion | INTERSPEECH 2022 | 🎧Demo
- Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling | IEEE/ACM TASLP 2021 | ✔️Code | 🎧Demo
- Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations | NeurIPS 2021 | 🎧Demo | Unofficial Code
- Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning | ICLR 2021
- Global Rhythm Style Transfer Without Text Transcriptions | ICML 2021 | ✔️Code
- AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization | ICASSP 2021 | ✔️Code | 🎧Demo
- StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion | INTERSPEECH 2021 Best Paper Award | ✔️Code | 🎧Demo
- S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations | INTERSPEECH 2021 | ✔️Code | 🎧Demo
- Many-to-Many Voice Conversion based Feature Disentanglement using Variational Autoencoder | INTERSPEECH 2021 | ✔️Code | 🎧Demo
- Speech Resynthesis from Discrete Disentangled Self-Supervised Representations | INTERSPEECH 2021 | 🎧Demo
- On Prosody Modeling for ASR+TTS based Voice Conversion | ASRU 2021 | 🎧Demo
- MediumVC: Any-to-any voice conversion using synthetic specific-speaker speeches as intermedium features | 2021 | ✔️Code | 🎧Demo
- An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning | IEEE/ACM TASLP 2020
- Unsupervised Speech Decomposition via Triple Information Bottleneck | ICML 2020 | ✔️Code
- AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss | ICML 2019 | ✔️Code | 🎧Demo
- One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization | INTERSPEECH 2019 | ✔️Code
- A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion | ICASSP 2022 | ✔️Code | 🎧Demo
- Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion | INTERSPEECH 2022 | 🎧Demo
- Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis | INTERSPEECH 2022 | 🎧Demo
- Emotion Intensity and its Control for Emotional Voice Conversion | IEEE Transactions on Affective Computing | ✔️Code | 🎧Demo
- Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training | INTERSPEECH 2021 | ✔️Code | 🎧Demo
- Textless Speech Emotion Conversion using Discrete and Decomposed Representations | 2021 | 🎧Demo
- Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion | INTERSPEECH 2020 | ✔️Code | 🎧Demo
- Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data | Odyssey 2020 | ✔️Code | 🎧Demo
- WeSinger 2: Fully Parallel Singing Voice Synthesis via Multi-Singer Conditional Adversarial Training | 2022 | 🎧Demo
- DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism | AAAI 2022 | ✔️Code | 🎧Demo
- Learning the Beauty in Songs: Neural Singing Voice Beautifier | ACL 2022 | ✔️Code | 🎧Demo
- Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis | INTERSPEECH 2022 | ✔️Code
- SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy | INTERSPEECH 2022 | ✔️Code
- WeSinger: Data-augmented Singing Voice Synthesis with Auxiliary Losses | INTERSPEECH 2022 | 🎧Demo
- Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System | IEEE/ACM TASLP 2021 | ✔️Code
- M4Singer: a Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus | NeurIPS 2022 | 🔽Apply&Download | 🎧Demo
- PopCS | AAAI 2022 | 🔽Apply&Download
- Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis | INTERSPEECH 2022 | 🔽Apply&Download
- BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis | ICLR 2022 | ✔️Code | 🎧Demo
- FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis | IJCAI 2022 | ✔️Code | 🎧Demo
- Towards achieving robust universal neural vocoding | INTERSPEECH 2019 | ✔️Code | 🎧Demo | Unofficial Code
- RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion | INTERSPEECH 2022 | 🎧Demo
- Data Augmenting Contrastive Learning of Speech Representations in the Time Domain | SLT 2021 | ✔️Code
- Awesome Speech Recognition Speech Synthesis Papers
- Awesome Voice Conversion Papers Projects
- TTS Papers
- 🐸 TTS papers
- Speech Synthesis Paper
- Papers With Code: Voice Conversion
- Papers With Code: Singing Voice Conversion
- Papers With Code: Singing Voice Synthesis
- Awesome Open Source: Voice Conversion
- ICASSP 2021 Paper List-VC