INTERSPEECH 2023 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference. Explore the latest advances in speech and language processing. Code included. ⭐ the repository to support the advancement of speech technology!
Contributions to improve the completeness of this list are greatly appreciated. If you come across any overlooked papers, please feel free to create pull requests, open issues or contact me via email. Your participation is crucial to making this repository even better.
NOTE: Final paper links will be added post-conference.
# | Title | Repo | Paper |
---|---|---|---|
1781 | Chinese EFL Learners' Perception of English Prosodic Focus | ➖ | ➖ |
315 | Pitch Accent Variation and the Interpretation of Rising and Falling Intonation in American English | ➖ | ➖ |
1033 | Tonal Coarticulation as a Cue for Upcoming Prosodic Boundary | ➖ | ➖ |
2116 | Alignment of Beat Gestures and Prosodic Prominence in German | ➖ | ➖ |
1454 | Creak Prevalence and Prosodic Context in Australian English | ➖ | ➖ |
1651 | Speech Reduction: Position within French Prosodic Structure | ➖ | ➖ |
# | Title | Repo | Paper |
---|---|---|---|
1436 | DeePMOS: Deep Posterior Mean-Opinion-Score of Speech | ➖ | ➖ |
1644 | The Role of Formant and Excitation Source Features in Perceived Naturalness of Low Resource Tribal Language TTS: An Empirical Study | ➖ | ➖ |
811 | A No-reference Speech Quality Assessment Method based on Neural Network with Densely Connected Convolutional Architecture | ➖ | ➖ |
2507 | Probing Speech Quality Information in ASR Systems | ➖ | ➖ |
589 | Preference-based Training Framework for Automatic Speech Quality Assessment using Deep Neural Network | ➖ | ➖ |
389 | Crowdsourced Data Validation for ASR Training | ➖ | ➖ |
# | Title | Repo | Paper |
---|---|---|---|
1846 | Phonemic Competition in End-to-end ASR models | ➖ | ➖ |
443 | Automatic Speaker Recognition with Variation Across Vocal Conditions: a Controlled Experiment with Implications for Forensics | ➖ | ➖ |
1398 | Exploring Graph Theory Methods for the Analysis of Pronunciation Variation in Spontaneous Speech | ➖ | ➖ |
680 | Automatic Speaker Recognition Performance with Matched and Mismatched Female Bilingual Speech Data | ➖ | ➖ |
Spoken Language Processing: Translation, Information Retrieval, Summarization, Resources, and Evaluation
# | Title | Repo | Paper |
---|---|---|---|
1922 | A Neural Architecture for Selective Attention to Speech Features | ➖ | ➖ |
1122 | Quantifying Informational Masking due to Masker Intelligibility in Same-talker Speech-in-speech Perception | ➖ | ➖ |
1476 | On the Benefits of Self-supervised Learned Speech Representations for Predicting Human Phonetic Misperceptions | ➖ | ➖ |
2154 | Predicting Perceptual Centers Located at Vowel Onset in German Speech using Long Short-Term Memory Networks | ➖ | ➖ |
63 | Exploring the Mutual Intelligibility Breakdown Caused by Sculpting Speech from a Competing Speech Signal | ➖ | ➖ |
2103 | Perception of Incomplete Voicing Neutralization of Obstruents in Tohoku Japanese | ➖ | ➖ |
# | Title | Repo | Paper |
---|---|---|---|
1879 | The Emergence of Obstruent-intrinsic f0 and VOT as Cues to the Fortis/Lenis Contrast in West Central Bavarian | ➖ | ➖ |
431 | 〈'〉 in Tsimane': a Preliminary Investigation | ➖ | ➖ |
2200 | Segmental Features of Brazilian (Santa Catarina) Hunsrik | ➖ | ➖ |
2337 | Opening or closing? An Electroglottographic Analysis of Voiceless Coda Consonants in Australian English | ➖ | ➖ |
295 | Increasing Aspiration of Word-medial Fortis Plosives in Swiss Standard German | ➖ | ➖ |
1456 | Lexical Stress and Velar Palatalization in Italian: A Spatio-temporal Interaction | ➖ | ➖ |
# | Title | Repo | Paper |
---|---|---|---|
1832 | LanSER: Language-Model Supported Speech Emotion Recognition | ➖ | ➖ |
463 | Fine-tuned RoBERTa Model with a CNN-LSTM Network for Conversational Emotion Recognition | ➖ | ➖ |
1591 | Emotion Label Encoding using Word Embeddings for Speech Emotion Recognition | ➖ | ➖ |
2444 | Discrimination of the Different Intents Carried by the Same Text Through Integrating Multimodal Information | ➖ | ➖ |
510 | Meta-domain Adversarial Contrastive Learning for Alleviating Individual Bias in Self-sentiment Predictions | ➖ | ➖ |
413 | SWRR: Feature Map Classifier Based on Sliding Window Attention and High-Response Feature Reuse for Multimodal Emotion Recognition | ➖ | ➖ |
# | Title | Repo | Paper |
---|---|---|---|
206 | Aberystwyth English Pre-aspiration in Apparent Time | ➖ | ➖ |
1154 | Speech Entrainment in Chinese Story-Style Talk Shows: The Interaction Between Gender and Role | ➖ | ➖ |
1414 | Sociodemographic and Attitudinal Effects on Dialect Speakers' Articulation of the Standard Language: Evidence from German-Speaking Switzerland | ➖ | ➖ |
1704 | Vowel Normalisation in Latent Space for Sociolinguistics | ➖ | ➖ |
MERLIon CCS Challenge: Multilingual Everyday Recordings - Language Identification On Code-Switched Child-Directed Speech
# | Title | Repo | Paper |
---|---|---|---|
2038 | Classification of Vocal Intensity Category from Speech using the Wav2vec2 and Whisper Embeddings | ➖ | ➖ |
1668 | The Effect of Clinical Intervention on the Speech of Individuals with PTSD: Features and Recognition Performances | ➖ | ➖ |
470 | Analysis and Automatic Prediction of Exertion from Speech: Contrasting Objective and Subjective Measures Collected while Running | ➖ | ➖ |
894 | The Androids Corpus: A New Publicly Available Benchmark for Speech Based Depression Detection | ➖ | ➖ |
658 | Comparing Hand-Crafted Features to Spectrograms for Autism Severity Estimation | ➖ | ➖ |
839 | Acoustic Characteristics of Depression in Older Adults' Speech: the Role of Covariates | ➖ | ➖ |
# | Title | Repo | Paper |
---|---|---|---|
1212 | Parameter-Efficient Learning for Text-to-Speech Accent Adaptation | | |