


20th Interspeech 2019: Graz, Austria
- Gernot Kubin, Zdravko Kacic:
20th Annual Conference of the International Speech Communication Association, Interspeech 2019, Graz, Austria, September 15-19, 2019. ISCA 2019
ISCA Medal 2019 Keynote Speech
- Keiichi Tokuda:
Statistical Approach to Speech Synthesis: Past, Present and Future.
Spoken Language Processing for Children’s Speech
- Fei Wu, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur:
Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network. 1-5 - Gary Yeung, Abeer Alwan:
A Frequency Normalization Technique for Kindergarten Speech Recognition Inspired by the Role of fo in Vowel Perception. 6-10 - Robert Gale, Liu Chen, Jill Dolata, Jan P. H. van Santen, Meysam Asgari:
Improving ASR Systems for Children with Autism and Language Impairment Using Domain-Focused DNN Transfer Techniques. 11-15 - Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals:
Ultrasound Tongue Imaging for Diarization and Alignment of Child Speech Therapy Sessions. 16-20 - Anastassia Loukina, Beata Beigman Klebanov, Patrick L. Lange, Yao Qian, Binod Gyawali, Nitin Madnani, Abhinav Misra, Klaus Zechner, Zuowei Wang, John Sabatini:
Automated Estimation of Oral Reading Fluency During Summer Camp e-Book Reading with MyTurnToRead. 21-25 - Vanessa Lopes, João Magalhães, Sofia Cavaco:
Sustained Vowel Game: A Computer Therapy Game for Children with Dysphonia. 26-30
Dynamics of Emotional Speech Exchanges in Multimodal Communication
- Anna Esposito, Terry Amorese, Marialucia Cuciniello, Maria Teresa Riviello, Antonietta Maria Esposito, Alda Troncone, Gennaro Cordasco:
The Dependability of Voice on Elders' Acceptance of Humanoid Agents. 31-35 - Oliver Niebuhr, Uffe Schjoedt:
God as Interlocutor - Real or Imaginary? Prosodic Markers of Dialogue Speech and Expected Efficacy in Spoken Prayer. 36-40 - Michelle Cohn, Georgia Zellou:
Expressiveness Influences Human Vocal Alignment Toward voice-AI. 41-45 - Catherine Lai, Beatrice Alex, Johanna D. Moore, Leimin Tian, Tatsuro Hori, Gianpiero Francesca:
Detecting Topic-Oriented Speaker Stance in Conversational Speech. 46-50 - Jilt Sebastian, Piero Pierucci:
Fusion Techniques for Utterance-Level Emotion Recognition Combining Speech and Transcripts. 51-55 - Marvin Rajwadi, Cornelius Glackin, Julie A. Wall, Gérard Chollet, Nigel Cannings:
Explaining Sentiment Classification. 56-60 - Ricardo Kleinlein, Cristina Luna Jiménez, Juan Manuel Montero, Zoraida Callejas, Fernando Fernández Martínez:
Predicting Group-Level Skin Attention to Short Movies from Audio-Based LSTM-Mixture of Experts Models. 61-65
End-to-End Speech Recognition
- Ralf Schlüter:
Survey Talk: Modeling in Automatic Speech Recognition: Beyond Hidden Markov Models. - Ngoc-Quan Pham, Thai-Son Nguyen, Jan Niehues, Markus Müller, Alex Waibel:
Very Deep Self-Attention Networks for End-to-End Speech Recognition. 66-70 - Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde:
Jasper: An End-to-End Convolutional Neural Acoustic Model. 71-75 - Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition. 76-80 - Yonatan Belinkov, Ahmed Ali, James R. Glass:
Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition. 81-85
Speech Enhancement: Multi-Channel
- Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa:
Multi-Channel Speech Enhancement Using Time-Domain Convolutional Denoising Autoencoder. 86-90 - Kristina Tesch, Robert Rehr, Timo Gerkmann:
On Nonlinear Spatial Filtering in Multichannel Speech Enhancement. 91-95 - Juan M. Martín-Doñas, Jens Heitkaemper, Reinhold Haeb-Umbach, Angel M. Gomez, Antonio M. Peinado:
Multi-Channel Block-Online Source Extraction Based on Utterance Adaptation. 96-100 - Saeed Bagheri, Daniele Giacobello:
Exploiting Multi-Channel Speech Presence Probability in Parametric Multi-Channel Wiener Filter. 101-105 - Masahito Togami, Tatsuya Komatsu:
Variational Bayesian Multi-Channel Speech Dereverberation Under Noisy Environments with Probabilistic Convolutive Transfer Function. 106-110 - Tomohiro Nakatani, Keisuke Kinoshita:
Simultaneous Denoising and Dereverberation for Low-Latency Applications Using Frame-by-Frame Online Unified Convolutional Beamformer. 111-115
Speech Production: Individual Differences and the Brain
- Cathryn Snyder, Michelle Cohn, Georgia Zellou:
Individual Variation in Cognitive Processing Style Predicts Differences in Phonetic Imitation of Device and Human Voices. 116-120 - Aravind Illa, Prasanta Kumar Ghosh:
An Investigation on Speaker Specific Articulatory Synthesis with Speaker Independent Articulatory Inversion. 121-125 - Xiaohan Zhang, Chongke Bi, Kiyoshi Honda, Wenhuan Lu, Jianguo Wei:
Individual Difference of Relative Tongue Size and its Acoustic Effects. 126-130 - Tsukasa Yoshinaga, Kazunori Nozaki, Shigeo Wada:
Individual Differences of Airflow and Sound Generation in the Vocal Tract of Sibilant /s/. 131-135 - Shashwat Uttam, Yaman Kumar, Dhruva Sahrawat, Mansi Aggarwal, Rajiv Ratn Shah, Debanjan Mahata, Amanda Stent:
Hush-Hush Speak: Speech Reconstruction Using Silent Videos. 136-140 - Pramit Saha, Muhammad Abdul-Mageed, Sidney S. Fels:
SPEAK YOUR MIND! Towards Imagined Speech Recognition with Hierarchical Deep Learning. 141-145
Speech Signal Characterization 1
- Yu-An Chung, Wei-Ning Hsu, Hao Tang, James R. Glass:
An Unsupervised Autoregressive Model for Speech Representation Learning. 146-150 - Feng Huang, Péter Balázs:
Harmonic-Aligned Frame Mask Based on Non-Stationary Gabor Transform with Application to Content-Dependent Speaker Comparison. 151-155 - Gurunath Reddy M., K. Sreenivasa Rao, Partha Pratim Das:
Glottal Closure Instants Detection from Speech Signal by Deep Features Extracted from Raw Speech and Linear Prediction Residual. 156-160 - Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio:
Learning Problem-Agnostic Speech Representations from Multiple Self-Supervised Tasks. 161-165 - Bhanu Teja Nellore, Sri Harsha Dumpala, Karan Nathwani, Suryakanth V. Gangashetty:
Excitation Source and Vocal Tract System Based Acoustic Features for Detection of Nasals in Continuous Speech. 166-170 - Aggelina Chatziagapi, Georgios Paraskevopoulos, Dimitris Sgouropoulos, Georgios Pantazopoulos, Malvina Nikandrou, Theodoros Giannakopoulos, Athanasios Katsamanis, Alexandros Potamianos, Shrikanth Narayanan:
Data Augmentation Using GANs for Speech Emotion Recognition. 171-175
Neural Waveform Generation
- Zvi Kons, Slava Shechtman, Alexander Sorin, Carmel Rabinovitz, Ron Hoory:
High Quality, Lightweight and Adaptable TTS Using LPCNet. 176-180 - Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote, Alexis Moinet, Vatsal Aggarwal:
Towards Achieving Robust Universal Neural Vocoding. 181-185 - Paarth Neekhara, Chris Donahue, Miller S. Puckette, Shlomo Dubnov, Julian J. McAuley:
Expediting TTS Synthesis with Adversarial Vocoding. 186-190 - Ahmed Mustafa, Arijit Biswas, Christian Bergler, Julia Schottenhamml, Andreas K. Maier:
Analysis by Adversarial Synthesis - A Novel Approach for Speech Vocoding. 191-195 - Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda:
Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation. 196-200 - Xiaohai Tian, Eng Siong Chng, Haizhou Li:
A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data. 201-205
Attention Mechanism for Speaker State Recognition
- Kyu Jeong Han, Ramon Prieto, Tao Ma:
Survey Talk: When Attention Meets Speech Applications: Speech & Speaker Recognition Perspective. - Ziping Zhao, Zhongtian Bao, Zixing Zhang, Nicholas Cummins, Haishuai Wang, Björn W. Schuller:
Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition. 206-210 - Jeng-Lin Li, Chi-Chun Lee:
Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile. 211-215 - Ascensión Gallardo-Antolín, Juan Manuel Montero:
A Saliency-Based Attention LSTM Model for Cognitive Load Classification from Speech. 216-220 - Adria Mallol-Ragolta, Ziping Zhao, Lukas Stappen, Nicholas Cummins, Björn W. Schuller:
A Hierarchical Attention Network-Based Approach for Depression Detection from Transcribed Clinical Interviews. 221-225
ASR Neural Network Training — 1
- Andrea Carmantini, Peter Bell, Steve Renals:
Untranscribed Web Audio for Low Resource Speech Recognition. 226-230 - Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney:
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention. 231-235 - Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe:
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition. 236-240 - Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong:
Speaker Adaptation for Attention-Based End-to-End Speech Recognition. 241-245 - Peidong Wang, Jia Cui, Chao Weng, Dong Yu:
Large Margin Training for Attention Based End-to-End Speech Recognition. 246-250 - Khoi-Nguyen C. Mac, Xiaodong Cui, Wei Zhang, Michael Picheny:
Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition. 251-255
Zero-Resource ASR
- Benjamin Milde, Chris Biemann:
SparseSpeech: Unsupervised Acoustic Unit Discovery with Memory-Augmented Sequence Autoencoders. 256-260 - Lucas Ondel, Hari Krishna Vydana, Lukás Burget, Jan Cernocký:
Bayesian Subspace Hidden Markov Model for Acoustic Unit Discovery. 261-265 - Yosuke Higuchi, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa:
Speaker Adversarial Training of DPGMM-Based Feature Extractor for Zero-Resource Languages. 266-270 - Manasa Prasad, Daan van Esch, Sandy Ritchie, Jonas Fromseier Mortensen:
Building Large-Vocabulary ASR Systems for Languages Without Any Audio Training Data. 271-275 - Emmanuel Azuh, David Harwath, James R. Glass:
Towards Bilingual Lexicon Discovery From Visually Grounded Speech Audio. 276-280 - Siyuan Feng, Tan Lee:
Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation. 281-285
Sociophonetics
- Shawn L. Nissen, Sharalee Blunck, Anita Dromey, Christopher Dromey:
Listeners' Ability to Identify the Gender of Preadolescent Children in Different Linguistic Contexts. 286-290 - Wiebke Ahlers, Philipp Meer:
Sibilant Variation in New Englishes: A Comparative Sociophonetic Study of Trinidadian and American English /s(tr)/-Retraction. 291-295 - Michele Gubian, Jonathan Harrington, Mary Stevens, Florian Schiel, Paul Warren:
Tracking the New Zealand English NEAR/SQUARE Merger Using Functional Principal Components Analysis. 296-300 - Iona Gessinger, Bernd Möbius, Bistra Andreeva, Eran Raveh, Ingmar Steiner:
Phonetic Accommodation in a Wizard-of-Oz Experiment: Intonation and Segments. 301-305 - Oliver Niebuhr, Jan Michalsky:
PASCAL and DPA: A Pilot Study on Using Prosodic Competence Scores to Predict Communicative Skills for Team Working and Public Speaking. 306-310 - Jan Michalsky, Heike Schoormann, Thomas Schultze:
Towards the Prosody of Persuasion in Competitive Negotiation. The Relationship Between f0 and Negotiation Success in Same Sex Sales Tasks. 311-315
Resources – Annotation – Evaluation
- Jacob Sager, Ravi Shankar, Jacob Reinhold, Archana Venkataraman:
VESUS: A Crowd-Annotated Database to Study Emotion Production and Perception in Spoken English. 316-320 - Jia Xin Koh, Aqilah Mislan, Kevin Khoo, Brian Ang, Wilson Ang, Charmaine Ng, Ying-Ying Tan:
Building the Singapore English National Speech Corpus. 321-325 - Michael Picheny, Zoltán Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon:
Challenging the Boundaries of Speech Recognition: The MALACH Corpus. 326-330 - Pravin Bhaskar Ramteke, Sujata Supanekar, Pradyoth Hegde, Hanna Nelson, Venkataraja Aithal, Shashidhar G. Koolagudi:
NITK Kids' Speech Corpus. 331-335 - Ahmed Ali, Salam Khalifa, Nizar Habash:
Towards Variability Resistant Dialectal Speech Evaluation. 336-340 - Per Fallgren, Zofia Malisz, Jens Edlund:
How to Annotate 100 Hours in 45 Minutes. 341-345
Speaker Recognition and Diarization
- Mireia Díez, Lukás Burget, Shuai Wang, Johan Rohdin, Jan Cernocký:
Bayesian HMM Based x-Vector Clustering for Speaker Diarization. 346-350 - Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, Takafumi Koshinaka:
Unleashing the Unused Potential of i-Vectors Enabled by GPU Acceleration. 351-355 - Suwon Shon, Najim Dehak, Douglas A. Reynolds, James R. Glass:
MCE 2018: The 1st Multi-Target Speaker Detection and Identification Challenge Evaluation. 356-360 - Zhifu Gao, Yan Song, Ian McLoughlin, Pengcheng Li, Yiheng Jiang, Li-Rong Dai:
Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System. 361-365 - Qingjian Lin, Ruiqing Yin, Ming Li, Hervé Bredin, Claude Barras:
LSTM Based Similarity Measurement with Spectral Clustering for Speaker Diarization. 366-370 - Joon Son Chung, Bong-Jin Lee, Icksang Han:
Who Said That?: Audio-Visual Speaker Diarisation of Real-World Meetings. 371-375 - Jiamin Xie, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur:
Multi-PLDA Diarization on Children's Speech. 376-380 - Alan McCree, Gregory Sell, Daniel Garcia-Romero:
Speaker Diarization Using Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings. 381-385 - Omid Ghahabi, Volker Fischer:
Speaker-Corrupted Embeddings for Online Speaker Diarization. 386-390 - Tae Jin Park, Kyu Jeong Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis G. Georgiou, Shrikanth Narayanan:
Speaker Diarization with Lexical Information. 391-395 - Laurent El Shafey, Hagen Soltau, Izhak Shafran:
Joint Speech Recognition and Speaker Diarization via Sequence Transduction. 396-400 - Sandro Cumani:
Normal Variance-Mean Mixtures for Unsupervised Score Calibration. 401-405 - Hitoshi Yamamoto, Kong Aik Lee, Koji Okabe, Takafumi Koshinaka:
Speaker Augmentation and Bandwidth Extension for Deep Speaker Embedding. 406-410 - Emre Yilmaz, Adem Derinel, Kun Zhou, Henk van den Heuvel, Niko Brummer, Haizhou Li, David A. van Leeuwen:
Large-Scale Speaker Diarization of Radio Broadcast Archives. 411-415 - Harishchandra Dubey, Abhijeet Sangwan, John H. L. Hansen:
Toeplitz Inverse Covariance Based Robust Speaker Clustering for Naturalistic Audio Streams. 416-420
ASR for Noisy and Far-Field Speech
- György Kovács, László Tóth, Dirk Van Compernolle, Marcus Liwicki:
Examining the Combination of Multi-Band Processing and Channel Dropout for Robust Speech Recognition. 421-425 - Meet H. Soni, Ashish Panda:
Label Driven Time-Frequency Masking for Robust Continuous Speech Recognition. 426-430 - Long Wu, Hangting Chen, Li Wang, Pengyuan Zhang, Yonghong Yan:
Speaker-Invariant Feature-Mapping for Distant Speech Recognition via Adversarial Teacher-Student Learning. 431-435 - Ji Ming, Danny Crookes:
Full-Sentence Correlation: A Method to Handle Unpredictable Noise for Robust Speech Recognition. 436-440 - Meet H. Soni, Sonal Joshi, Ashish Panda:
Generative Noise Modeling and Channel Simulation for Robust Speech Recognition in Unseen Conditions. 441-445 - Shashi Kumar, Shakti P. Rath:
Far-Field Speech Enhancement Using Heteroscedastic Autoencoder for Improved Speech Recognition. 446-450 - Marc Delcroix, Shinji Watanabe, Tsubasa Ochiai, Keisuke Kinoshita, Shigeki Karita, Atsunori Ogawa, Tomohiro Nakatani:
End-to-End SpeakerBeam for Single Channel Target Speech Recognition. 451-455 - I-Hung Hsu, Ayush Jaiswal, Premkumar Natarajan:
NIESR: Nuisance Invariant End-to-End Speech Recognition. 456-460 - Takahito Suzuki, Jun Ogata, Takashi Tsunakawa, Masafumi Nishida, Masafumi Nishimura:
Knowledge Distillation for Throat Microphone Speech Recognition. 461-465 - Jian Wu, Yong Xu, Shi-Xiong Zhang, Lianwu Chen, Meng Yu, Lei Xie, Dong Yu:
Improved Speaker-Dependent Separation for CHiME-5 Challenge. 466-470 - Peidong Wang, Ke Tan, DeLiang Wang:
Bridging the Gap Between Monaural Speech Enhancement and Recognition with Distortion-Independent Acoustic Modeling. 471-475 - Peidong Wang, DeLiang Wang:
Enhanced Spectral Features for Distortion-Independent Acoustic Modeling. 476-480 - Paarth Neekhara, Shehzeen Hussain, Prakhar Pandey, Shlomo Dubnov, Julian J. McAuley, Farinaz Koushanfar:
Universal Adversarial Perturbations for Speech Recognition Systems. 481-485 - Masakiyo Fujimoto, Hisashi Kawai:
One-Pass Single-Channel Noisy Speech Recognition Using a Combination of Noisy and Enhanced Features. 486-490 - Bin Liu, Shuai Nie, Shan Liang, Wenju Liu, Meng Yu, Lianwu Chen, Shouye Peng, Changliang Li:
Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition. 491-495
Social Signals Detection and Speaker Traits Analysis
- Zixiaofan Yang, Bingyan Hu, Julia Hirschberg:
Predicting Humor by Learning from Time-Aligned Comments. 496-500 - Yoan Dinkov, Ahmed Ali, Ivan Koychev, Preslav Nakov:
Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information. 501-505 - Guozhen An, Rivka Levitan:
Mitigating Gender and L1 Differences to Improve State and Trait Recognition. 506-509 - Felix Weninger, Yang Sun, Junho Park, Daniel Willett, Puming Zhan:
Deep Learning Based Mandarin Accent Identification for Accent Robust ASR. 510-514 - Gábor Gosztolya, László Tóth:
Calibrating DNN Posterior Probability Estimates of HMM/DNN Models to Improve Social Signal Detection from Audio Data. 515-519 - Hiroki Mori, Tomohiro Nagata, Yoshiko Arimoto:
Conversational and Social Laughter Synthesis with WaveNet. 520-523 - Bogdan Ludusan, Petra Wagner:
Laughter Dynamics in Dyadic Conversations. 524-528 - Khiet P. Truong, Jürgen Trouvain, Michel-Pierre Jansen:
Towards an Annotation Scheme for Complex Laughter in Speech Corpora. 529-533 - Alice Baird, Shahin Amiriparian, Nicholas Cummins, Sarah Sturmbauer, Johanna Janson, Eva-Maria Meßner, Harald Baumeister, Nicolas Rohleder, Björn W. Schuller:
Using Speech to Predict Sequentially Measured Cortisol Levels During a Trier Social Stress Test. 534-538 - Alice Baird, Eduardo Coutinho, Julia Hirschberg, Björn W. Schuller:
Sincerity in Acted Speech: Presenting the Sincere Apology Corpus and Results. 539-543 - Oliver Niebuhr, Kerstin Fischer:
Do not Hesitate! - Unless You Do it Shortly or Nasally: How the Phonetics of Filled Pauses Determine Their Subjective Frequency and Perceived Speaker Performance. 544-548 - Juan Camilo Vásquez-Correa, Philipp Klumpp, Juan Rafael Orozco-Arroyave, Elmar Nöth:
Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech. 549-553
Applications of Language Technologies
- Ching-Ting Chang, Shun-Po Chuang, Hung-yi Lee:
Code-Switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation. 554-558 - Moritz Meier, Celeste Mason, Felix Putze, Tanja Schultz:
Comparative Analysis of Think-Aloud Methods for Everyday Activities in the Context of Cognitive Robotics. 559-563 - Doug Beeferman, William Brannon, Deb Roy:
RadioTalk: A Large-Scale Corpus of Talk Radio Transcripts. 564-568 - Salima Mdhaffar, Yannick Estève, Nicolas Hernandez, Antoine Laurent, Richard Dufour, Solen Quiniou:
Qualitative Evaluation of ASR Adaptation in a Lecture Context: Application to the PASTEL Corpus. 569-573 - Federico Marinelli, Alessandra Cervone, Giuliano Tortoreto, Evgeny A. Stepanov, Giuseppe Di Fabbrizio, Giuseppe Riccardi:
Active Annotation: Bootstrapping Annotation Lexicon and Guidelines for Supervised NLU Learning. 574-578 - Gerardo Roa Dabike, Jon Barker:
Automatic Lyric Transcription from Karaoke Vocal Tracks: Resources and a Baseline System. 579-583 - Qiang Huang, Thomas Hain:
Detecting Mismatch Between Speech and Transcription Using Cross-Modal Attention. 584-588 - Jazmín Vidal, Luciana Ferrer, Leonardo Brambilla:
EpaDB: A Database for Development of Pronunciation Assessment Systems. 589-593 - Katrin Angerbauer, Heike Adel, Ngoc Thang Vu:
Automatic Compression of Subtitles with Neural Networks and its Effect on User Experience. 594-598 - Hongyin Luo, Mitra Mohtarami, James R. Glass, Karthik Krishnamurthy, Brigitte Richardson:
Integrating Video Retrieval and Moment Detection in a Unified Corpus for Video Question Answering. 599-603
Speech and Audio Characterization and Segmentation
- Sarah E. Gutz, Jun Wang, Yana Yunusova, Jordan R. Green:
Early Identification of Speech Changes Due to Amyotrophic Lateral Sclerosis Using Machine Classification. 604-608 - Mohamed Ismail Yasar Arafath K, Aurobinda Routray:
Automatic Detection of Breath Using Voice Activity Detection and SVM Classifier with Application on News Reports. 609-613 - Hee-Soo Heo, Jee-weon Jung, Hye-jin Shim, Ha-Jin Yu:
Acoustic Scene Classification Using Teacher-Student Learning with Soft-Labels. 614-618 - Yanping Chen, Hongxia Jin:
Rare Sound Event Detection Using Deep Learning and Data Augmentation. 619-623 - Bidisha Sharma, Haizhou Li:
A Combination of Model-Based and Feature-Based Strategy for Speech-to-Singing Alignment. 624-628 - Yosi Shrem, Matthew Goldrick, Joseph Keshet:
Dr.VOT: Measuring Positive and Negative Voice Onset Time in the Wild. 629-633 - Jun Hui, Yue Wei, Shutao Chen, Richard Hau Yue So:
Effects of Base-Frequency and Spectral Envelope on Deep-Learning Speech Separation and Recognition Models. 634-638 - Nirmesh J. Shah, Hemant A. Patil:
Phone Aware Nearest Neighbor Technique Using Spectral Transition Measure for Non-Parallel Voice Conversion. 639-643 - Ravi Shankar, Archana Venkataraman:
Weakly Supervised Syllable Segmentation by Vowel-Consonant Peak Classification. 644-648 - Lukás Mateju, Petr Cerva, Jindrich Zdánský:
An Approach to Online Speaker Change Point Detection Using DNNs and WFSTs. 649-653 - Zhenyu Tang, John D. Kanu, Kevin Hogan, Dinesh Manocha:
Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks. 654-658
Neural Techniques for Voice Conversion and Waveform Generation
- Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou:
Non-Parallel Voice Conversion Using Weighted Generative Adversarial Networks. 659-663 - Ju-Chieh Chou, Hung-yi Lee:
One-Shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization. 664-668 - Hui Lu, Zhiyong Wu, Dongyang Dai, Runnan Li, Shiyin Kang, Jia Jia, Helen Meng:
One-Shot Voice Conversion with Global Speaker Embeddings. 669-673 - Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Non-Parallel Voice Conversion with Cyclic Variational Autoencoder. 674-678 - Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo:
StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion. 679-683 - Yusuke Kurita, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda:
Robustness of Statistical Voice Conversion Based on Direct Waveform Modification Against Background Sounds. 684-688 - Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma:
Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks. 689-693 - Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku:
GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-Spectrogram. 694-698 - Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim:
Probability Density Distillation with Generative Adversarial Networks for High-Quality Parallel Waveform Generation. 699-703 - Seyed Hamidreza Mohammadi, Taehwan Kim:
One-Shot Voice Conversion with Disentangled Representations by Leveraging Phonetic Posteriorgrams. 704-708 - Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion. 709-713 - Songxiang Liu, Yuewen Cao, Xixin Wu, Lifa Sun, Xunying Liu, Helen Meng:
Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams. 714-718 - Li-Wei Chen, Hung-yi Lee, Yu Tsao:
Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech. 719-723 - Shaojin Ding, Ricardo Gutierrez-Osuna:
Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion. 724-728 - Cory Stephenson, Gokce Keskin, Anil Thomas, Oguz H. Elibol:
Semi-Supervised Voice Conversion with Amortized Variational Inference. 729-733
Model Adaptation for ASR
- Subhadeep Dey, Petr Motlícek, Trung Bui, Franck Dernoncourt:
Exploiting Semi-Supervised Training Through a Dropout Regularization in End-to-End Speech Recognition. 734-738 - Chanwoo Kim, Minkyu Shin, Abhinav Garg, Dhananjaya Gowda:
Improved Vocal Tract Length Perturbation for a State-of-the-Art End-to-End Speech Recognition System. 739-743 - Han Zhu, Li Wang, Pengyuan Zhang, Yonghong Yan:
Multi-Accent Adaptation Based on Gate Mechanism. 744-748 - Pengcheng Guo, Sining Sun, Lei Xie:
Unsupervised Adaptation with Adversarial Dropout Regularization for Robust Speech Recognition. 749-753 - Markus Kitza, Pavel Golik, Ralf Schlüter, Hermann Ney:
Cumulative Adaptation for BLSTM Acoustic Models. 754-758 - Xurong Xie, Xunying Liu, Tan Lee, Lan Wang:
Fast DNN Acoustic Model Speaker Adaptation by Learning Hidden Unit Contribution Features. 759-763 - Emiru Tsunoo, Yosuke Kashiwagi, Satoshi Asakawa, Toshiyuki Kumakura:
End-to-End Adaptation with Backpropagation Through WFST for On-Device Speech Recognition System. 764-768 - Leda Sari, Samuel Thomas, Mark A. Hasegawa-Johnson:
Learning Speaker Aware Offsets for Speaker Adaptation of Neural Networks. 769-773 - Khe Chai Sim, Petr Zadrazil, Françoise Beaufays:
An Investigation into On-Device Personalization of End-to-End Automatic Speech Recognition Models. 774-778 - Abhinav Jain, Vishwanath P. Singh, Shakti P. Rath:
A Multi-Accent Acoustic Model Using Mixture of Experts for Speech Recognition. 779-783 - Joel Shor, Dotan Emanuel, Oran Lang, Omry Tuval, Michael P. Brenner, Julie Cattiau, Fernando Vieira, Maeve McNally, Taylor Charbonneau, Melissa Nollstadt, Avinatan Hassidim, Yossi Matias:
Personalizing ASR for Dysarthric and Accented Speech with Limited Data. 784-788
Dialogue Speech Understanding
- Denis Peskov, Joe Barrow, Pedro Rodriguez, Graham Neubig, Jordan L. Boyd-Graber:
Mitigating Noisy Inputs for Question Answering. 789-793 - Rahul Gupta, Aman Alok, Shankar Ananthakrishnan:
One-vs-All Models for Asynchronous Training: An Empirical Analysis. 794-798 - Gabriel Marzinotto, Géraldine Damnati, Frédéric Béchet:
Adapting a FrameNet Semantic Parser for Spoken Language Understanding Using Adversarial Learning. 799-803 - Titouan Parcollet, Mohamed Morchid, Xavier Bost, Georges Linarès:
M2H-GAN: A GAN-Based Mapping from Machine to Human Transcripts for Speech Understanding. 804-808 - Munir Georges, Krzysztof Czarnowski, Tobias Bocklet:
Ultra-Compact NLU: Neuronal Network Binarization as Regularization. 809-813 - Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, Yoshua Bengio:
Speech Model Pre-Training for End-to-End Spoken Language Understanding. 814-818 - Prashanth Gurunath Shivakumar, Mu Yang, Panayiotis G. Georgiou:
Spoken Language Intent Detection Using Confusion2Vec. 819-823 - Natalia A. Tomashenko, Antoine Caubrière, Yannick Estève:
Investigating Adaptation and Transfer Learning for End-to-End Spoken Language Understanding from Speech. 824-828 - Yuanfeng Song, Di Jiang, Xueyang Wu, Qian Xu, Raymond Chi-Wing Wong, Qiang Yang:
Topic-Aware Dialogue Speech Recognition with Transfer Learning. 829-833 - Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Hosana Kamiyama, Takanobu Oba, Satoshi Kobashikawa, Yushi Aono:
Improving Conversation-Context Language Models with Multiple Spoken Language Understanding Models. 834-838 - Jen-Tzung Chien, Wei Xiang Lieow:
Meta Learning for Hyperparameter Optimization in Dialogue System. 839-843 - Kyle Williams:
Zero Shot Intent Classification Using Long-Short Term Memory Networks. 844-848 - Mandy Korpusik, Zoe Liu, James R. Glass:
A Comparison of Deep Learning Methods for Language Understanding. 849-853 - Yuka Kobayashi, Takami Yoshida, Kenji Iwata, Hiroshi Fujimura:
Slot Filling with Weighted Multi-Encoders for Out-of-Domain Values. 854-858
Speech Production and Silent Interfaces
- Nadee Seneviratne, Ganesh Sivaraman, Carol Y. Espy-Wilson:
Multi-Corpus Acoustic-to-Articulatory Speech Inversion. 859-863 - Debadatta Dash, Alan Wisler, Paul Ferrari, Jun Wang:
Towards a Speaker Independent Speech-BCI Using Speaker Adaptation. 864-868 - Janaki Sheth, Ariel Tankus, Michelle Tran, Lindy Comstock, Itzhak Fried, William Speier:
Identifying Input Features for Development of Real-Time Translation of Neural Signals to Text. 869-873 - Samuel S. Silva, António J. S. Teixeira, Conceição Cunha, Nuno Almeida, Arun A. Joseph, Jens Frahm:
Exploring Critical Articulator Identification from 50Hz RT-MRI Data of the Vocal Tract. 874-878 - Ioannis K. Douros, Anastasiia Tsukanova, Karyna Isaieva, Pierre-André Vuissoz, Yves Laprie:
Towards a Method of Dynamic Vocal Tract Shapes Generation by Combining Static 3D and Dynamic 2D MRI Speech Data. 879-883 - Oksana Rasskazova, Christine Mooshammer, Susanne Fuchs:
Temporal Coordination of Articulatory and Respiratory Events Prior to Speech Initiation. 884-888 - Michele Gubian, Manfred Pastätter, Marianne Pouplier:
Zooming in on Spatiotemporal V-to-C Coarticulation with Functional PCA. 889-893 - Tamás Gábor Csapó, Mohammed Salah Al-Radhi, Géza Németh, Gábor Gosztolya, Tamás Grósz, László Tóth, Alexandra Markó:
Ultrasound-Based Silent Speech Interface Built on a Continuous Vocoder. 894-898 - Eugen Klein, Jana Brunner, Phil Hoole:
Assessing Acoustic and Articulatory Dimensions of Speech Motor Adaptation with Random Forests. 899-903 - Hironori Takemoto, Tsubasa Goto, Yuya Hagihara, Sayaka Hamanaka, Tatsuya Kitamura, Yukiko Nota, Kikuo Maekawa:
Speech Organ Contour Extraction Using Real-Time MRI and Machine Learning Method. 904-908 - K. G. van Leeuwen, P. Bos, Stefano Trebeschi, Maarten J. A. van Alphen, Luuk Voskuilen, Ludi E. Smeele, Ferdi van der Heijden, R. J. J. H. van Son:
CNN-Based Phoneme Classifier from Vocal Tract MRI Learns Embedding Consistent with Articulatory Topology. 909-913 - Doris Mücke, Anne Hermes, Sam Tilsen:
Strength and Structure: Coupling Tones with Oral Constriction Gestures. 914-918
Speech Signal Characterization 2
- W. Bastiaan Kleijn, Felicia S. C. Lim, Michael Chinen, Jan Skoglund:
Salient Speech Representations Based on Cloned Networks. 919-923 - Manoj Kumar Ramanathi, Chiranjeevi Yarra, Prasanta Kumar Ghosh:
ASR Inspired Syllable Stress Detection for Pronunciation Evaluation Without Using a Supervised Classifier and Syllable Level Features. 924-928 - Renuka Mannem, Jhansi Mallela, Aravind Illa, Prasanta Kumar Ghosh:
Acoustic and Articulatory Feature Based Speech Rate Estimation Using a Convolutional Dense Neural Network. 929-933 - Sebastian Springenberg, Egor Lakomkin, Cornelius Weber, Stefan Wermter:
Predictive Auxiliary Variational Autoencoder for Representation Learning of Global Speech Characteristics. 934-938 - Georgios Paraskevopoulos, Efthymios Tzinis, Nikolaos Ellinas, Theodoros Giannakopoulos, Alexandros Potamianos:
Unsupervised Low-Rank Representations for Speech Emotion Recognition. 939-943 - Jitendra Kumar Dhiman, Nagaraj Adiga, Chandra Sekhar Seelamantula:
On the Suitability of the Riesz Spectro-Temporal Envelope for WaveNet Based Speech Synthesis. 944-948 - Xinzhou Xu, Jun Deng, Nicholas Cummins, Zixing Zhang, Li Zhao, Björn W. Schuller:
Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition. 949-953 - Sweekar Sudhakara, Manoj Kumar Ramanathi, Chiranjeevi Yarra, Prasanta Kumar Ghosh:
An Improved Goodness of Pronunciation (GoP) Measure for Pronunciation Evaluation with DNN-HMM System Considering HMM Transition Probabilities. 954-958 - Atreyee Saha, Chiranjeevi Yarra
, Prasanta Kumar Ghosh:
Low Resource Automatic Intonation Classification Using Gated Recurrent Unit (GRU) Networks Pre-Trained with Synthesized Pitch Patterns. 959-963
Applications in Language Learning and Healthcare
- Juan Camilo Vásquez-Correa, Tomás Arias-Vergara, Philipp Klumpp, M. Strauss, Arne Küderle, Nils Roth, Sebastian P. Bayerl, Nicanor García-Ospina, Paula Andrea Pérez-Toro, L. Felipe Parra-Gallego, Cristian David Ríos-Urrego, Daniel Escobar-Grisales, Juan Rafael Orozco-Arroyave, Björn M. Eskofier, Elmar Nöth:
Apkinson: A Mobile Solution for Multimodal Assessment of Patients with Parkinson's Disease. 964-965 - Gábor Kiss, Dávid Sztahó, Klára Vicsi:
Depression State Assessment: Application for Detection of Depression by Speech. 966-967 - Chiranjeevi Yarra, Aparna Srinivasan, Sravani Gottimukkala, Prasanta Kumar Ghosh:
SPIRE-fluent: A Self-Learning App for Tutoring Oral Fluency to Second Language English Learners. 968-969 - Shawn L. Nissen, Rebecca Nissen:
Using Real-Time Visual Biofeedback for Second Language Instruction. 970-971 - Avin Miwardelli, Ian Gallagher, Jenny Gibson, Napoleon Katsos, Kate M. Knill, Helena Wood:
Splash: Speech and Language Assessment in Schools and Homes. 972-973 - Colin T. Annand, Maurice Lamb, Sarah Dugan, Sarah R. Li, Hannah M. Woeste, T. Douglas Mast, Michael A. Riley, Jack A. Masterson, Neeraja Mahalingam, Kathryn J. Eary, Caroline Spencer, Suzanne Boyce, Stephanie Jackson, Anoosha Baxi, Reneé Seward:
Using Ultrasound Imaging to Create Augmented Visual Biofeedback for Articulatory Practice. 974-975 - Vasiliy Radostev, Serge Berger, Justin Tabrizi, Pasha Kamyshev, Hisami Suzuki:
Speech-Based Web Navigation for Limited Mobility Users. 976-977
Keynote 2: Tanja Schultz
- Tanja Schultz:
Biosignal Processing for Human-Machine Interaction.
The Second DIHARD Speech Diarization Challenge (DIHARD II)
- Neville Ryant, Kenneth Church, Christopher Cieri, Alejandrina Cristià, Jun Du, Sriram Ganapathy, Mark Y. Liberman:
The Second DIHARD Diarization Challenge: Dataset, Task, and Baselines. 978-982 - Prachi Singh, Harsha Vardhan, Sriram Ganapathy, Ahilan Kanagasundaram:
LEAP Diarization System for the Second DIHARD Challenge. 983-987 - Ignacio Viñals, Pablo Gimeno, Alfonso Ortega Giménez, Antonio Miguel, Eduardo Lleida:
ViVoLAB Speaker Diarization System for the DIHARD 2019 Challenge. 988-992 - Zbynek Zajíc, Marie Kunesová, Marek Hrúz, Jan Vanek:
UWB-NTIS Speaker Diarization System for the DIHARD II 2019 Challenge. 993-997 - Tae Jin Park, Manoj Kumar, Nikolaos Flemotomos, Monisankha Pal, Raghuveer Peri, Rimita Lahiri, Panayiotis G. Georgiou, Shrikanth Narayanan:
The Second DIHARD Challenge: System Description for USC-SAIL Team. 998-1002 - Sergey Novoselov, Aleksei Gusev, Artem Ivanov, Timur Pekhovsky, Andrey Shulipa, Anastasia Avdeeva, Artem Gorlanov, Alexandr Kozlov:
Speaker Diarization with Deep Speaker Embeddings for DIHARD Challenge II. 1003-1007
The 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge: ASVspoof Challenge — O
- Massimiliano Todisco, Xin Wang, Ville Vestman, Md. Sahidullah, Héctor Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Tomi H. Kinnunen, Kong Aik Lee:
ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection. 1008-1012
The 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge: ASVspoof Challenge — P
- Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak:
ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual Networks. 1013-1017 - Bhusan Chettri, Daniel Stoller, Veronica Morfi, Marco A. Martínez Ramírez, Emmanouil Benetos, Bob L. Sturm:
Ensemble Models for Spoofing Detection in Automatic Speaker Verification. 1018-1022 - Weicheng Cai, Haiwei Wu, Danwei Cai, Ming Li:
The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion. 1023-1027 - Radoslaw Bialobrzeski, Michal Kosmider, Mateusz Matuszewski, Marcin Plata, Alexander Rakowski:
Robust Bayesian and Light Neural Networks for Voice Spoofing Detection. 1028-1032 - Galina Lavrentyeva, Sergey Novoselov, Andzhukaev Tseren, Marina Volkova, Artem Gorlanov, Alexandr Kozlov:
STC Antispoofing Systems for the ASVspoof2019 Challenge. 1033-1037 - Yexin Yang, Hongji Wang, Heinrich Dinkel, Zhengyang Chen, Shuai Wang, Yanmin Qian, Kai Yu:
The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge. 1038-1042 - K. N. R. K. Raju Alluri, Anil Kumar Vuppala:
IIIT-H Spoofing Countermeasures for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2019. 1043-1047 - Rongjin Li, Miao Zhao, Zheng Li, Lin Li, Qingyang Hong:
Anti-Spoofing Speaker Verification System with Multi-Feature Integration and Multi-Task Learning. 1048-1052 - Jennifer Williams, Joanna Rownicka:
Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features. 1053-1057 - Rohan Kumar Das, Jichen Yang, Haizhou Li:
Long Range Acoustic Features for Spoofed Speech Detection. 1058-1062 - Su-Yu Chang, Kai-Cheng Wu, Chia-Ping Chen:
Transfer-Representation Learning for Detecting Spoofing Attacks with Converted and Synthesized Speech in Automatic Speaker Verification System. 1063-1067 - Alejandro Gómez Alanís, Antonio M. Peinado, José A. González, Angel M. Gomez:
A Light Convolutional GRU-RNN Deep Feature Extractor for ASV Spoofing Detection. 1068-1072 - Hossein Zeinali, Themos Stafylakis, Georgia Athanasopoulou, Johan Rohdin, Ioannis Gkinis, Lukás Burget, Jan Cernocký:
Detecting Spoofing Attacks Using VGG and SincNet: BUT-Omilia Submission to ASVspoof 2019 Challenge. 1073-1077 - Moustafa Alzantot, Ziqi Wang, Mani B. Srivastava:
Deep Residual Neural Networks for Audio Spoofing Detection. 1078-1082 - Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu:
Replay Attack Detection with Complementary High-Resolution Information Using End-to-End DNN for the ASVspoof 2019 Challenge. 1083-1087
The Zero Resource Speech Challenge 2019: TTS Without T
- Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux:
The Zero Resource Speech Challenge 2019: TTS Without T. 1088-1092 - Siyuan Feng, Tan Lee, Zhiyuan Peng:
Combining Adversarial Training and Disentangled Speech Representation for Robust Zero-Resource Subword Modeling. 1093-1097 - Bolaji Yusuf, Alican Gök, Batuhan Gündogdu, Oyku Deniz Kose, Murat Saraclar:
Temporally-Aware Acoustic Unit Discovery for Zerospeech 2019 Challenge. 1098-1102 - Ryan Eloff, André Nortje, Benjamin van Niekerk, Avashna Govender, Leanne Nortje, Arnu Pretorius, Elan Van Biljon, Ewald van der Westhuizen, Lisa van Staden, Herman Kamper:
Unsupervised Acoustic Unit Discovery for Speech Synthesis Using Discrete Latent-Variable Neural Networks. 1103-1107 - Andy T. Liu, Po-Chun Hsu, Hung-yi Lee:
Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion. 1108-1112 - Karthik Pandia D. S, Hema A. Murthy:
Zero Resource Speech Synthesis Using Transcripts Derived from Perceptual Acoustic Units. 1113-1117 - Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura:
VQVAE Unsupervised Unit Discovery and Multi-Scale Code2Spec Inverter for Zerospeech Challenge 2019. 1118-1122
Speech Translation
- Jan Niehues:
Survey Talk: A Survey on Speech Translation. - Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, Yonghui Wu:
Direct Speech-to-Speech Translation with a Sequence-to-Sequence Model. 1123-1127 - Yuchen Liu, Hao Xiong, Jiajun Zhang, Zhongjun He, Hua Wu, Haifeng Wang, Chengqing Zong:
End-to-End Speech Translation with Knowledge Distillation. 1128-1132 - Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi:
Adapting Transformer to End-to-End Spoken Language Translation. 1133-1137 - Steven Hillis, Anushree Prasanna Kumar, Alan W. Black:
Unsupervised Phonetic and Word Level Discovery for Speech to Speech Translation for Unwritten Languages. 1138-1142
Speaker Recognition 1
- Gautam Bhattacharya, Md. Jahangir Alam, Patrick Kenny:
Deep Speaker Recognition: Modular or Monolithic? 1143-1147 - Shuai Wang, Johan Rohdin, Lukás Burget, Oldrich Plchot, Yanmin Qian, Kai Yu, Jan Cernocký:
On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction. 1148-1152 - Mirco Ravanelli, Yoshua Bengio:
Learning Speaker Representations with Mutual Information. 1153-1157 - Lanhua You, Wu Guo, Li-Rong Dai, Jun Du:
Multi-Task Learning with High-Order Statistics for x-Vector Based Text-Independent Speaker Verification. 1158-1162 - Zhanghao Wu, Shuai Wang, Yanmin Qian, Kai Yu:
Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification. 1163-1167 - Lanhua You, Wu Guo, Li-Rong Dai, Jun Du:
Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification. 1168-1172
Dialogue Understanding
- Riyaz Ahmad Bhat, John Chen, Rashmi Prasad, Srinivas Bangalore:
Neural Transition Systems for Modeling Hierarchical Semantic Representations. 1173-1177 - Vedran Vukotic, Christian Raymond:
Mining Polysemous Triplets with Recurrent Neural Networks for Spoken Language Understanding. 1178-1182 - Avik Ray, Yilin Shen, Hongxia Jin:
Iterative Delexicalization for Improved Spoken Language Understanding. 1183-1187 - Swapnil Bhosale, Imran A. Sheikh, Sri Harsha Dumpala, Sunil Kumar Kopparapu:
End-to-End Spoken Language Understanding: Bootstrapping in Low Resource Scenarios. 1188-1192 - Hiroaki Takatsu, Katsuya Yokoyama, Yoichi Matsuyama, Hiroshi Honda, Shinya Fujie, Tetsunori Kobayashi:
Recognition of Intentions of Users' Short Responses for Conversational News Delivery System. 1193-1197 - Antoine Caubrière, Natalia A. Tomashenko, Antoine Laurent, Emmanuel Morin, Nathalie Camelin, Yannick Estève:
Curriculum-Based Transfer Learning for an Effective End-to-End Spoken Language Understanding and Domain Portability. 1198-1202
Speech in the Brain
- Debadatta Dash, Paul Ferrari, Jun Wang:
Spatial and Spectral Fingerprint in the Brain: Speaker Identification from Single Trial MEG Signals. 1203-1207 - Annika Nijveld, Louis ten Bosch, Mirjam Ernestus:
ERP Signal Analysis with Temporal Resolution Using a Time Window Bank. 1208-1212 - Louis ten Bosch, Kimberley Mulder, Louis Boves:
Phase Synchronization Between EEG Signals as a Function of Differences Between Stimuli Characteristics. 1213-1217 - Mariya Kharaman, Manluolan Xu, Carsten Eulitz, Bettina Braun:
The Processing of Prosodic Cues to Rhetorical Question Interpretation: Psycholinguistic and Neurolinguistics Evidence. 1218-1222 - Odette Scharenborg, Jiska Koemans, Cybelle Smith, Mark A. Hasegawa-Johnson, Kara D. Federmeier:
The Neural Correlates Underlying Lexically-Guided Perceptual Learning. 1223-1227 - Ivan Halim Parmonangan, Hiroki Tanaka, Sakriani Sakti, Shinnosuke Takamichi, Satoshi Nakamura:
Speech Quality Evaluation of Synthesized Japanese Speech Using EEG. 1228-1232
Far-Field Speech Recognition
- Yiteng Huang, Turaj Zakizadeh Shabestary, Alexander Gruenstein, Li Wan:
Multi-Microphone Adaptive Noise Cancellation for Robust Hotword Detection. 1233-1237 - Shengkui Zhao, Chongjia Ni, Rong Tong, Bin Ma:
Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition. 1238-1242 - Yuri Y. Khokhlov, Alexander Zatvornitskiy, Ivan Medennikov, Ivan Sorokin, Tatiana Prisyach, Aleksei Romanenko, Anton Mitrofanov, Vladimir Bataev, Andrei Andrusenko, Mariya Korenevskaya, Oleg Petrov:
R-Vectors: New Technique for Adaptation to Room Acoustics. 1243-1247 - Naoyuki Kanda, Christoph Böddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach:
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR. 1248-1252 - Lukas Drude, Jahn Heymann, Reinhold Haeb-Umbach:
Unsupervised Training of Neural Mask-Based Beamforming. 1253-1257 - Feng Ma, Li Chai, Jun Du, Diyuan Liu, Zhongfu Ye, Chin-Hui Lee:
Acoustic Model Ensembling Using Effective Data Augmentation for CHiME-5 Challenge. 1258-1262
Speaker and Language Recognition 1
- Ming Li, Weicheng Cai, Danwei Cai:
Survey Talk: End-to-End Deep Neural Network Based Speaker and Language Recognition. - Bharat Padi, Anand Mohan, Sriram Ganapathy:
Attention Based Hybrid i-Vector BLSTM Model for Language Recognition. 1263-1267 - Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim, Hye-jin Shim, Ha-Jin Yu:
RawNet: Advanced End-to-End Deep Neural Network Using Raw Waveforms for Text-Independent Speaker Verification. 1268-1272 - Wei Rao, Chenglin Xu, Eng Siong Chng, Haizhou Li:
Target Speaker Extraction for Multi-Talker Speaker Verification. 1273-1277 - Hanna Mazzawi, Xavi Gonzalvo, Aleks Kracun, Prashant Sridhar, Niranjan Subrahmanya, Ignacio López-Moreno, Hyun-Jin Park, Patrick Violette:
Improving Keyword Spotting and Language Identification via Neural Architecture Search at Scale. 1278-1282
Speech Synthesis: Towards End-to-End
- Yibin Zheng, Xi Wang, Lei He, Shifeng Pan, Frank K. Soong, Zhengqi Wen, Jianhua Tao:
Forward-Backward Decoding for Regularizing End-to-End TTS. 1283-1287 - Haohan Guo, Frank K. Soong, Lei He, Lei Xie:
A New GAN-Based End-to-End TTS Training Algorithm. 1288-1292 - Mutian He, Yan Deng, Lei He:
Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS. 1293-1297 - Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li, Junichi Yamagishi:
Joint Training Framework for Text-to-Speech and Voice Conversion Using Multi-Source Tacotron and WaveNet. 1298-1302 - Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa:
Training Multi-Speaker Neural Text-to-Speech Systems Using Speaker-Imbalanced Speech Corpora. 1303-1307 - Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Real-Time Neural Text-to-Speech with Sequence-to-Sequence Acoustic Model and WaveGlow or Single Gaussian WaveRNN Vocoders. 1308-1312
Semantic Analysis and Classification
- Sushant Kafle, Cecilia Ovesdotter Alm, Matt Huenerfauth:
Fusion Strategy for Prosodic and Lexical Representations of Word Importance. 1313-1317 - Jen-Tzung Chien, Chun-Wei Wang:
Self Attention in Variational Sequential Learning for Summarization. 1318-1322 - Zhongkai Sun, Prathusha Kameswara Sarma, William A. Sethares, Erik P. Bucy:
Multi-Modal Sentiment Analysis Using Deep Canonical Correlation Analysis. 1323-1327 - Yilin Shen, Wenhu Chen, Hongxia Jin:
Interpreting and Improving Deep Neural SLU Models via Vocabulary Importance. 1328-1332 - Máté Ákos Tündik, Valér Kaszás, György Szaszák:
Assessing the Semantic Space Bias Caused by ASR Error Propagation and its Effect on Spoken Document Summarization. 1333-1337 - Peisong Huang, Peijie Huang, Wencheng Ai, Jiande Ding, Jinchuan Zhang:
Latent Topic Attention for Domain Classification. 1338-1342
Speech and Audio Source Separation and Scene Analysis 1
- Chaitanya Narisetty:
A Unified Bayesian Source Modelling for Determined Blind Source Separation. 1343-1347 - Naoya Takahashi, Sudarsanam Parthasaarathy, Nabarun Goswami, Yuki Mitsufuji:
Recursive Speech Separation for Unknown Number of Speakers. 1348-1352 - Pieter Appeltans, Jeroen Zegers, Hugo Van hamme:
Practical Applicability of Deep Neural Networks for Overlapping Speaker Separation. 1353-1357 - Zhaoyi Gu, Jing Lu, Kai Chen:
Speech Separation Using Independent Vector Analysis with an Amplitude Variable Gaussian Mixture Model. 1358-1362 - Gene-Ping Yang, Chao-I Tuan, Hung-yi Lee, Lin-Shan Lee:
Improved Speech Separation with Time-and-Frequency Cross-Domain Joint Embedding and Clustering. 1363-1367 - Gordon Wichern, Joe Antognini, Michael Flynn, Licheng Richard Zhu, Emmett McQuinn, Dwight Crow, Ethan Manilow, Jonathan Le Roux:
WHAM!: Extending Speech Separation to Noisy Environments. 1368-1372
Speech Intelligibility
- Andreas Nautsch:
Survey Talk: Preserving Privacy in Speaker and Speech Characterisation. - Carol Chermaz, Cassia Valentini-Botinhao, Henning F. Schepker, Simon King:
Evaluating Near End Listening Enhancement Algorithms in Realistic Environments. 1373-1377 - Amin Edraki, Wai-Yip Chan, Jesper Jensen, Daniel Fogerty:
Improvement and Assessment of Spectro-Temporal Modulation Analysis for Speech Intelligibility Estimation. 1378-1382 - Zhuohuang Zhang, Yi Shen:
Listener Preference on the Local Criterion for Ideal Binary-Masked Speech. 1383-1387 - Tuan Dinh, Alexander Kain, Kris Tjaden:
Using a Manifold Vocoder for Spectral Voice and Style Conversion. 1388-1392
ASR Neural Network Architectures 1
- Patrick von Platen, Chao Zhang, Philip C. Woodland:
Multi-Span Acoustic Modelling Using Raw Waveform Signals. 1393-1397 - André Merboldt, Albert Zeyer, Ralf Schlüter, Hermann Ney:
An Analysis of Local Monotonic Attention Variants. 1398-1402 - Eric Sun, Jinyu Li, Yifan Gong:
Layer Trajectory BLSTM. 1403-1407 - Shigeki Karita, Nelson Enrique Yalta Soplin, Shinji Watanabe, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani:
Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration. 1408-1412 - Shucong Zhang, Erfan Loweimi, Yumo Xu, Peter Bell, Steve Renals:
Trainable Dynamic Subsampling for End-to-End Speech Recognition. 1413-1417 - Ding Zhao, Tara N. Sainath, David Rybach, Pat Rondon, Deepti Bhatia, Bo Li, Ruoming Pang:
Shallow-Fusion End-to-End Contextual Biasing. 1418-1422
Speech and Language Analytics for Mental Health
- Md. Nasir, Sandeep Nallan Chakravarthula, Brian R. W. Baucom, David C. Atkins, Panayiotis G. Georgiou, Shrikanth Narayanan:
Modeling Interpersonal Linguistic Coordination in Conversations Using Word Mover's Distance. 1423-1427 - Wenchao Du, Louis-Philippe Morency, Jeffrey F. Cohn, Alan W. Black:
Bag-of-Acoustic-Words for Mental Health Assessment: A Deep Autoencoding Approach. 1428-1432 - Rohit Voleti, Stephanie Woolridge, Julie M. Liss, Melissa Milanovic, Christopher R. Bowie, Visar Berisha:
Objective Assessment of Social Skills Using Automated Language Analysis for Identification of Schizophrenia and Bipolar Disorder. 1433-1437 - Katie Matton, Melvin G. McInnis, Emily Mower Provost:
Into the Wild: Transitioning from Recognizing Mood in Clinical Interactions to Personal Conversations for Individuals with Bipolar Disorder. 1438-1442 - Morteza Rohanian, Julian Hough, Matthew Purver:
Detecting Depression with Word-Level Multimodal Fusion. 1443-1447 - Carol Y. Espy-Wilson, Adam C. Lammert, Nadee Seneviratne, Thomas F. Quatieri:
Assessing Neuromotor Coordination in Depression Using Inverted Vocal Tract Variables. 1448-1452
Dialogue Modelling
- Shachi Paul, Rahul Goel, Dilek Hakkani-Tür:
Towards Universal Dialogue Act Tagging for Task-Oriented Dialogues. 1453-1457 - Rahul Goel, Shachi Paul, Dilek Hakkani-Tür:
HyST: A Hybrid Approach for Flexible and Accurate Dialogue State Tracking. 1458-1462 - Jirí Martínek, Pavel Král, Ladislav Lenc, Christophe Cerisara:
Multi-Lingual Dialogue Act Recognition with Deep Learning Methods. 1463-1467 - Guan-Lin Chao, Ian R. Lane:
BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer. 1468-1472 - David Griol, Zoraida Callejas:
Discovering Dialog Rules by Means of an Evolutionary Approach. 1473-1477 - Xi C. Chen, Adithya Sagar, Justine T. Kao, Tony Y. Li, Christopher Klein, Stephen Pulman, Ashish Garg, Jason D. Williams:
Active Learning for Domain Classification in a Commercial Spoken Personal Assistant. 1478-1482
Speaker Recognition Evaluation
- Seyed Omid Sadjadi, Craig S. Greenberg, Elliot Singer, Douglas A. Reynolds, Lisa P. Mason, Jaime Hernandez-Cordero:
The 2018 NIST Speaker Recognition Evaluation. 1483-1487 - Jesús Villalba, Nanxin Chen, David Snyder, Daniel Garcia-Romero, Alan McCree, Gregory Sell, Jonas Borgstrom, Fred Richardson, Suwon Shon, François Grondin, Réda Dehak, Leibny Paola García-Perera, Daniel Povey, Pedro A. Torres-Carrasquillo, Sanjeev Khudanpur, Najim Dehak:
State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18. 1488-1492 - Daniel Garcia-Romero, David Snyder, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur:
x-Vector DNN Refinement with Full-Length Recordings for Speaker Recognition. 1493-1496 - Kong Aik Lee, Ville Hautamäki, Tomi H. Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Héctor Delgado, Massimiliano Todisco:
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences. 1497-1501 - Elie Khoury, Khaled Lakhdhar, Andrew Vaughan, Ganesh Sivaraman, Parav Nagarsheth:
Pindrop Labs' Submission to the First Multi-Target Speaker Detection and Identification Challenge. 1502-1505 - Daniel Garcia-Romero, David Snyder, Shinji Watanabe, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur:
Speaker Recognition Benchmark Using the CHiME-5 Corpus. 1506-1510
Speech Synthesis: Data and Evaluation
- David Ayllón, Héctor A. Sánchez-Hevia, Carol Figueroa, Pierre Lanchantin:
Investigating the Effects of Noisy and Reverberant Speech in Text-to-Speech Systems. 1511-1515 - Fang-Yu Kuo, Iris Chuoying Ouyang, Sandesh Aryal, Pierre Lanchantin:
Selection and Training Schemes for Improving TTS Voice Built on Found Data. 1516-1520 - David A. Braude, Matthew P. Aylett, Caoimhín Laoide-Kemp, Simone Ashby, Kristen M. Scott, Brian Ó Raghallaigh, Anna Braudo, Alex Brouwer, Adriana Stan:
All Together Now: The Living Audio Dataset. 1521-1525 - Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu:
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech. 1526-1530 - Meysam Shamsi, Damien Lolive, Nelly Barbot, Jonathan Chevelu:
Corpus Design Using Convolutional Auto-Encoder Embeddings for Audio-Book Synthesis. 1531-1535 - Nobukatsu Hojo, Noboru Miyazaki:
Evaluating Intention Communication by TTS Using Explicit Definitions of Illocutionary Act Performance. 1536-1540 - Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang:
MOSNet: Deep Learning-Based Objective Assessment for Voice Conversion. 1541-1545 - Jason Fong, Pilar Oplustil Gallegos, Zack Hodari, Simon King:
Investigating the Robustness of Sequence-to-Sequence Text-to-Speech Models to Imperfectly-Transcribed Training Data. 1546-1550 - Avashna Govender, Anita E. Wagner, Simon King:
Using Pupil Dilation to Measure Cognitive Load When Listening to Text-to-Speech in Quiet and in Noise. 1551-1555 - Ioannis K. Douros, Jacques Felblinger, Jens Frahm, Karyna Isaieva, Arun A. Joseph, Yves Laprie, Freddy Odille, Anastasiia Tsukanova, Dirk Voit, Pierre-André Vuissoz:
A Multimodal Real-Time MRI Articulatory Corpus of French for Speech Research. 1556-1560 - Jia-Xiang Chen, Zhen-Hua Ling, Li-Rong Dai:
A Chinese Dataset for Identifying Speakers in Novels. 1561-1565 - Kyubyong Park, Thomas Mulc:
CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages. 1566-1570
Model Training for ASR
- Ievgen Karaulov, Dmytro Tkanov:
Attention Model for Articulatory Features Detection. 1571-1575 - Sibo Tong, Apoorv Vyas, Philip N. Garner, Hervé Bourlard:
Unbiased Semi-Supervised LF-MMI Training Using Dropout. 1576-1580 - Xiaodong Cui, Michael Picheny:
Acoustic Model Optimization Based on Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition. 1581-1585 - Nirmesh J. Shah, Hardik B. Sailor, Hemant A. Patil:
Whether to Pretrain DNN or not?: An Empirical Analysis for Voice Conversion. 1586-1590 - Mohit Goyal, Varun Srivastava, Prathosh A. P.:
Detection of Glottal Closure Instants from Raw Speech Using Convolutional Neural Networks. 1591-1595 - Joachim Fainberg, Ondrej Klejch, Steve Renals, Peter Bell:
Lattice-Based Lightly-Supervised Acoustic Model Training. 1596-1600 - Wilfried Michel, Ralf Schlüter, Hermann Ney:
Comparison of Lattice-Free and Lattice-Based Sequence Discriminative Training Criteria for LVCSR. 1601-1605 - Ryo Masumura, Hiroshi Sato, Tomohiro Tanaka, Takafumi Moriya, Yusuke Ijima, Takanobu Oba:
End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders. 1606-1610 - Abdelwahab Heba, Thomas Pellegrini, Jean-Pierre Lorré, Régine André-Obrecht:
Char+CV-CTC: Combining Graphemes and Consonant/Vowel Units for CTC-Based ASR Using Multitask Learning. 1611-1615 - Gakuto Kurata, Kartik Audhkhasi:
Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation. 1616-1620 - Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata:
Direct Neuron-Wise Fusion of Cognate Neural Networks. 1621-1625 - Pranav Ladkat, Oleg Rybakov, Radhika Arava, Sree Hari Krishnan Parthasarathi, I-Fan Chen, Nikko Strom:
Two Tiered Distributed Training Algorithm for Acoustic Modeling. 1626-1630 - Pin-Tuan Huang, Hung-Shin Lee, Syu-Siang Wang, Kuan-Yu Chen, Yu Tsao, Hsin-Min Wang:
Exploring the Encoder Layers of Discriminative Autoencoders for LVCSR. 1631-1635 - Gakuto Kurata, Kartik Audhkhasi:
Multi-Task CTC Training with Auxiliary Feature Reconstruction for End-to-End Speech Recognition. 1636-1640 - Mohan Li, Yuanjiang Cao, Weicong Zhou, Min Liu:
Framewise Supervised Training Towards End-to-End Speech Recognition Models: First Results. 1641-1645
Network Architectures for Emotion and Paralinguistics Recognition
- Efthymios Georgiou, Charilaos Papaioannou, Alexandros Potamianos:
Deep Hierarchical Fusion with Application in Sentiment Analysis. 1646-1650 - Vikramjit Mitra, Sue Booker, Erik Marchi, David Scott Farrar, Ute Dorothea Peitz, Bridget Cheng, Ermine Teves, Anuj Mehta, Devang Naik:
Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice. 1651-1655 - Jack Parry, Dimitri Palaz, Georgia Clarke, Pauline Lecomte, Rebecca Mead, Michael Berger, Gregor Hofer:
Analysis of Deep Learning Architectures for Cross-Corpus Speech Emotion Recognition. 1656-1660 - Bo Wang, Maria Liakata, Hao Ni, Terry J. Lyons, Alejo J. Nevado-Holgado, Kate Saunders:
A Path Signature Approach for Speech Emotion Recognition. 1661-1665 - Olga Egorow, Tarik Mrech, Norman Weißkirchen, Andreas Wendemuth:
Employing Bottleneck and Convolutional Features for Speech-Based Physical Load Detection on Limited Data Amounts. 1666-1670 - Jinming Zhao, Shizhe Chen, Jingjun Liang, Qin Jin:
Speech Emotion Recognition in Dyadic Dialogues with Attentive Interaction Modeling. 1671-1675 - Shun-Chang Zhong, Yun-Shao Lin, Chun-Min Chang, Yi-Ching Liu, Chi-Chun Lee:
Predicting Group Performances Using a Personality Composite-Network Architecture During Collaborative Task. 1676-1680 - Gao-Yi Chao, Yun-Shao Lin, Chun-Min Chang, Chi-Chun Lee:
Enforcing Semantic Consistency for Cross Corpus Valence Regression from Speech Using Adversarial Discrepancy Learning. 1681-1685 - Shuiyang Mao, P. C. Ching, Tan Lee:
Deep Learning of Segment-Level Feature Representation with Multiple Instance Learning for Utterance-Level Speech Emotion Recognition. 1686-1690 - Andreas Triantafyllopoulos, Gil Keren, Johannes Wagner, Ingmar Steiner, Björn W. Schuller:
Towards Robust Speech Emotion Recognition Using Deep Residual Networks for Speech Enhancement. 1691-1695 - Zhixuan Li, Liang He, Jingyang Li, Li Wang, Wei-Qiang Zhang:
Towards Discriminative Representations and Unbiased Predictions: Class-Specific Angular Softmax for Speech Emotion Recognition. 1696-1700 - Md Asif Jalal, Erfan Loweimi, Roger K. Moore, Thomas Hain:
Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition. 1701-1705
Acoustic Phonetics
- Sonia D'Apolito, Barbara Gili Fivela:
L2 Pronunciation Accuracy and Context: A Pilot Study on the Realization of Geminates in Italian as L2 by French Learners. 1706-1710 - Nisad Jamakovic, Robert Fuchs:
The Monophthongs of Formal Nigerian English: An Acoustic Analysis. 1711-1715 - Pablo Arantes, Anders Eriksson:
Quantifying Fundamental Frequency Modulation as a Function of Language, Speaking Style and Speaker. 1716-1720 - Niamh E. Kelly, Lara Keshishian:
The Voicing Contrast in Stops and Affricates in the Western Armenian of Lebanon. 1721-1725 - Adèle Jatteau, Ioana Vasilescu, Lori Lamel, Martine Adda-Decker, Nicolas Audibert:
" Gra[f] e!" Word-Final Devoicing of Obstruents in Standard French: An Acoustic Study Based on Large Corpora. 1726-1730 - Chih-Hsiang Huang, Huang-Cheng Chou, Yi-Tong Wu, Chi-Chun Lee, Yi-Wen Liu:
Acoustic Indicators of Deception in Mandarin Daily Conversations Recorded from an Interactive Game. 1731-1735 - Barbara Schuppler, Margaret Zellers:
Prosodic Effects on Plosive Duration in German and Austrian German. 1736-1740 - Cibu Johny, Alexander Gutkin, Martin Jansche:
Cross-Lingual Consistency of Phonological Features: An Empirical Study. 1741-1745 - Fanny Guitard-Ivent, Gabriele Chignoli, Cécile Fougeron, Laurianne Georgeton:
Are IP Initial Vowels Acoustically More Distinct? Results from LDA and CNN Classifications. 1746-1750 - Xizi Wei, Melvyn Hunt, Adrian Skilling:
Neural Network-Based Modeling of Phonetic Durations. 1751-1755 - Janina Molczanow, Beata Lukaszewicz, Anna Lukaszewicz:
An Acoustic Study of Vowel Undershoot in a System with Several Degrees of Prominence. 1756-1760 - Stephanie Berger, Oliver Niebuhr, Margaret Zellers:
A Preliminary Study of Charismatic Speech on YouTube: Correlating Prosodic Variation with Counts of Subscribers, Views and Likes. 1761-1765 - Shan Luo:
Phonetic Detail Encoding in Explaining the Size of Speech Planning Window. 1766-1770 - Dina El Zarka, Barbara Schuppler, Francesco Cangemi:
Acoustic Cues to Topic and Narrow Focus in Egyptian Arabic. 1771-1775 - Kowovi Comivi Alowonou, Jianguo Wei, Wenhuan Lu, Zhicheng Liu, Kiyoshi Honda, Jianwu Dang:
Acoustic and Articulatory Study of Ewe Vowels: A Comparative Study of Male and Female. 1776-1780
Speech Enhancement: Noise Attenuation
- Ya'nan Guo, Ziping Zhao, Yide Ma, Björn W. Schuller:
Speech Augmentation via Speaker-Specific Noise in Unseen Environment. 1781-1785 - Xiang Hao, Xiangdong Su, Zhiyu Wang, Hui Zhang, Batushiren:
UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-Noise Ratio Condition. 1786-1790 - Santiago Pascual, Joan Serrà, Antonio Bonafonte:
Towards Generalized Speech Enhancement with Generative Adversarial Networks. 1791-1795 - Xiaoqi Li, Yaxing Li, Meng Li, Shan Xu, Yuanjie Dong, Xinrong Sun, Shengwu Xiong:
A Convolutional Neural Network with Non-Local Module for Speech Enhancement. 1796-1800 - Yu-Chen Lin, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo:
IA-NET: Acceleration and Compression of Speech Enhancement Using Integer-Adder Deep Neural Network. 1801-1805 - Li Chai, Jun Du, Chin-Hui Lee:
KL-Divergence Regularized Deep Neural Network Adaptation for Low-Resource Speaker-Dependent Speech Enhancement. 1806-1810 - Jorge Llombart, Dayana Ribas, Antonio Miguel, Luis Vicente, Alfonso Ortega Giménez, Eduardo Lleida:
Speech Enhancement with Wide Residual Networks in Reverberant Environments. 1811-1815 - Chandan K. A. Reddy, Ebrahim Beyrami, Jamie Pool, Ross Cutler, Sriram Srinivasan, Johannes Gehrke:
A Scalable Noisy Speech Dataset and Online Subjective Test Framework. 1816-1820 - Nagaraj Adiga, Yannis Pantazis, Vassilis Tsiaras, Yannis Stylianou:
Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN. 1821-1825 - P. V. Muhammed Shifas, Nagaraj Adiga, Vassilis Tsiaras, Yannis Stylianou:
A Non-Causal FFTNet Architecture for Speech Enhancement. 1826-1830 - Daniel T. Braithwaite, W. Bastiaan Kleijn:
Speech Enhancement with Variance Constrained Autoencoders. 1831-1835
Language Learning and Databases
- Konstantinos Kyriakopoulos, Kate M. Knill, Mark J. F. Gales:
A Deep Learning Approach to Automatic Characterisation of Rhythm in Non-Native English Speech. 1836-1840 - Danny Merkx, Stefan L. Frank, Mirjam Ernestus:
Language Learning Using Speech to Image Retrieval. 1841-1845 - Lucy Skidmore, Roger K. Moore:
Using Alexa for Flashcard-Based Learning. 1846-1850 - John H. L. Hansen, Aditya Joglekar, Meena Chandra Shekhar, Vinay Kothapally, Chengzhu Yu, Lakshmish Kaushik, Abhijeet Sangwan:
The 2019 Inaugural Fearless Steps Challenge: A Giant Leap for Naturalistic Audio. 1851-1855 - Kuan-Yu Chen, Che-Ping Tsai, Da-Rong Liu, Hung-yi Lee, Lin-Shan Lee:
Completely Unsupervised Phoneme Recognition by a Generative Adversarial Network Harmonized with Iteratively Refined Hidden Markov Models. 1856-1860 - Tasavat Trisitichoke, Shintaro Ando, Daisuke Saito, Nobuaki Minematsu:
Analysis of Native Listeners' Facial Microexpressions While Shadowing Non-Native Speech - Potential of Shadowers' Facial Expressions for Comprehensibility Prediction. 1861-1865 - Reima Karhila, Anna-Riikka Smolander, Sari Ylinen, Mikko Kurimo:
Transparent Pronunciation Scoring Using Articulatorily Weighted Phoneme Edit Distance. 1866-1870 - Su-Youn Yoon, Chong Min Lee, Klaus Zechner, Keelan Evanini:
Development of Robust Automated Scoring Models Using Adversarial Input for Oral Proficiency Assessment. 1871-1875 - Yiting Lu, Mark J. F. Gales, Kate M. Knill, P. P. Manakul, Linlin Wang, Yu Wang:
Impact of ASR Performance on Spoken Grammatical Error Detection. 1876-1880 - Seung Hee Yang, Minhwa Chung:
Self-Imitating Feedback Generation Using GAN for Computer-Assisted Pronunciation Training. 1881-1885
Emotion and Personality in Conversation
- Chiori Hori, Anoop Cherian, Tim K. Marks, Takaaki Hori:
Joint Student-Teacher Learning for Audio-Visual Scene-Aware Dialog. 1886-1890 - Karthik Gopalakrishnan, Behnam Hedayatnia, Qinlang Chen, Anna Gottardi, Sanjeev Kwatra, Anu Venkatesh, Raefer Gabriel, Dilek Hakkani-Tür:
Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations. 1891-1895 - Uliyana Kubasova, Gabriel Murray, McKenzie Braley:
Analyzing Verbal and Nonverbal Features for Predicting Group Performance. 1896-1900 - Victor R. Martinez, Nikolaos Flemotomos, Victor Ardulov, Krishna Somandepalli, Simon B. Goldberg, Zac E. Imel, David C. Atkins, Shrikanth Narayanan:
Identifying Therapist and Client Personae for Therapeutic Alliance Estimation. 1901-1905 - Kristin Haake, Sarah Schimke, Simon Betz, Sina Zarrieß:
Do Hesitations Facilitate Processing of Partially Defective System Utterances? An Exploratory Eye Tracking Study. 1906-1910 - Bin Li, Yuan Jia:
Influence of Contextuality on Prosodic Realization of Information Structure in Chinese Dialogues. 1911-1915 - Kristijan Gjoreski, Aleksandar Gjoreski, Ivan Kraljevski, Diane Hirschfeld:
Cross-Lingual Transfer Learning for Affective Spoken Dialogue Systems. 1916-1920 - Mingzhi Yu, Emer Gilmartin, Diane J. Litman:
Identifying Personality Traits Using Overlap Dynamics in Multiparty Dialogue. 1921-1925 - Zakaria Aldeneh, Mimansa Jaiswal, Michael Picheny, Melvin G. McInnis, Emily Mower Provost:
Identifying Mood Episodes Using Dialogue Features from Clinical Interviews. 1926-1930 - Nichola Lubold, Stephanie A. Borrie, Tyson S. Barrett, Megan M. Willi, Visar Berisha:
Do Conversational Partners Entrain on Articulatory Precision? 1931-1935 - Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang:
Conversational Emotion Analysis via Attention Mechanisms. 1936-1940
Voice Quality, Speech Perception, and Prosody
- Emma O'Neill, Julie Carson-Berndsen:
The Effect of Phoneme Distribution on Perceptual Similarity in English. 1941-1945 - Sofoklis Kakouros, Antti Suni, Juraj Simko, Martti Vainio:
Prosodic Representations of Prominence Classification Neural Networks and Autoencoders Using Bottleneck Features. 1946-1950 - Sharon Peperkamp, Alvaro Martin Iturralde Zurita:
Compensation for French Liquid Deletion During Auditory Sentence Processing. 1951-1955 - Daniil Kocharov, Tatiana Kachkovskaia, Pavel A. Skrelin:
Prosodic Factors Influencing Vowel Reduction in Russian. 1956-1960 - Christer Gobl, Ailbhe Ní Chasaide:
Time to Frequency Domain Mapping of the Voice Source: The Influence of Open Quotient and Glottal Skew on the Low End of the Source Spectrum. 1961-1965 - Eleanor Chodroff, Jennifer S. Cole:
Testing the Distinctiveness of Intonational Tunes: Evidence from Imitative Productions in American English. 1966-1970 - Sangwook Park, David K. Han, Mounya Elhilali:
A Study of a Cross-Language Perception Based on Cortical Analysis Using Biomimetic STRFs. 1971-1975 - Pavel Sturm, Jan Volín:
Perceptual Evaluation of Early versus Late F0 Peaks in the Intonation Structure of Czech Question-Word Questions. 1976-1980 - Anneliese Kelterer, Barbara Schuppler:
Acoustic Correlates of Phonation Type in Chichimec. 1981-1985 - Yu-Ren Chien, Michal Borský, Jón Guðnason:
F0 Variability Measures Based on Glottal Closure Instants. 1986-1989 - Lauri Tavi, Tanel Alumäe, Stefan Werner:
Recognition of Creaky Voice from Emergency Calls. 1990-1994
Speech Signal Characterization 3
- Shuzhuang Xu, Hiroshi Shimodaira:
Direct F0 Estimation with Neural-Network-Based Regression. 1995-1999 - Tanay Sharma, Rohith Chandrashekar Aralikatti, Dilip Kumar Margam, Abhinav Thanda, Sharad Roy, Pujitha Appan Kandala, Shankar M. Venkatesan:
Real Time Online Visual End Point Detection Using Unidirectional LSTM. 2000-2004 - Luc Ardaillon, Axel Roebel:
Fully-Convolutional Network for Pitch Estimation of Speech Signals. 2005-2009 - Mingye Dong, Jie Wu, Jian Luan:
Vocal Pitch Extraction in Polyphonic Music Using Convolutional Residual Network. 2010-2014 - Bidisha Sharma, Rohan Kumar Das, Haizhou Li:
Multi-Level Adaptive Speech Activity Detector for Speech in Naturalistic Environments. 2015-2019 - Bidisha Sharma, Rohan Kumar Das, Haizhou Li:
On the Importance of Audio-Source Separation for Singer Identification in Polyphonic Music. 2020-2024 - Hiroko Terasawa, Kenta Wakasa, Hideki Kawahara, Ken-Ichi Sakakibara:
Investigating the Physiological and Acoustic Contrasts Between Choral and Operatic Singing. 2025-2029 - Ruixi Lin, Charles Costello, Charles Jankowski, Vishwas Mruthyunjaya:
Optimizing Voice Activity Detection for Noisy Conditions. 2030-2034 - Taiki Yamamoto, Ryota Nishimura, Masayuki Misaki, Norihide Kitaoka:
Small-Footprint Magic Word Detection Method Using Convolutional LSTM Neural Network. 2035-2039 - Chitralekha Gupta, Emre Yilmaz, Haizhou Li:
Acoustic Modeling for Automatic Lyrics-to-Audio Alignment. 2040-2044 - Anastasios Vafeiadis, Eleftherios Fanioudakis, Ilyas Potamitis, Konstantinos Votis, Dimitrios Giakoumis, Dimitrios Tzovaras, Liming Chen, Raouf Hamzaoui:
Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection. 2045-2049 - Tokihiko Kaburagi:
A Study of Soprano Singing in Light of the Source-Filter Interaction. 2050-2054
Speech Synthesis: Pronunciation, Multilingual, and Low Resource
- Yuxiang Zou, Linhao Dong, Bo Xu:
Boosting Character-Based Chinese Speech Synthesis via Multi-Task Learning and Dictionary Tutoring. 2055-2059 - Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu:
Building a Mixed-Lingual Neural TTS System with Only Monolingual Data. 2060-2064 - Alex Sokolov, Tracy Rohlin, Ariya Rastrow:
Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion. 2065-2069 - Jason Taylor, Korin Richmond:
Analysis of Pronunciation Learning in End-to-End Speech Synthesis. 2070-2074 - Yuan-Jui Chen, Tao Tu, Cheng-chieh Yeh, Hung-yi Lee:
End-to-End Text-to-Speech for Low-Resource Languages by Cross-Lingual Transfer Learning. 2075-2079 - Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, R. J. Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran:
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning. 2080-2084 - Markéta Juzová, Daniel Tihelka, Jakub Vít:
Unified Language-Independent DNN-Based G2P Converter. 2085-2089 - Dongyang Dai, Zhiyong Wu, Shiyin Kang, Xixin Wu, Jia Jia, Dan Su, Dong Yu, Helen Meng:
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT. 2090-2094 - Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth:
Transformer Based Grapheme-to-Phoneme Conversion. 2095-2099 - Harry Bleyan, Sandy Ritchie, Jonas Fromseier Mortensen, Daan van Esch:
Developing Pronunciation Models in New Languages Faster by Exploiting Common Grapheme-to-Phoneme Correspondences Across Languages. 2100-2104 - Mengnan Chen, Minchuan Chen, Shuang Liang, Jun Ma, Lei Chen, Shaojun Wang, Jing Xiao:
Cross-Lingual, Multi-Speaker Text-To-Speech Synthesis Using Neural Speaker Embedding. 2105-2109 - Zexin Cai, Yaogen Yang, Chuxiong Zhang, Xiaoyi Qin, Ming Li:
Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-Level Embedding Features. 2110-2114 - Hao Sun, Xu Tan, Jun-Wei Gan, Hongzhi Liu, Sheng Zhao, Tao Qin, Tie-Yan Liu:
Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion. 2115-2119
Cross-Lingual and Multilingual ASR
- Xinjian Li, Siddharth Dalmia, Alan W. Black, Florian Metze:
Multilingual Speech Recognition with Corpus Relatedness Sampling. 2120-2124 - Harish Arsikere, Ashtosh Sapru, Sri Garimella:
Multi-Dialect Acoustic Modeling Using Phone Mapping and Online i-Vectors. 2125-2129 - Anjuli Kannan, Arindrima Datta, Tara N. Sainath, Eugene Weinstein, Bhuvana Ramabhadran, Yonghui Wu, Ankur Bapna, Zhifeng Chen, Seungji Lee:
Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model. 2130-2134 - Carlos Mendes, Alberto Abad, João Paulo Neto, Isabel Trancoso:
Recognition of Latin American Spanish Using Multi-Task Learning. 2135-2139 - Thibault Viglino, Petr Motlícek, Milos Cernak:
End-to-End Accented Speech Recognition. 2140-2144 - Sheng Li, Chenchen Ding, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai:
End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition. 2145-2149 - Karan Taneja, Satarupa Guha, Preethi Jyothi, Basil Abraham:
Exploiting Monolingual Speech Corpora for Code-Mixed Speech Recognition. 2150-2154 - Ke Hu, Antoine Bruguier, Tara N. Sainath, Rohit Prabhavalkar, Golan Pundak:
Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models. 2155-2159 - Yerbolat Khassanov, Haihua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma:
Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data. 2160-2164 - Zhiping Zeng, Yerbolat Khassanov, Van Tung Pham, Haihua Xu, Eng Siong Chng, Haizhou Li:
On the End-to-End Solution to Mandarin-English Code-Switching Speech Recognition. 2165-2169 - Shiliang Zhang, Yuan Liu, Ming Lei, Bin Ma, Lei Xie:
Towards Language-Universal Mandarin-English Speech Recognition. 2170-2174
Spoken Term Detection, Confidence Measure, and End-to-End Speech Recognition
- Prakhar Swarup, Roland Maas, Sri Garimella, Sri Harish Mallidi, Björn Hoffmeister:
Improving ASR Confidence Scores for Alexa Using Acoustic and Hypothesis Embeddings. 2175-2179 - Shiliang Zhang, Ming Lei, Zhijie Yan:
Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition. 2180-2184 - Cal Peyser, Hao Zhang, Tara N. Sainath, Zelin Wu:
Improving Performance of End-to-End ASR on Numeric Sequences. 2185-2189 - Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Zhengkun Tian, Chenghao Zhao, Cunhang Fan:
A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting. 2190-2194 - Chieh-Chi Kao, Ming Sun, Yixin Gao, Shiv Vitaladevuni, Chao Wang:
Sub-Band Convolutional Neural Networks for Small-Footprint Spoken Term Classification. 2195-2199 - Sheng Li, Xugang Lu, Chenchen Ding, Peng Shen, Tatsuya Kawahara, Hisashi Kawai:
Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese. 2200-2204 - Jiaqi Guo, Yongbin You, Yanmin Qian, Kai Yu:
Joint Decoding of CTC Based Systems for Speech Recognition. 2205-2209 - Tomohiro Tanaka, Ryo Masumura, Takafumi Moriya, Takanobu Oba, Yushi Aono:
A Joint End-to-End and DNN-HMM Hybrid Automatic Speech Recognition System with Transferring Sharable Knowledge. 2210-2214 - Karan Malhotra, Shubham Bansal, Sriram Ganapathy:
Active Learning Methods for Low Resource End-to-End Speech Recognition. 2215-2219 - Martin Karafiát, Murali Karthick Baskar, Shinji Watanabe, Takaaki Hori, Matthew Wiesner, Jan Cernocký:
Analysis of Multilingual Sequence-to-Sequence Speech Recognition Systems. 2220-2224 - Michal Zapotoczny, Piotr Pietrzak, Adrian Lancucki, Jan Chorowski:
Lattice Generation in Attention-Based Speech Recognition Models. 2225-2229 - Martin Jansche, Alexander Gutkin:
Sampling from Stochastic Finite Automata with Applications to CTC Decoding. 2230-2234 - Lukasz Dudziak, Mohamed S. Abdelfattah, Ravichander Vipperla, Stefanos Laskaridis, Nicholas D. Lane:
ShrinkML: End-to-End ASR Model Compression Using Reinforcement Learning. 2235-2239 - Yashesh Gaur, Jinyu Li, Zhong Meng, Yifan Gong:
Acoustic-to-Phrase Models for Speech Recognition. 2240-2244 - Ruizhi Li, Gregory Sell, Hynek Hermansky:
Performance Monitoring for End-to-End Speech Recognition. 2245-2249
Speech Perception
- Michelle Cohn, Georgia Zellou, Santiago Barreda:
The Role of Musical Experience in the Perceptual Weighting of Acoustic Cues for the Obstruent Coda Voicing Contrast in American English. 2250-2254 - Natalie Lewandowski, Daniel Duran:
Individual Differences in Implicit Attention to Phonetic Detail in Speech Perception. 2255-2259 - Kaylah Lalonde:
Effects of Natural Variability in Cross-Modal Temporal Correlations on Audiovisual Speech Recognition Benefit. 2260-2264 - Martijn Bentum, Louis ten Bosch, Antal van den Bosch, Mirjam Ernestus:
Listening with Great Expectations: An Investigation of Word Form Anticipations in Naturalistic Speech. 2265-2269 - Martijn Bentum, Louis ten Bosch, Antal van den Bosch, Mirjam Ernestus:
Quantifying Expectation Modulation in Human Speech Processing. 2270-2274 - Daniel R. Turner, Ann R. Bradlow, Jennifer S. Cole:
Perception of Pitch Contours in Speech and Nonspeech. 2275-2279 - Louis ten Bosch, Lou Boves, Kimberley Mulder:
Analyzing Reaction Time and Error Sequences in Lexical Decision Experiments. 2280-2284 - Li Liu, Jianze Li, Gang Feng, Xiao-Ping (Steven) Zhang:
Automatic Detection of the Temporal Segmentation of Hand Movements in British English Cued Speech. 2285-2289 - Yuriko Yokoe:
Place Shift as an Autonomous Process: Evidence from Japanese Listeners. 2290-2294 - Julien Meyer, Laure Dentel, Silvain Gerber, Rachid Ridouane:
A Perceptual Study of CV Syllables in Both Spoken and Whistled Speech: A Tashlhiyt Berber Perspective. 2295-2299 - Han-Chi Hsieh, Wei-Zhong Zheng, Ko-Chiang Chen, Ying-Hui Lai:
Consonant Classification in Mandarin Based on the Depth Image Feature: A Pilot Study. 2300-2304 - Shiri Lev-Ari, Robin Dodsworth, Jeff Mielke, Sharon Peperkamp:
The Different Roles of Expectations in Phonetic and Lexical Processing. 2305-2309 - Bruno Ferenc Segedin, Michelle Cohn, Georgia Zellou:
Perceptual Adaptation to Device and Human Voices: Learning and Generalization of a Phonetic Shift Across Real and Voice-AI Talkers. 2310-2314 - Katerina Papadimitriou, Gerasimos Potamianos:
End-to-End Convolutional Sequence Learning for ASL Fingerspelling Recognition. 2315-2319
Topics in Speech and Audio Signal Processing
- Krishna Somandepalli, Naveen Kumar, Arindam Jati, Panayiotis G. Georgiou, Shrikanth Narayanan:
Multiview Shared Subspace Learning Across Speakers and Speech Commands. 2320-2324 - Chelzy Belitz, Hussnain Ali, John H. L. Hansen:
A Machine Learning Based Clustering Protocol for Determining Hearing Aid Initial Configurations from Pure-Tone Audiograms. 2325-2329 - Truc Nguyen, Franz Pernkopf:
Acoustic Scene Classification with Mismatched Devices Using CliqueNets and Mixup Data Augmentation. 2330-2334 - Mohsin Y. Ahmed, Md. Mahbubur Rahman, Jilong Kuang:
DeepLung: Smartphone Convolutional Neural Network-Based Inference of Lung Anomalies for Pulmonary Patients. 2335-2339 - Roger K. Moore, Lucy Skidmore:
On the Use/Misuse of the Term 'Phoneme'. 2340-2344 - Hannah Muckenhirn, Vinayak Abrol, Mathew Magimai-Doss, Sébastien Marcel:
Understanding and Visualizing Raw Waveform-Based CNNs. 2345-2349 - Kevin Kilgour, Mauricio Zuluaga, Dominik Roblek, Matthew Sharifi:
Fréchet Audio Distance: A Reference-Free Metric for Evaluating Music Enhancement Algorithms. 2350-2354 - Yuan Gong, Jian Yang, Jacob Huber, Mitchell MacKnight, Christian Poellabauer:
ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems. 2355-2359 - Balamurali B. T., Jer-Ming Chen:
Analyzing Intra-Speaker and Inter-Speaker Vocal Tract Impedance Characteristics in a Low-Dimensional Feature Space Using t-SNE. 2360-2363
Speech Processing and Analysis
- Geon Woo Lee, Jung Hyuk Lee, Seong Ju Kim, Hong Kook Kim:
Directional Audio Rendering Using a Neural Network Based Personalized HRTF. 2364-2365 - Wikus Pienaar, Daan Wissing:
Online Speech Processing and Analysis Suite. 2366-2367 - Dieter Maurer, Heidy Suter, Christian d'Hereuse, Volker Dellwo:
Formant Pattern and Spectral Shape Ambiguity of Vowel Sounds, and Related Phenomena of Vowel Acoustics - Exemplary Evidence. 2368-2369 - Anton Noll, Jonathan Stuefer, Nicola Klingler, Hannah Leykum, Carina Lozo, Jan Luttenberger, Michael Pucher, Carolin Schmid:
Sound Tools eXtended (STx) 5.0 - A Powerful Sound Analysis Tool Optimized for Speech. 2370-2371 - Mohamed Eldesouki, Naassih Gopee, Ahmed Ali, Kareem Darwish:
FarSpeech: Arabic Natural Language Processing for Live Arabic Speech. 2372-2373 - Fasih Haider, Saturnino Luz:
A System for Real-Time Privacy Preserving Data Collection for Ambient Assisted Living. 2374-2375 - Chitralekha Gupta, Karthika Vijayan, Bidisha Sharma, Xiaoxue Gao, Haizhou Li:
NUS Speak-to-Sing: A Web Platform for Personalized Speech-to-Singing Conversion. 2376-2377
Keynote 3: Manfred Kaltenbacher
- Manfred Kaltenbacher:
Physiology and Physics of Voice Production.
The Interspeech 2019 Computational Paralinguistics Challenge (ComParE)
- Björn W. Schuller, Anton Batliner, Christian Bergler, Florian B. Pokorny, Jarek Krajewski, Margaret Cychosz, Ralf Vollmann, Sonja-Dana Roelen, Sebastian Schnieder, Elika Bergelson, Alejandrina Cristià, Amanda Seidl, Anne S. Warlaumont, Lisa Yankowitz, Elmar Nöth, Shahin Amiriparian, Simone Hantke, Maximilian Schmitt:
The INTERSPEECH 2019 Computational Paralinguistics Challenge: Styrian Dialects, Continuous Sleepiness, Baby Sounds & Orca Activity. 2378-2382 - S. Pavankumar Dubagunta, Mathew Magimai-Doss:
Using Speech Production Knowledge for Raw Waveform Modelling Based Styrian Dialect Identification. 2383-2387 - Daniel Elsner, Stefan Langer, Fabian Ritz, Robert Müller, Steffen Illium:
Deep Neural Baselines for Computational Paralinguistics. 2388-2392 - Thomas Kisler, Raphael Winkelmann, Florian Schiel:
Styrian Dialect Classification: Comparing and Fusing Classifiers Based on a Feature Selection Using a Genetic Algorithm. 2393-2397 - Sung-Lin Yeh, Gao-Yi Chao, Bo-Hao Su, Yu-Lin Huang, Meng-Han Lin, Yin-Chun Tsai, Yu-Wen Tai, Zheng-Chi Lu, Chieh-Yu Chen, Tsung-Ming Tai, Chiu-Wang Tseng, Cheng-Kuang Lee, Chi-Chun Lee:
Using Attention Networks and Adversarial Augmentation for Styrian Dialect Continuous Sleepiness and Baby Sound Recognition. 2398-2402 - Peter Wu, Sai Krishna Rallabandi, Alan W. Black, Eric Nyberg:
Ordinal Triplet Loss: Investigating Sleepiness Detection from Speech. 2403-2407 - Vijay Ravi, Soo Jin Park, Amber Afshan, Abeer Alwan:
Voice Quality and Between-Frame Entropy for Sleepiness Estimation. 2408-2412 - Gábor Gosztolya:
Using Fisher Vector and Bag-of-Audio-Words Representations to Identify Styrian Dialects, Sleepiness, Baby & Orca Sounds. 2413-2417 - Rohan Kumar Das, Haizhou Li:
Instantaneous Phase and Long-Term Acoustic Cues for Orca Activity Detection. 2418-2422 - Dominik Schiller, Tobias Huber, Florian Lingenfelser, Michael Dietz, Andreas Seiderer, Elisabeth André:
Relevance-Based Feature Masking: Improving Neural Network Based Whale Classification Through Explainable Artificial Intelligence. 2423-2427 - Marie-José Caraty, Claude Montacié:
Spatial, Temporal and Spectral Multiresolution Analysis for the INTERSPEECH 2019 ComParE Challenge. 2428-2432 - Haiwei Wu, Weiqing Wang, Ming Li:
The DKU-LENOVO Systems for the INTERSPEECH 2019 Computational Paralinguistic Challenge. 2433-2437
The VOiCES from a Distance Challenge — O
- Mahesh Kumar Nandwana, Julien van Hout, Colleen Richey, Mitchell McLaren, María Auxiliadora Barrios, Aaron Lawson:
The VOiCES from a Distance Challenge 2019. 2438-2442 - Sergey Novoselov, Aleksei Gusev, Artem Ivanov, Timur Pekhovsky, Andrey Shulipa, Galina Lavrentyeva, Vladimir Volokhov, Alexandr Kozlov:
STC Speaker Recognition Systems for the VOiCES from a Distance Challenge. 2443-2447 - Pavel Matejka, Oldrich Plchot, Hossein Zeinali, Ladislav Mosner, Anna Silnova, Lukás Burget, Ondrej Novotný, Ondrej Glembek:
Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge. 2448-2452 - Ivan Medennikov, Yuri Y. Khokhlov, Aleksei Romanenko, Ivan Sorokin, Anton Mitrofanov, Vladimir Bataev, Andrei Andrusenko, Tatiana Prisyach, Mariya Korenevskaya, Oleg Petrov, Alexander Zatvornitskiy:
The STC ASR System for the VOiCES from a Distance Challenge 2019. 2453-2457 - Tze Yuang Chong, Kye Min Tan, Kah Kuan Teh, Chang Huai You, Hanwu Sun, Tran Huy Dat:
The I2R's ASR System for the VOiCES from a Distance Challenge 2019. 2458-2462
The VOiCES from a Distance Challenge — P
- Mahesh Kumar Nandwana, Julien van Hout, Colleen Richey, Mitchell McLaren, Maria Alejandra Barrios, Aaron Lawson:
The VOiCES from a Distance Challenge 2019. - Sergey Novoselov, Aleksei Gusev, Artem Ivanov, Timur Pekhovsky, Andrey Shulipa, Galina Lavrentyeva, Vladimir Volokhov, Alexandr Kozlov:
STC Speaker Recognition Systems for the VOiCES from a Distance Challenge. - Pavel Matejka, Oldrich Plchot, Hossein Zeinali, Ladislav Mosner, Anna Silnova, Lukás Burget, Ondrej Novotný, Ondrej Glembek:
Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge. - Ivan Medennikov, Yuri Y. Khokhlov, Aleksei Romanenko, Ivan Sorokin, Anton Mitrofanov, Vladimir Bataev, Andrei Andrusenko, Tatiana Prisyach, Mariya Korenevskaya, Oleg Petrov, Alexander Zatvornitskiy:
The STC ASR System for the VOiCES from a Distance Challenge 2019. - Tze Yuang Chong, Kye Min Tan, Kah Kuan Teh, Chang Huai You, Hanwu Sun, Tran Huy Dat:
The I2R's ASR System for the VOiCES from a Distance Challenge 2019. - Arindam Jati, Raghuveer Peri, Monisankha Pal, Tae Jin Park, Naveen Kumar, Ruchir Travadi, Panayiotis G. Georgiou, Shrikanth Narayanan:
Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech. 2463-2467 - David Snyder, Jesús Villalba, Nanxin Chen, Daniel Povey, Gregory Sell, Najim Dehak, Sanjeev Khudanpur:
The JHU Speaker Recognition System for the VOiCES 2019 Challenge. 2468-2472 - Jonathan Huang, Tobias Bocklet:
Intel Far-Field Speaker Recognition System for VOiCES Challenge 2019. 2473-2477 - Hanwu Sun, Kah Kuan Teh, Ivan Kukanov, Tran Huy Dat:
The I2R's Submission to VOiCES Distance Speaker Recognition Challenge 2019. 2478-2482 - Yulong Liang, Lin Yang, Xuyang Wang, Yingjie Li, Chen Jia, Junjie Wang:
The LeVoice Far-Field Speech Recognition System for VOiCES from a Distance Challenge 2019. 2483-2487 - Yiming Wang, David Snyder, Hainan Xu, Vimal Manohar, Phani Sankar Nidadavolu, Daniel Povey, Sanjeev Khudanpur:
The JHU ASR System for VOiCES from a Distance Challenge 2019. 2488-2492 - Danwei Cai, Xiaoyi Qin, Weicheng Cai, Ming Li:
The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge. 2493-2497
Voice Quality Characterization for Clinical Voice Assessment: Voice Production, Acoustics, and Auditory Perception
- Yermiyahu Hauptman, Ruth Aloni-Lavi, Itshak Lapidot, Tanya Gurevich, Yael Manor, Stav Naor, Noa Diamant, Irit Opher:
Identifying Distinctive Acoustic and Spectral Features in Parkinson's Disease. 2498-2502 - Carlo Drioli, Philipp Aichinger:
Aerodynamics and Lumped-Masses Combined with Delay Lines for Modeling Vertical and Anterior-Posterior Phase Differences in Pathological Vocal Fold Vibration. 2503-2507 - Sudarsana Reddy Kadiri, Paavo Alku:
Mel-Frequency Cepstral Coefficients of Voice Source Waveforms for Classification of Phonation Types in Speech. 2508-2512 - Sunghye Cho, Mark Y. Liberman, Neville Ryant, Meredith Cola, Robert T. Schultz, Julia Parish-Morris:
Automatic Detection of Autism Spectrum Disorder in Children Using Acoustic and Text Features from Brief Natural Conversations. 2513-2517 - Jean Schoentgen, Philipp Aichinger:
Analysis and Synthesis of Vocal Flutter and Vocal Jitter. 2518-2522 - Felix Schaeffler, Stephen Jannetts, Janet Beck:
Reliability of Clinical Voice Parameters Captured with Smartphones - Measurements of Added Noise and Spectral Tilt. 2523-2527 - Meredith Moore, Michael Saxon, Hemanth Venkateswara, Visar Berisha, Sethuraman Panchanathan:
Say What? A Dataset for Exploring the Error Patterns That Two ASR Engines Make. 2528-2532
Prosody
- Nigel G. Ward:
Survey Talk: Prosody Research and Applications: The State of the Art. - Simon Roessig, Doris Mücke, Lena Pagel:
Dimensions of Prosodic Prominence in an Attractor Model. 2533-2537 - Antti Suni, Marcin Wlodarczak, Martti Vainio, Juraj Simko:
Comparative Analysis of Prosodic Characteristics Using WaveNet Embeddings. 2538-2542 - Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl:
The Role of Voice Quality in the Perception of Prominence in Synthetic Speech. 2543-2547 - Rachel Albar, Hiyon Yoo:
Phonological Awareness of French Rising Contours in Japanese Learners. 2548-2552
Speech and Audio Classification 1
- Masaki Okawa, Takuya Saito, Naoki Sawada, Hiromitsu Nishizaki:
Audio Classification of Bit-Representation Waveform. 2553-2557 - Manjunath Mulimani, Shashidhar G. Koolagudi:
Locality-Constrained Linear Coding Based Fused Visual Features for Robust Acoustic Event Classification. 2558-2562 - Yu-Han Shen, Ke-Xin He, Wei-Qiang Zhang:
Learning How to Listen: A Temporal-Frequential Attention Model for Sound Event Detection. 2563-2567 - Logan Ford, Hao Tang, François Grondin, James R. Glass:
A Deep Residual Network for Large-Scale Acoustic Scene Analysis. 2568-2572 - Chandan K. A. Reddy, Ross Cutler, Johannes Gehrke:
Supervised Classifiers for Audio Impairments with Noisy Labels. 2573-2577 - Lorenzo Tarantino, Philip N. Garner, Alexandros Lazaridis:
Self-Attention for Speech Emotion Recognition. 2578-2582
Singing and Multimodal Synthesis
- Eliya Nachmani, Lior Wolf:
Unsupervised Singing Voice Conversion. 2583-2587 - Juheon Lee, Hyeong-Seok Choi, Chang-Bin Jeon, Junghyun Koo, Kyogu Lee:
Adversarially Trained End-to-End Korean Singing Voice Synthesis System. 2588-2592 - Yuan-Hao Yi, Yang Ai, Zhen-Hua Ling, Li-Rong Dai:
Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling. 2593-2597 - Sara Dahmani, Vincent Colotte, Valérian Girard, Slim Ouni:
Conditional Variational Auto-Encoder for Text-Driven Expressive AudioVisual Speech Synthesis. 2598-2602 - David Ayllón, Fernando Villavicencio, Pierre Lanchantin:
A Strategy for Improved Phone-Level Lyrics-to-Audio Alignment for Speech-to-Singing Synthesis. 2603-2607 - Théo Biasutto-Lervat, Sara Dahmani, Slim Ouni:
Modeling Labial Coarticulation with Bidirectional Gated Recurrent Networks and Transfer Learning. 2608-2612
ASR Neural Network Training — 2
- Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le:
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. 2613-2617 - Kartik Audhkhasi, George Saon, Zoltán Tüske, Brian Kingsbury, Michael Picheny:
Forget a Bit to Learn Better: Soft Forgetting for CTC-Based Automatic Speech Recognition. 2618-2622 - Haoran Miao, Gaofeng Cheng, Pengyuan Zhang, Ta Li, Yonghong Yan:
Online Hybrid CTC/Attention Architecture for End-to-End Speech Recognition. 2623-2627 - Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David S. Kung, Michael Picheny:
A Highly Efficient Distributed Deep Learning System for Automatic Speech Recognition. 2628-2632 - Wangyou Zhang, Xuankai Chang, Yanmin Qian:
Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System. 2633-2637 - Tobias Menne, Ilya Sklyar, Ralf Schlüter, Hermann Ney:
Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech. 2638-2642
Bilingualism, L2, and Non-Nativeness
- Ann R. Bradlow:
Survey Talk: Recognition of Foreign-Accented Speech: Challenges and Opportunities for Human and Computer Speech Communication. - John S. Novak III, Daniel Bunn, Robert V. Kenyon:
The Effects of Time Expansion on English as a Second Language Individuals. 2643-2647 - Shuju Shi, Chilin Shih, Jinsong Zhang:
Capturing L1 Influence on L2 Pronunciation by Simulating Perceptual Space Using Acoustic Features. 2648-2652 - Juqiang Chen, Catherine T. Best, Mark Antoniou:
Cognitive Factors in Thai-Naïve Mandarin Speakers' Imitation of Thai Lexical Tones. 2653-2657 - Annie Tremblay, Mirjam Broersma:
Foreign-Language Knowledge Enhances Artificial-Language Segmentation. 2658-2662
Spoken Term Detection
- Abdalghani Abujabal, Judith Gaspers:
Neural Named Entity Recognition from Subword Units. 2663-2667 - Saurabhchand Bhati, Shekhar Nayak, K. Sri Rama Murty, Najim Dehak:
Unsupervised Acoustic Segmentation and Clustering Using Siamese Network Embeddings. 2668-2672 - Bolaji Yusuf, Murat Saraclar:
An Empirical Evaluation of DTW Subsampling Methods for Keyword Search. 2673-2677 - Zixiaofan Yang, Julia Hirschberg:
Linguistically-Informed Training of Acoustic Word Embeddings for Low-Resource Languages. 2678-2682 - Liming Wang, Mark A. Hasegawa-Johnson:
Multimodal Word Discovery and Retrieval with Phone Sequence and Image Concepts. 2683-2687 - Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier:
Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-Resource Settings. 2688-2692
Speech and Audio Source Separation and Scene Analysis 2
- Wei Xue, Ying Tong, Guohong Ding, Chao Zhang, Tao Ma, Xiaodong He, Bowen Zhou:
Direct-Path Signal Cross-Correlation Estimation for Sound Source Localization in Reverberation. 2693-2697 - François Grondin, James R. Glass:
Multiple Sound Source Localization with SVD-PHAT. 2698-2702 - Wangyou Zhang, Ying Zhou, Yanmin Qian:
Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking. 2703-2707 - Yoshiki Masuyama, Masahito Togami, Tatsuya Komatsu:
Multichannel Loss Function for Supervised Speech Source Separation by Mask-Based Beamforming. 2708-2712 - Guanjun Li, Shan Liang, Shuai Nie, Wenju Liu, Meng Yu, Lianwu Chen, Shouye Peng, Changliang Li:
Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction. 2713-2717 - Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Tomohiro Nakatani:
Multimodal SpeakerBeam: Single Channel Target Speech Extraction with Audio-Visual Speaker Clues. 2718-2722
Speech Enhancement: Single Channel 2
- François G. Germain, Qifeng Chen, Vladlen Koltun:
Speech Denoising with Deep Feature Losses. 2723-2727 - Quan Wang, Hannah Muckenhirn, Kevin W. Wilson, Prashant Sridhar, Zelin Wu, John R. Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio López-Moreno:
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking. 2728-2732 - Chien-Feng Liao, Yu Tsao, Xugang Lu, Hisashi Kawai:
Incorporating Symbolic Sequential Modeling for Speech Enhancement. 2733-2737 - Pejman Mowlaee, Daniel Scheran, Johannes Stahl, Sean U. N. Wood, W. Bastiaan Kleijn:
Maximum a posteriori Speech Enhancement Based on Double Spectrum. 2738-2742 - Jian Yao, Ahmad Al-Dahle:
Coarse-to-Fine Optimization for Speech Enhancement. 2743-2747 - Like Hui, Siyuan Ma, Mikhail Belkin:
Kernel Machines Beat Deep Neural Networks on Mask-Based Single-Channel Speech Enhancement. 2748-2752
Multimodal ASR
- Florian Metze:
Survey Talk: Multimodal Processing of Speech and Language. - Nilay Shrivastava, Astitwa Saxena, Yaman Kumar, Rajiv Ratn Shah, Amanda Stent, Debanjan Mahata, Preeti Kaur, Roger Zimmermann:
MobiVSR: Efficient and Light-Weight Neural Network for Visual Speech Recognition on Mobile Devices. 2753-2757 - Pujitha Appan Kandala, Abhinav Thanda, Dilip Kumar Margam, Rohith Chandrashekar Aralikatti, Tanay Sharma, Sharad Roy, Shankar M. Venkatesan:
Speaker Adaptation for Lip-Reading Using Visual Identity Vectors. 2758-2762 - Alexandros Koumparoulis, Gerasimos Potamianos:
MobiLipNet: Resource-Efficient Deep Learning Based Lipreading. 2763-2767 - Leyuan Qu, Cornelius Weber, Stefan Wermter:
LipSound: Neural Mel-Spectrogram Reconstruction for Lip Reading. 2768-2772
ASR Neural Network Architectures 2
- Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian McGraw, Chung-Cheng Chiu:
Two-Pass End-to-End Speech Recognition. 2773-2777 - Max W. Y. Lam, Jun Wang, Xunying Liu, Helen Meng, Dan Su, Dong Yu:
Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition. 2778-2782 - Dhananjaya Gowda, Abhinav Garg, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim:
Multi-Task Multi-Resolution Char-to-BPE Cross-Attention Decoder for End-to-End Speech Recognition. 2783-2787 - Kyu Jeong Han, Jing Huang, Yun Tang, Xiaodong He, Bowen Zhou:
Multi-Stride Self-Attention for Speech Recognition. 2788-2792 - Shoukang Hu, Xurong Xie, Shansong Liu, Max W. Y. Lam, Jianwei Yu, Xixin Wu, Xunying Liu, Helen Meng:
LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition. 2793-2797 - Liang Lu, Eric Sun, Yifan Gong:
Self-Teaching Networks. 2798-2802
Training Strategy for Speech Emotion Recognition
- Yuanchao Li, Tianyu Zhao, Tatsuya Kawahara:
Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning. 2803-2807 - Maximilian Schmitt, Nicholas Cummins, Björn W. Schuller:
Continuous Emotion Recognition in Speech - Do We Need Recurrence? 2808-2812 - Anda Ouyang, Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah:
Speech Based Emotion Prediction: Can a Linear Model Work? 2813-2817 - Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono:
Speech Emotion Recognition Based on Multi-Label Emotion Existence Model. 2818-2822 - Cristina Gorrostieta, Reza Lotfian, Kye Taylor, Richard Brutti, John Kane:
Gender De-Biasing in Speech Emotion Recognition. 2823-2827 - Fang Bao, Michael Neumann, Ngoc Thang Vu:
CycleGAN-Based Emotion Style Transfer as Data Augmentation for Speech Emotion Recognition. 2828-2832
Voice Conversion for Style, Accent, and Emotion
- Bajibabu Bollepalli, Lauri Juvela, Paavo Alku:
Lombard Speech Synthesis Using Transfer Learning in a Tacotron Text-to-Speech System. 2833-2837 - Shreyas Seshadri, Lauri Juvela, Paavo Alku, Okko Räsänen:
Augmented CycleGANs for Continuous Scale Normal-to-Lombard Speaking Style Conversion. 2838-2842 - Guanlong Zhao, Shaojin Ding, Ricardo Gutierrez-Osuna:
Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams. 2843-2847 - Ravi Shankar, Jacob Sager, Archana Venkataraman:
A Multi-Speaker Emotion Morphing Model Using Highway Networks and Maximum Likelihood Objective. 2848-2852 - Itshak Lapidot, Jean-François Bonastre:
Effects of Waveform PMF on Anti-Spoofing Detection. 2853-2857 - Jian Gao, Deep Chakraborty, Hamidou Tembine, Olaitan Olaleye:
Nonparallel Emotional Speech Conversion. 2858-2862
Speaker Recognition 2
- Themos Stafylakis, Johan Rohdin, Oldrich Plchot, Petr Mizera, Lukás Burget:
Self-Supervised Speaker Embeddings. 2863-2867 - Andreas Nautsch, Jose Patino, Amos Treiber, Themos Stafylakis, Petr Mizera, Massimiliano Todisco, Thomas Schneider, Nicholas W. D. Evans:
Privacy-Preserving Speaker Recognition with Cohort Score Normalisation. 2868-2872 - Yi Liu, Liang He, Jia Liu:
Large Margin Softmax Loss for Speaker Verification. 2873-2877 - Amirhossein Hajavi, Ali Etemad:
A Deep Neural Network for Short-Segment Speaker Recognition. 2878-2882 - Jianfeng Zhou, Tao Jiang, Zheng Li, Lin Li, Qingyang Hong:
Deep Speaker Embedding Extraction with Channel-Wise Feature Responses and Additive Supervision Softmax Loss Function. 2883-2887 - Suwon Shon, Hao Tang, James R. Glass:
VoiceID Loss: Speech Enhancement for Speaker Verification. 2888-2892
Speaker Recognition and Anti-Spoofing
- Anderson R. Avila, Jahangir Alam, Douglas D. O'Shaughnessy, Tiago H. Falk:
Blind Channel Response Estimation for Replay Attack Detection. 2893-2897 - Ankur T. Patil, Rajul Acharya, Pulikonda Krishna Aditya Sai, Hemant A. Patil:
Energy Separation-Based Instantaneous Frequency Estimation for Cochlear Cepstral Feature for Replay Spoof Detection. 2898-2902 - Victoria Mingote, Antonio Miguel, Dayana Ribas, Alfonso Ortega Giménez, Eduardo Lleida:
Optimization of False Acceptance/Rejection Rates and Decision Threshold for End-to-End Text-Dependent Speaker Verification Systems. 2903-2907 - Lei Fan, Qing-Yuan Jiang, Ya-Qi Yu, Wu-Jun Li:
Deep Hashing for Speaker Identification and Retrieval. 2908-2912 - Mirko Marras, Pawel Korus, Nasir D. Memon, Gianni Fenu:
Adversarial Optimization for Dictionary Attacks on Speaker Verification. 2913-2917 - Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Haizhou Li:
An Adaptive-Q Cochlear Model for Replay Spoofing Detection. 2918-2922 - Sungrack Yun, Janghoon Cho, Jungyun Eum, Wonil Chang, Kyuwoong Hwang:
An End-to-End Text-Independent Speaker Verification Framework with a Keyword Adversarial Network. 2923-2927 - Soonshin Seo, Daniel Jun Rim, Minkyu Lim, Donghyun Lee, Hosung Park, Junseok Oh, Changmin Kim, Ji-Hwan Kim:
Shortcut Connections Based Deep Speaker Embeddings for End-to-End Speaker Verification System. 2928-2932 - Chang Huai You, Jichen Yang, Huy Dat Tran:
Device Feature Extractor for Replay Spoofing Detection. 2933-2937 - Hongji Wang, Heinrich Dinkel, Shuai Wang, Yanmin Qian, Kai Yu:
Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training. 2938-2942