ASRU 2023: Taipei, Taiwan
- IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023, Taipei, Taiwan, December 16-20, 2023. IEEE 2023, ISBN 979-8-3503-0689-7
- Shilong Wu, Jun Du, Mao-Kui He, Shutong Niu, Hang Chen, Haitao Tang, Chin-Hui Lee:
Semi-Supervised Multi-Channel Speaker Diarization With Cross-Channel Attention. 1-8
- Da-Hee Yang, Joon-Hyuk Chang:
Towards Robust Packet Loss Concealment System With ASR-Guided Representations. 1-8
- Feng-Ting Liao, Yung-Chieh Chan, Yi-Chang Chen, Chan-Jan Hsu, Da-Shan Shiu:
Zero-Shot Domain-Sensitive Speech Recognition with Prompt-Conditioning Fine-Tuning. 1-8
- Zexu Pan, Gordon Wichern, Yoshiki Masuyama, François G. Germain, Sameer Khurana, Chiori Hori, Jonathan Le Roux:
Scenario-Aware Audio-Visual TF-Gridnet for Target Speech Extraction. 1-8
- Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David D. Cox, David Harwath, Yang Zhang, Karen Livescu, James R. Glass:
Audio-Visual Neural Syntax Acquisition. 1-8
- Sheikh Shams Azam, Tatiana Likhomanenko, Martin Pelikan, Jan Honza Silovsky:
Importance of Smoothness Induced by Optimizers in Fl4Asr: Towards Understanding Federated Learning for End-To-End ASR. 1-8
- Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro:
QUICKVC: A Lightweight VITS-Based Any-to-Many Voice Conversion Model using ISTFT for Faster Conversion. 1-7
- Ivan Fung, Lahiru Samarakoon, Samuel J. Broughton:
Robust End-to-End Diarization with Domain Adaptive Training and Multi-Task Learning. 1-7
- Bahman Mirheidari, Ronan O'Malley, Daniel Blackburn, Heidi Christensen:
Identifying People with Mild Cognitive Impairment at Risk of Developing Dementia using Speech Analysis. 1-6
- Sathvik Udupa, Jesuraja Bandekar, Deekshitha G, Saurabh Kumar, Prasanta Kumar Ghosh, Sandhya Badiger, Abhayjeet Singh, Savitha Murthy, Priyanka Pai, Srinivasa Raghavan K. M., Raoul Nanavati:
Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages. 1-8
- Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu:
The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR. 1-8
- Mark Lindsey, Nathaniel R. Robinson, Francis Kubala, Richard M. Stern:
Reducing the Cost of Spoof Detection Labeling using Mixed-Strategy Active Learning and Pretrained Models. 1-7
- Alexandra Antonova:
Wiki-En-ASR-Adapt: Large-Scale Synthetic Dataset for English ASR Customization. 1-8
- Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Shinji Watanabe:
Findings of the 2023 ML-Superb Challenge: Pre-Training And Evaluation Over More Languages And Beyond. 1-8
- Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie:
HIGNN-TTS: Hierarchical Prosody Modeling With Graph Neural Networks for Expressive Long-Form TTS. 1-7
- Anusha Prakash, Srinivasan Umesh, Hema A. Murthy:
Towards Developing State-of-The-Art TTS Synthesisers for 13 Indian Languages with Signal Processing Aided Alignments. 1-8
- Kaixun Huang, Ao Zhang, Binbin Zhang, Tianyi Xu, Xingchen Song, Lei Xie:
Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition. 1-8
- Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth Gurunath Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastrow, Ivan Bulyko:
Low-Rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition. 1-8
- Yuewei Zhang, Huanbin Zou, Jie Zhu:
Vsanet: Real-Time Speech Enhancement Based on Voice Activity Detection and Causal Spatial Attention. 1-8
- Chenglin Xu, Xiguang Zheng, Chen Zhang, Chao Zhou, Qi Huang, Bing Yu:
KAQ: A Non-Intrusive Stacking Framework for Mean Opinion Score Prediction with Multi-Task Learning. 1-8
- Junfeng Hou, Peiyao Wang, Jincheng Zhang, Meng Yang, Minwei Feng, Jingcheng Yin:
CTC Blank Triggered Dynamic Layer-Skipping for Efficient Ctc-Based Speech Recognition. 1-5
- Muhammad Umar Farooq, Rehan Ahmad, Thomas Hain:
MUST: A Multilingual Student-Teacher Learning Approach for Low-Resource Speech Recognition. 1-6
- Gan Song, Zelin Wu, Golan Pundak, Angad Chandorkar, Kandarp Joshi, Xavier Velez, Diamantino Caseiro, Ben Haynor, Weiran Wang, Nikhil Siddhartha, Pat Rondon, Khe Chai Sim:
Contextual Spelling Correction with Large Language Models. 1-8
- Yuke Li, Xinfa Zhu, Yi Lei, Hai Li, Junhui Liu, Danming Xie, Lei Xie:
Zero-Shot Emotion Transfer for Cross-Lingual Speech Synthesis. 1-8
- Elaf Islam, Thomas Hain, Protima Nomo Sudro:
Simulation of Teacher-Learner Interaction in English Language Pronunciation Learning. 1-6
- Jae-Hong Lee, Do-Hee Kim, Joon-Hyuk Chang:
AWMC: Online Test-Time Adaptation Without Mode Collapse for Continual Adaptation. 1-8
- Roshan S. Sharma, William Chen, Takatomo Kano, Ruchira Sharma, Siddhant Arora, Shinji Watanabe, Atsunori Ogawa, Marc Delcroix, Rita Singh, Bhiksha Raj:
Espnet-Summ: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems. 1-8
- Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli:
Av-Data2Vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations. 1-8
- Daniel Galvez, Tim Kaldewey:
GPU-Accelerated Wfst Beam Search Decoder for CTC-Based Speech Recognition. 1-7
- Ke Hu, Tara N. Sainath, Bo Li, Yu Zhang, Yong Cheng, Tao Wang, Yujing Zhang, Frederick Liu:
Improving Multilingual and Code-Switching ASR Using Large Language Model Generated Text. 1-7
- Xingyu Cai, David Qiu, Shaojin Ding, Dongseong Hwang, Weiran Wang, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He:
Efficient Cascaded Streaming ASR System Via Frame Rate Reduction. 1-8
- Alexander Blatt, Badr M. Abdullah, Dietrich Klakow:
Ending the Blind Flight: Analyzing the Impact of Acoustic and Lexical Factors on WAV2VEC 2.0 in Air-Traffic Control. 1-8
- Jarod Duret, Benjamin O'Brien, Yannick Estève, Titouan Parcollet:
Enhancing Expressivity Transfer in Textless Speech-to-Speech Translation. 1-8
- Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao:
TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for Pytorch. 1-9
- Yiquan Zhou, Meng Chen, Yi Lei, Jihua Zhu, Weifeng Zhao:
VITS-Based Singing Voice Conversion System with DSPGAN Post-Processing for SVCC2023. 1-8
- Thomas Thebaud, Sonal Joshi, Henry Li, Martin Sustek, Jesús Villalba, Sanjeev Khudanpur, Najim Dehak:
Clustering Unsupervised Representations as Defense Against Poisoning Attacks on Speech Commands Classification System. 1-8
- Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro:
Using Joint Training Speaker Encoder With Consistency Loss to Achieve Cross-Lingual Voice Conversion and Expressive Voice Conversion. 1-8
- Dongyuan Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura:
After: Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition. 1-8
- Yan Huang, Piyush Behre, Guoli Ye, Shawn Chang, Yifan Gong:
Multi Transcription-Style Speech Transcription Using Attention-Based Encoder-Decoder Model. 1-6
- Xiang Lyu, Yuhang Cao, Qing Wang, Jingjing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu:
PP-MET: A Real-World Personalized Prompt Based Meeting Transcription System. 1-8
- Pavel Denisov, Ngoc Thang Vu:
Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding. 1-8
- Mun-Hak Lee, Sang-Eon Lee, Ji-Eun Choi, Joon-Hyuk Chang:
Cross-Modal Learning for CTC-Based ASR: Leveraging CTC-Bertscore and Sequence-Level Training. 1-8
- Veera Raghavendra Elluru, Devang Kulshreshtha, Rohit Paturi, Sravan Bodapati, Srikanth Ronanki:
Generalized Zero-Shot Audio-to-Intent Classification. 1-8
- Rajeev Rajan, Noumida Abdul Kareem, Sreelakshmi S:
Paraconsistent Feature Analysis for the Competency Evaluation of Voice Impersonation. 1-7
- Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, Ming Li:
Bisinger: Bilingual Singing Voice Synthesis. 1-8
- Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, Karen Livescu:
Few-Shot Spoken Language Understanding Via Joint Speech-Text Models. 1-8
- Jiajun He, Zekun Yang, Tomoki Toda:
ED-CEC: Improving Rare word Recognition Using ASR Postprocessing Based on Error Detection and Context-Aware Error Correction. 1-6
- Guodong Ma, Wenxuan Wang, Yuke Li, Yuting Yang, Binbin Du, Haoran Fu:
LAE-ST-MOE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-Switching ASR. 1-8
- Hong Liu, Yucheng Cai, Yuan Zhou, Zhijian Ou, Yi Huang, Junlan Feng:
Prompt Pool Based Class-Incremental Continual Learning for Dialog State Tracking. 1-8
- Yuke Lin, Xiaoyi Qin, Ning Jiang, Guoqing Zhao, Ming Li:
Haha-POD: An Attempt for Laughter-Based Non-Verbal Speaker Verification. 1-7
- Wei-Ping Huang, Sung-Feng Huang, Hung-Yi Lee:
Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization. 1-8
- Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg:
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of End-to-End ASR Models. 1-7
- Ji-Hwan Mo, Jae-Jin Jeon, Mun-Hak Lee, Joon-Hyuk Chang:
Knowledge Distillation From Offline to Streaming Transducer: Towards Accurate and Fast Streaming Model by Matching Alignments. 1-7
- Tanel Alumäe, Jiaming Kong, Daniil Robnikov:
Dialect Adaptation and Data Augmentation for Low-Resource ASR: Taltech Systems for the Madasr 2023 Challenge. 1-7
- William Ravenscroft, Stefan Goetze, Thomas Hain:
On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments. 1-7
- Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie:
MBTFNET: Multi-Band Temporal-Frequency Neural Network for Singing Voice Enhancement. 1-8
- Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, Tomoki Toda:
The Singing Voice Conversion Challenge 2023. 1-8
- Chun-Yi Kuan, Chen-An Li, Tsu-Yuan Hsu, Tse-Yang Lin, Ho-Lam Chung, Kai-Wei Chang, Shuo-Yiin Chang, Hung-Yi Lee:
Towards General-Purpose Text-Instruction-Guided Voice Conversion. 1-8
- Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke:
Generative Speech Recognition Error Correction With Large Language Models and Task-Activating Prompting. 1-8
- Hsin-Tien Chiang, Kuo-Hsuan Hung, Szu-Wei Fu, Heng-Cheng Kuo, Ming-Hsueh Tsai, Yu Tsao:
Study on the Correlation Between Objective Evaluations and Subjective Speech Quality and Intelligibility. 1-7
- Sibo Tong, Philip Harding, Simon Wiesler:
Hierarchical Attention-Based Contextual Biasing For Personalized Speech Recognition Using Neural Transducers. 1-8
- Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-Weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data. 1-8
- Jisi Zhang, Vandana Rajan, Haaris Mehmood, David Tuckey, Pablo Peso Parada, Md Asif Jalal, Karthikeyan Saravanan, Gil Ho Lee, Jungin Lee, Seokyeong Jung:
Consistency Based Unsupervised Self-Training for ASR Personalisation. 1-8
- Jeremy Heng Meng Wong, Huayun Zhang, Nancy F. Chen:
Variational Gaussian Process Data Uncertainty. 1-8
- Lahiru Samarakoon, Samuel J. Broughton, Marc Härkönen, Ivan Fung:
Transformer Attractors for Robust and Efficient End-To-End Neural Diarization. 1-8
- Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai:
Cross-Modal Alignment With Optimal Transport For CTC-Based ASR. 1-7
- Kailai Shen, Diqun Yan, Li Dong, Ying Ren, Xiaoxun Wu, Jing Hu:
SQAT-LD: SPeech Quality Assessment Transformer Utilizing Listener Dependent Modeling for Zero-Shot Out-of-Domain MOS Prediction. 1-6
- Chang Chen, Xun Gong, Yanmin Qian:
Efficient Text-Only Domain Adaptation For CTC-Based ASR. 1-7
- Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney:
Investigating The Effect of Language Models in Sequence Discriminative Training For Neural Transducers. 1-8
- Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur:
Learning From Flawed Data: Weakly Supervised Automatic Speech Recognition. 1-8
- Dongning Yang, Wei Wang, Yanmin Qian:
FAT-HuBERT: Front-End Adaptive Training of Hidden-Unit BERT For Distortion-Invariant Robust Speech Recognition. 1-8
- David Qiu, Shaojin Ding, Yanzhang He:
The Role of Feature Correlation on Quantized Neural Networks. 1-7
- Shaoxiong Lin, Chao Zhang, Yanmin Qian:
Improving Speech Enhancement Using Audio Tagging Knowledge From Pre-Trained Representations and Multi-Task Learning. 1-7
- Yoshiki Sato, Julián Villegas:
Spectral Tilt May Have a Smaller Impact on the Intelligibility of Speech in Noise. 1-5
- Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe:
Yodas: Youtube-Oriented Dataset for Audio and Speech. 1-8
- Wenqing Wei, Zhengdong Yang, Yuan Gao, Jiyi Li, Chenhui Chu, Shogo Okada, Sheng Li:
FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimers Speech Detection. 1-6
- Hiroyoshi Yamasaki, Jérôme Louradour, Julie Hunter, Laurent Prévot:
Transcribing and Aligning Conversational Speech: A Hybrid Pipeline Applied to French Conversations. 1-6
- Ching-Feng Yeh, Po-Yao Huang, Vasu Sharma, Shang-Wen Li, Gargi Ghosh:
Flap: Fast Language-Audio Pre-Training. 1-8
- Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari:
COCO-NUT: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control. 1-8
- Jiarui Hai, Yu-Jeh Liu, Mounya Elhilali:
Boosting Modality Representation With Pre-Trained Models and Multi-Task Training for Multimodal Sentiment Analysis. 1-8
- Armand Stricker, Patrick Paroubek:
Enhancing Task-Oriented Dialogues With Chitchat: A Comparative Study Based on Lexical Diversity And Divergence. 1-8
- Seongjin Park, Rutuja Ubale:
Multitask Learning Model with Text and Speech Representation for Fine-Grained Speech Scoring. 1-7
- Martin Sustek, Sonal Joshi, Henry Li, Thomas Thebaud, Jesús Villalba, Sanjeev Khudanpur, Najim Dehak:
Joint Energy-Based Model for Robust Speech Classification System Against Dirty-Label Backdoor Poisoning Attacks. 1-8
- Yuya Fujita, Shinji Watanabe, Xuankai Chang, Takashi Maekaku:
LV-CTC: Non-Autoregressive ASR With CTC and Latent Variable Models. 1-6
- Yu-Hsiang Wang, Huang-Yu Chen, Kai-Wei Chang, Winston H. Hsu, Hung-Yi Lee:
Minisuperb: Lightweight Benchmark for Self-Supervised Speech Models. 1-8
- Tzu-Quan Lin, Hung-Yi Lee, Hao Tang:
MelHuBERT: A Simplified Hubert on Mel Spectrograms. 1-8
- Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg:
Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition. 1-8
- Xintong Wang, Chang Zeng, Jun Chen, Chunhui Wang:
Crosssinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers. 1-6
- Mingqiu Wang, Wei Han, Izhak Shafran, Zelin Wu, Chung-Cheng Chiu, Yuan Cao, Nanxin Chen, Yu Zhang, Hagen Soltau, Paul K. Rubenstein, Lukas Zilka, Dian Yu, Golan Pundak, Nikhil Siddhartha, Johan Schalkwyk, Yonghui Wu:
SLM: Bridge the Thin Gap Between Speech and Text Foundation Models. 1-8
- Ilja Baumann, Dominik Wagner, Korbinian Riedhammer, Elmar Nöth, Tobias Bocklet:
Detection of Vowel Errors in Children's Speech using Synthetic Phonetic Transcripts. 1-8
- Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu:
On Decoder-Only Architecture For Speech-to-Text and Large Language Model Integration. 1-8
- Md Asif Jalal, Pablo Peso Parada, George Pavlidis, Vasileios Moschopoulos, Karthikeyan Saravanan, Chrysovalantis-Giorgos Kontoulis, Jisi Zhang, Anastasios Drosou, Gil Ho Lee, Jungin Lee, Seokyeong Jung:
Locality Enhanced Dynamic Biasing and Sampling Strategies For Contextual ASR. 1-8
- Bence Mark Halpern, Wen-Chin Huang, Lester Phillip Violeta, R. J. J. H. van Son, Tomoki Toda:
Improving Severity Preservation of Healthy-to-Pathological Voice Conversion With Global Style Tokens. 1-7
- Junchen Liu, Jesin James, Karan Nathwani:
Improved Multi-Modal Emotion Recognition Using Squeeze-and-Excitation Block in Cross-Modal Attention. 1-8
- Junkun Chen, Jian Xue, Peidong Wang, Jing Pan, Jinyu Li:
Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach. 1-7
- Dominik Wagner, Ilja Baumann, Sebastian P. Bayerl, Korbinian Riedhammer, Tobias Bocklet:
Speaker Adaptation for End-to-End Speech Recognition Systems in Noisy Environments. 1-6
- Jun-You Wang, Hung-Yi Lee, Jyh-Shing Roger Jang, Li Su:
Zero-Shot Singing Voice Synthesis from Musical Score. 1-8
- Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose:
Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition. 1-8
- Jerome R. Bellegarda:
Pareto Efficiency of Learning-Forgetting Trade-Off in Neural Language Model Adaptation. 1-8
- Daichi Hayakawa, Takehiko Kagoshima, Kenji Iwata, Norbert Braunschweiler, Rama Doddipatla:
Robust Recognition of Speaker Emotion With Difference Feature Extraction Using a Few Enrollment Utterances. 1-7
- Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Jinyu Li, Yashesh Gaur:
Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments. 1-8
- Jin Qiu, Lu Huang, Boyu Li, Jun Zhang, Lu Lu, Zejun Ma:
Improving Large-Scale Deep Biasing With Phoneme Features and Text-Only Data in Streaming Transducer. 1-8
- Zitha Sasindran, Harsha Yelchuri, T. V. Prabhakar, Supreeth Rao:
HEVAL: A New Hybrid Evaluation Metric for Automatic Speech Recognition Tasks. 1-7
- Marvin Lavechin, Marianne Métais, Hadrien Titeux, Alodie Boissonnet, Jade Copet, Morgane Rivière, Elika Bergelson, Alejandrina Cristià, Emmanuel Dupoux, Hervé Bredin:
Brouhaha: Multi-Task Training for Voice Activity Detection, Speech-to-Noise Ratio, and C50 Room Acoustics Estimation. 1-7
- Yuanjun Lv, Jixun Yao, Peikun Chen, Hongbin Zhou, Heng Lu, Lei Xie:
Salt: Distinguishable Speaker Anonymization Through Latent Space Transformation. 1-8
- Zhihong Lei, Mingbin Xu, Shiyi Han, Leo Liu, Zhen Huang, Tim Ng, Yuanyuan Zhang, Ernest Pusateri, Mirko Hannemann, Yaqiao Deng, Man-Hung Siu:
Acoustic Model Fusion For End-to-End Speech Recognition. 1-7
- Yusuke Shinohara, Shinji Watanabe:
Domain Adaptation by Data Distribution Matching Via Submodularity For Speech Recognition. 1-7
- Pasquale D'Alterio, Christian Hensel, Bashar Awwad Shiekh Hasan:
Can Unpaired Textual Data Replace Synthetic Speech in ASR Model Adaptation? 1-8
- Daniela A. Wiepert, Rene L. Utianski, Joseph R. Duffy, John L. Stricker, Leland Barnard, Keith A. Josephs, Jennifer L. Whitwell, David T. Jones, Hugo Botha:
Not All Errors Are Created Equal: Evaluating The Impact of Model and Speaker Factors on ASR Outcomes in Clinical Populations. 1-6
- Kohei Saijo,