default search action
Zhehuai Chen
Person information
Refine list
refinements active!
zoomed in on ?? of ?? records
view refined list in
export refined list as
2020 – today
- 2024
- [c32]Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, EngSiong Chng:
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators. ACL (1) 2024: 74-90 - [c31]Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg:
Transducers with Pronunciation-Aware Embeddings for Automatic Speech Recognition. ICASSP 2024: 12026-12030 - [c30]Zhehuai Chen, He Huang, Andrei Andrusenko, Oleksii Hrinchuk, Krishna C. Puvvada, Jason Li, Subhankar Ghosh, Jagadeesh Balam, Boris Ginsburg:
SALM: Speech-Augmented Language Model with in-Context Learning for Speech Recognition and Translation. ICASSP 2024: 13521-13525 - [i30]Christopher Li, Gary Wang, Kyle Kastner, Heng Su, Allen Chen, Andrew Rosenberg, Zhehuai Chen, Zelin Wu, Leonid Velikovich, Pat Rondon, Diamantino Caseiro, Petar S. Aleksic:
High-precision Voice Search Query Correction via Retrievable Speech-text Embedings. CoRR abs/2401.04235 (2024) - [i29]Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, Eng Siong Chng:
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators. CoRR abs/2402.06894 (2024) - [i28]Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg:
Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition. CoRR abs/2404.04295 (2024) - [i27]Vahid Noroozi, Zhehuai Chen, Somshubra Majumdar, Steve Huang, Jagadeesh Balam, Boris Ginsburg:
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models. CoRR abs/2406.12946 (2024) - [i26]Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee:
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment. CoRR abs/2406.18871 (2024) - [i25]Krishna C. Puvvada, Piotr Zelasko, He Huang, Oleksii Hrinchuk, Nithin Rao Koluguri, Kunal Dhawan, Somshubra Majumdar, Elena Rastorgueva, Zhehuai Chen, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg:
Less is More: Accurate Speech Recognition & Translation without Web-Scale Data. CoRR abs/2406.19674 (2024) - [i24]Zhehuai Chen, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Nithin Rao Koluguri, Piotr Zelasko, Jagadeesh Balam, Boris Ginsburg:
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5. CoRR abs/2406.19954 (2024) - [i23]Chao-Han Huck Yang, Taejin Park, Yuan Gong, Yuanchao Li, Zhehuai Chen, Yen-Ting Lin, Chen Chen, Yuchen Hu, Kunal Dhawan, Piotr Zelasko, Chao Zhang, Yun-Nung Chen, Yu Tsao, Jagadeesh Balam, Boris Ginsburg, Sabato Marco Siniscalchi, Eng Siong Chng, Peter Bell, Catherine Lai, Shinji Watanabe, Andreas Stolcke:
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition. CoRR abs/2409.09785 (2024) - [i22]Ke Hu, Zhehuai Chen, Chao-Han Huck Yang, Piotr Zelasko, Oleksii Hrinchuk, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg:
Chain-of-Thought Prompting for Speech Translation. CoRR abs/2409.11538 (2024) - [i21]Piotr Zelasko, Zhehuai Chen, Mengru Wang, Daniel Galvez, Oleksii Hrinchuk, Shuoyang Ding, Ke Hu, Jagadeesh Balam, Vitaly Lavrukhin, Boris Ginsburg:
EMMeTT: Efficient Multimodal Machine Translation Training. CoRR abs/2409.13523 (2024) - [i20]Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, Chao-Han Huck Yang, Jagadeesh Balam, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee:
Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data. CoRR abs/2409.20007 (2024) - 2023
- [c29]Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran:
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-to-Speech. ICASSP 2023: 1-5 - [c28]Yongqiang Wang, Zhehuai Chen, Chengjian Zheng, Yu Zhang, Wei Han, Parisa Haghani:
Accelerating RNN-T Training and Inference Using CTC Guidance. ICASSP 2023: 1-5 - [c27]Gary Wang, Kyle Kastner, Ankur Bapna, Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang:
Understanding Shared Speech-Text Representations. ICASSP 2023: 1-5 - [c26]Yochai Blau, Rohan Agrawal, Lior Madmony, Gary Wang, Andrew Rosenberg, Zhehuai Chen, Zorik Gekhman, Genady Beryozkin, Parisa Haghani, Bhuvana Ramabhadran:
Using Text Injection to Improve Recognition of Personal Identifiers in Speech. INTERSPEECH 2023: 191-195 - [i19]Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara N. Sainath, Pedro J. Moreno, Chung-Cheng Chiu, Johan Schalkwyk, Françoise Beaufays, Yonghui Wu:
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages. CoRR abs/2303.01037 (2023) - [i18]Gary Wang, Kyle Kastner, Ankur Bapna, Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang:
Understanding Shared Speech-Text Representations. CoRR abs/2304.14514 (2023) - [i17]Yochai Blau, Rohan Agrawal, Lior Madmony, Gary Wang, Andrew Rosenberg, Zhehuai Chen, Zorik Gekhman, Genady Beryozkin, Parisa Haghani, Bhuvana Ramabhadran:
Using Text Injection to Improve Recognition of Personal Identifiers in Speech. CoRR abs/2308.07393 (2023) - [i16]Zhehuai Chen, He Huang, Andrei Andrusenko, Oleksii Hrinchuk, Krishna C. Puvvada, Jason Li, Subhankar Ghosh, Jagadeesh Balam, Boris Ginsburg:
SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation. CoRR abs/2310.09424 (2023) - 2022
- [c25]Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno, Gary Wang:
Tts4pretrain 2.0: Advancing the use of Text and Speech in ASR Pretraining with Consistency and Contrastive Losses. ICASSP 2022: 7677-7681 - [c24]Zhiyun Lu, Yongqiang Wang, Yu Zhang, Wei Han, Zhehuai Chen, Parisa Haghani:
Unsupervised Data Selection via Discrete Speech Representation for ASR. INTERSPEECH 2022: 3393-3397 - [c23]Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno, Ankur Bapna, Heiga Zen:
MAESTRO: Matched Speech Text Representations through Modality Matching. INTERSPEECH 2022: 4093-4097 - [c22]Tara N. Sainath, Rohit Prabhavalkar, Ankur Bapna, Yu Zhang, Zhouyuan Huo, Zhehuai Chen, Bo Li, Weiran Wang, Trevor Strohman:
JOIST: A Joint Speech and Text Streaming Model for ASR. SLT 2022: 52-59 - [c21]Zhehuai Chen, Ankur Bapna, Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Pedro J. Moreno, Nanxin Chen:
Maestro-U: Leveraging Joint Speech-Text Representation Learning for Zero Supervised Speech ASR. SLT 2022: 68-75 - [i15]Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno, Ankur Bapna, Heiga Zen:
MAESTRO: Matched Speech Text Representations through Modality Matching. CoRR abs/2204.03409 (2022) - [i14]Alëna Aksënova, Zhehuai Chen, Chung-Cheng Chiu, Daan van Esch, Pavel Golik, Wei Han, Levi King, Bhuvana Ramabhadran, Andrew Rosenberg, Suzan Schwartz, Gary Wang:
Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data. CoRR abs/2205.08014 (2022) - [i13]Tara N. Sainath, Rohit Prabhavalkar, Ankur Bapna, Yu Zhang, Zhouyuan Huo, Zhehuai Chen, Bo Li, Weiran Wang, Trevor Strohman:
JOIST: A Joint Speech and Text Streaming Model For ASR. CoRR abs/2210.07353 (2022) - [i12]Zhehuai Chen, Ankur Bapna, Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Pedro J. Moreno, Nanxin Chen:
Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR. CoRR abs/2210.10027 (2022) - [i11]Takaaki Saeki, Heiga Zen, Zhehuai Chen, Nobuyuki Morioka, Gary Wang, Yu Zhang, Ankur Bapna, Andrew Rosenberg, Bhuvana Ramabhadran:
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech. CoRR abs/2210.15447 (2022) - [i10]Yongqiang Wang, Zhehuai Chen, Chengjian Zheng, Yu Zhang, Wei Han, Parisa Haghani:
Accelerating RNN-T Training and Inference Using CTC guidance. CoRR abs/2210.16481 (2022) - 2021
- [c20]Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Gary Wang, Pedro J. Moreno:
Injecting Text in Self-Supervised Speech Pretraining. ASRU 2021: 251-258 - [c19]Hang Lv, Zhehuai Chen, Hainan Xu, Daniel Povey, Lei Xie, Sanjeev Khudanpur:
An Asynchronous WFST-Based Decoder for Automatic Speech Recognition. ICASSP 2021: 6019-6023 - [c18]Zhehuai Chen, Andrew Rosenberg, Yu Zhang, Heiga Zen, Mohammadreza Ghodsi, Yinghui Huang, Jesse Emond, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno:
Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation. Interspeech 2021: 736-740 - [c17]Zhehuai Chen, Bhuvana Ramabhadran, Fadi Biadsy, Xia Zhang, Youzheng Chen, Liyang Jiang, Fang Chu, Rohan Doshi, Pedro J. Moreno:
Conformer Parrotron: A Faster and Stronger End-to-End Speech Conversion and Recognition Model for Atypical Speech. Interspeech 2021: 4828-4832 - [i9]Hang Lv, Zhehuai Chen, Hainan Xu, Daniel Povey, Lei Xie, Sanjeev Khudanpur:
An Asynchronous WFST-Based Decoder For Automatic Speech Recognition. CoRR abs/2103.09063 (2021) - [i8]Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Gary Wang, Pedro J. Moreno:
Injecting Text in Self-Supervised Speech Pretraining. CoRR abs/2108.12226 (2021) - 2020
- [j4]Qi Liu, Zhehuai Chen, Hao Li, Mingkun Huang, Yizhou Lu, Kai Yu:
Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word Model. IEEE ACM Trans. Audio Speech Lang. Process. 28: 2174-2183 (2020) - [c16]Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang, Bhuvana Ramabhadran, Yonghui Wu, Pedro J. Moreno:
Improving Speech Recognition Using Consistent Predictions on Synthesized Speech. ICASSP 2020: 7029-7033 - [c15]Zhehuai Chen, Andrew Rosenberg, Yu Zhang, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno:
Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection. INTERSPEECH 2020: 556-560 - [c14]Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang, Bhuvana Ramabhadran, Pedro J. Moreno:
SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR. INTERSPEECH 2020: 2832-2836 - [i7]Qi Liu, Zhehuai Chen, Hao Li, Mingkun Huang, Yizhou Lu, Kai Yu:
Modular End-to-end Automatic Speech Recognition Framework for Acoustic-to-word Model. CoRR abs/2008.00953 (2020)
2010 – 2019
- 2019
- [c13]Zhehuai Chen, Mahsa Yarmohammadi, Hainan Xu, Hang Lv, Lei Xie, Daniel Povey, Sanjeev Khudanpur:
Incremental Lattice Determinization for WFST Decoders. ASRU 2019: 1-7 - [c12]Zhehuai Chen, Mahaveer Jain, Yongqiang Wang, Michael L. Seltzer, Christian Fuegen:
End-to-end Contextual Speech Recognition Using Class Language Models and a Token Passing Decoder. ICASSP 2019: 6186-6190 - [c11]Zhehuai Chen, Mahaveer Jain, Yongqiang Wang, Michael L. Seltzer, Christian Fuegen:
Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR. INTERSPEECH 2019: 3490-3494 - 2018
- [j3]Zhehuai Chen, Yanmin Qian, Kai Yu:
Sequence discriminative training for deep learning based acoustic keyword spotting. Speech Commun. 102: 100-111 (2018) - [j2]Zhehuai Chen, Jasha Droppo, Jinyu Li, Wayne Xiong:
Progressive Joint Modeling in Unsupervised Single-Channel Overlapped Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 26(1): 184-196 (2018) - [c10]Zhehuai Chen, Qi Liu, Hao Li, Kai Yu:
On Modular Training of Neural Acoustics-to-Word Model for LVCSR. ICASSP 2018: 4754-4758 - [c9]Zhehuai Chen, Jasha Droppo:
Sequence Modeling in Unsupervised Single-Channel Overlapped Speech Recognition. ICASSP 2018: 4809-4813 - [c8]Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur:
A GPU-based WFST Decoder with Exact Lattice Generation. INTERSPEECH 2018: 2212-2216 - [c7]Mingkun Huang, Yongbin You, Zhehuai Chen, Yanmin Qian, Kai Yu:
Knowledge Distillation for Sequence Model. INTERSPEECH 2018: 3703-3707 - [i6]Zhehuai Chen, Qi Liu, Hao Li, Kai Yu:
On Modular Training of Neural Acoustics-to-Word Model for LVCSR. CoRR abs/1803.01090 (2018) - [i5]Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur:
A GPU-based WFST Decoder with Exact Lattice Generation. CoRR abs/1804.03243 (2018) - [i4]Zhehuai Chen, Yanmin Qian, Kai Yu:
Sequence Discriminative Training for Deep Learning based Acoustic Keyword Spotting. CoRR abs/1808.00639 (2018) - [i3]Zhehuai Chen:
Linguistic Search Optimization for Deep Learning Based LVCSR. CoRR abs/1808.00687 (2018) - [i2]Zhehuai Chen, Mahaveer Jain, Yongqiang Wang, Michael L. Seltzer, Christian Fuegen:
End-to-end contextual speech recognition using class language models and a token passing decoder. CoRR abs/1812.02142 (2018) - 2017
- [j1]Zhehuai Chen, Yimeng Zhuang, Yanmin Qian, Kai Yu:
Phone Synchronous Speech Recognition With CTC Lattices. IEEE ACM Trans. Audio Speech Lang. Process. 25(1): 86-97 (2017) - [c6]Yue Wu, Tianxing He, Zhehuai Chen, Yanmin Qian, Kai Yu:
Multi-view LSTM Language Model with Word-Synchronized Auxiliary Feature for LVCSR. CCL 2017: 398-410 - [c5]Zhehuai Chen, Yimeng Zhuang, Kai Yu:
Confidence measures for CTC-based phone synchronous decoding. ICASSP 2017: 4850-4854 - [c4]Zhehuai Chen, Yanmin Qian, Kai Yu:
A Unified Confidence Measure Framework Using Auxiliary Normalization Graph. IScIDE 2017: 123-133 - [i1]Zhehuai Chen, Jasha Droppo, Jinyu Li, Wayne Xiong:
Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition. CoRR abs/1707.07048 (2017) - 2016
- [c3]Zhehuai Chen, Wei Deng, Tao Xu, Kai Yu:
Phone Synchronous Decoding with CTC Lattice. INTERSPEECH 2016: 1923-1927 - [c2]Da Zheng, Zhehuai Chen, Yue Wu, Kai Yu:
Directed automatic speech transcription error correction using bidirectional LSTM. ISCSLP 2016: 1-5 - 2015
- [c1]Bo Chen, Zhehuai Chen, Jiachen Xu, Kai Yu:
An investigation of context clustering for statistical speech synthesis with deep neural network. INTERSPEECH 2015: 2212-2216
Coauthor Index
manage site settings
To protect your privacy, all features that rely on external API calls from your browser are turned off by default. You need to opt-in for them to become active. All settings here will be stored as cookies with your web browser. For more information see our F.A.Q.
Unpaywalled article links
Add open access links from to the list of external document links (if available).
Privacy notice: By enabling the option above, your browser will contact the API of unpaywall.org to load hyperlinks to open access articles. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Unpaywall privacy policy.
Archived links via Wayback Machine
For web page which are no longer available, try to retrieve content from the of the Internet Archive (if available).
Privacy notice: By enabling the option above, your browser will contact the API of archive.org to check for archived content of web pages that are no longer available. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Internet Archive privacy policy.
Reference lists
Add a list of references from , , and to record detail pages.
load references from crossref.org and opencitations.net
Privacy notice: By enabling the option above, your browser will contact the APIs of crossref.org, opencitations.net, and semanticscholar.org to load article reference information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Crossref privacy policy and the OpenCitations privacy policy, as well as the AI2 Privacy Policy covering Semantic Scholar.
Citation data
Add a list of citing articles from and to record detail pages.
load citations from opencitations.net
Privacy notice: By enabling the option above, your browser will contact the API of opencitations.net and semanticscholar.org to load citation information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the OpenCitations privacy policy as well as the AI2 Privacy Policy covering Semantic Scholar.
OpenAlex data
Load additional information about publications from .
Privacy notice: By enabling the option above, your browser will contact the API of openalex.org to load additional information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the information given by OpenAlex.
last updated on 2024-10-22 21:17 CEST by the dblp team
all metadata released as open data under CC0 1.0 license
see also: Terms of Use | Privacy Policy | Imprint