


Tara N. Sainath
Person information
- affiliation: Google Inc., New York, NY, USA
- affiliation: IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
2020 – today
- 2024
- [j11] Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe:
End-to-End Speech Recognition: A Survey. IEEE ACM Trans. Audio Speech Lang. Process. 32: 325-351 (2024)
- [c183] Wen Wu, Bo Li, Chao Zhang, Chung-Cheng Chiu, Qiujia Li, Junwen Bai, Tara N. Sainath, Philip C. Woodland:
Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation. ACL (1) 2024: 2078-2093
- [c182] Khe Chai Sim, Zhouyuan Huo, Tsendsuren Munkhdalai, Nikhil Siddhartha, Adam Stooke, Zhong Meng, Bo Li, Tara N. Sainath:
A Comparison of Parameter-Efficient ASR Domain Adaptation Methods for Universal Speech and Language Models. ICASSP 2024: 6900-6904
- [c181] Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal:
USM-Lite: Quantization and Sparsity Aware Fine-Tuning for Speech Recognition with Universal Speech Models. ICASSP 2024: 10756-10760
- [c180] Junwen Bai, Bo Li, Qiujia Li, Tara N. Sainath, Trevor Strohman:
Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR. ICASSP 2024: 10841-10845
- [c179] Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno:
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models. ICASSP 2024: 11816-11820
- [c178] Haozhe Shan, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara N. Sainath:
Augmenting Conformers With Structured State-Space Sequence Models For Online Speech Recognition. ICASSP 2024: 12221-12225
- [c177] Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara N. Sainath, Françoise Beaufays, Pedro Moreno Mengibar:
Improving Speech Recognition for African American English with Audio Classification. ICASSP 2024: 12356-12360
- [c176] W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang, Yongqiang Wang, Shuo-Yiin Chang, Tara N. Sainath:
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study. ICASSP 2024: 13306-13310
- [c175] Weiran Wang, Rohit Prabhavalkar, Haozhe Shan, Zhong Meng, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Chengjian Zheng, Yanzhang He, Tara N. Sainath, Pedro Moreno Mengibar:
Massive End-to-end Speech Recognition Models with Time Reduction. NAACL-HLT 2024: 6206-6217
- [i96] Junwen Bai, Bo Li, Qiujia Li, Tara N. Sainath, Trevor Strohman:
Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR. CoRR abs/2401.08992 (2024)
- [i95] W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang, Yongqiang Wang, Shuo-Yiin Chang, Tara N. Sainath:
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study. CoRR abs/2401.12789 (2024)
- [i94] Wen Wu, Bo Li, Chao Zhang, Chung-Cheng Chiu, Qiujia Li, Junwen Bai, Tara N. Sainath, Philip C. Woodland:
Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation. CoRR abs/2402.12862 (2024)
- [i93] Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno:
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models. CoRR abs/2402.17184 (2024)
- [i92] Tsendsuren Munkhdalai, Youzheng Chen, Khe Chai Sim, Fadi Biadsy, Tara N. Sainath, Pedro Moreno Mengibar:
Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models. CoRR abs/2403.19709 (2024)
- [i91] Zhong Meng, Zelin Wu, Rohit Prabhavalkar, Cal Peyser, Weiran Wang, Nanxin Chen, Tara N. Sainath, Bhuvana Ramabhadran:
Text Injection for Neural Contextual Biasing. CoRR abs/2406.02921 (2024)
- 2023
- [c174] Guru Prakash Arumugam, Shuo-Yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, Quan Wang, Shaan Bijwadia:
Improved Long-Form Speech Recognition By Jointly Modeling The Primary And Non-Primary Speakers. ASRU 2023: 1-8
- [c173] Xingyu Cai, David Qiu, Shaojin Ding, Dongseong Hwang, Weiran Wang, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He:
Efficient Cascaded Streaming ASR System Via Frame Rate Reduction. ASRU 2023: 1-8
- [c172] Ke Hu, Tara N. Sainath, Bo Li, Yu Zhang, Yong Cheng, Tao Wang, Yujing Zhang, Frederick Liu:
Improving Multilingual and Code-Switching ASR Using Large Language Model Generated Text. ASRU 2023: 1-7
- [c171] Rami Botros, Rohit Prabhavalkar, Johan Schalkwyk, Ciprian Chelba, Tara N. Sainath, Françoise Beaufays:
Lego-Features: Exporting Modular Encoder Features for Streaming and Deliberation ASR. ICASSP 2023: 1-5
- [c170] Shuo-Yiin Chang, Chao Zhang, Tara N. Sainath, Bo Li, Trevor Strohman:
Context-Aware end-to-end ASR Using Self-Attentive Embedding and Tensor Fusion. ICASSP 2023: 1-5
- [c169] Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw:
Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models. ICASSP 2023: 1-5
- [c168] Ke Hu, Tara N. Sainath, Bo Li, Nan Du, Yanping Huang, Andrew M. Dai, Yu Zhang, Rodrigo Cabrera, Zhifeng Chen, Trevor Strohman:
Massively Multilingual Shallow Fusion with Large Language Models. ICASSP 2023: 1-5
- [c167] W. Ronny Huang, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, David Rybach, Robert David, Rohit Prabhavalkar, Cyril Allauzen, Cal Peyser, Trevor D. Strohman:
E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model. ICASSP 2023: 1-5
- [c166] Zhouyuan Huo, Khe Chai Sim, Bo Li, Dongseong Hwang, Tara N. Sainath, Trevor Strohman:
Resource-Efficient Transfer Learning from Speech Foundation Model Using Hierarchical Feature Fusion. ICASSP 2023: 1-5
- [c165] Bo Li, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang, Wei Han, Trevor Strohman, Françoise Beaufays:
Efficient Domain Adaptation for Speech Foundation Models. ICASSP 2023: 1-5
- [c164] Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang, Bo Li, Andrew Rosenberg, Bhuvana Ramabhadran:
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition. ICASSP 2023: 1-5
- [c163] Cal Peyser, Michael Picheny, Kyunghyun Cho, Rohit Prabhavalkar, W. Ronny Huang, Tara N. Sainath:
A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale. ICASSP 2023: 1-5
- [c162] Tara N. Sainath, Rohit Prabhavalkar, Diamantino Caseiro, Pat Rondon, Cyril Allauzen:
Improving Contextual Biasing with Text Injection. ICASSP 2023: 1-5
- [c161] Weiran Wang, Ding Zhao, Shaojin Ding, Hao Zhang, Shuo-Yiin Chang, David Rybach, Tara N. Sainath, Yanzhang He, Ian McGraw, Shankar Kumar:
Multi-Output RNN-T Joint Networks for Multi-Task Learning of ASR and Auxiliary Tasks. ICASSP 2023: 1-5
- [c160] Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman:
From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition. ICASSP 2023: 1-5
- [c159] Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee:
A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition. ICASSP 2023: 1-5
- [c158] Chao Zhang, Bo Li, Tara N. Sainath, Trevor Strohman, Shuo-Yiin Chang:
UML: A Universal Monolingual Output Layer For Multilingual Asr. ICASSP 2023: 1-5
- [c157] Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath:
How to Estimate Model Transferability of Pre-Trained Speech Models? INTERSPEECH 2023: 456-460
- [c156] Zhouyuan Huo, Khe Chai Sim, Dongseong Hwang, Tsendsuren Munkhdalai, Tara N. Sainath, Pedro Moreno Mengibar:
Re-investigating the Efficient Transfer Learning of Speech Foundation Model using Feature Fusion Methods. INTERSPEECH 2023: 556-560
- [c155] Cal Peyser, Zhong Meng, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho, Ke Hu:
Improving Joint Speech-Text Representations Without Alignment. INTERSPEECH 2023: 1354-1358
- [c154] W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-Yiin Chang, Tara N. Sainath:
Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR. INTERSPEECH 2023: 2778-2782
- [c153] Ke Hu, Bo Li, Tara N. Sainath, Yu Zhang, Françoise Beaufays:
Mixture-of-Expert Conformer for Streaming Multilingual ASR. INTERSPEECH 2023: 3327-3331
- [c152] Qiujia Li, Bo Li, Dongseong Hwang, Tara N. Sainath, Pedro Moreno Mengibar:
Modular Domain Adaptation for Conformer-Based Streaming ASR. INTERSPEECH 2023: 3357-3361
- [i90] Cal Peyser, W. Ronny Huang, Tara N. Sainath, Rohit Prabhavalkar, Michael Picheny, Kyunghyun Cho:
Dual Learning for Large Vocabulary On-Device ASR. CoRR abs/2301.04327 (2023)
- [i89] Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman:
From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition. CoRR abs/2301.07851 (2023)
- [i88] Bo Li, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang, Wei Han, Trevor Strohman, Françoise Beaufays:
Efficient Domain Adaptation for Speech Foundation Models. CoRR abs/2302.01496 (2023)
- [i87] Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang, Bo Li, Andrew Rosenberg, Bhuvana Ramabhadran:
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition. CoRR abs/2302.08583 (2023)
- [i86] Ke Hu, Tara N. Sainath, Bo Li, Nan Du, Yanping Huang, Andrew M. Dai, Yu Zhang, Rodrigo Cabrera, Zhifeng Chen, Trevor Strohman:
Massively Multilingual Shallow Fusion with Large Language Models. CoRR abs/2302.08917 (2023)
- [i85] Chao Zhang, Bo Li, Tara N. Sainath, Trevor Strohman, Shuo-Yiin Chang:
UML: A Universal Monolingual Output Layer for Multilingual ASR. CoRR abs/2302.11186 (2023)
- [i84] Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara N. Sainath, Pedro J. Moreno, Chung-Cheng Chiu, Johan Schalkwyk, Françoise Beaufays, Yonghui Wu:
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages. CoRR abs/2303.01037 (2023)
- [i83] Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe:
End-to-End Speech Recognition: A Survey. CoRR abs/2303.03329 (2023)
- [i82] Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw:
Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models. CoRR abs/2303.08343 (2023)
- [i81] Sepand Mavandadi, Tara N. Sainath, Ke Hu, Zelin Wu:
A Deliberation-based Joint Acoustic and Text Decoder. CoRR abs/2303.15293 (2023)
- [i80] Rami Botros, Anmol Gulati, Tara N. Sainath, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu:
Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR. CoRR abs/2304.00171 (2023)
- [i79] Rami Botros, Rohit Prabhavalkar, Johan Schalkwyk, Ciprian Chelba, Tara N. Sainath, Françoise Beaufays:
Lego-Features: Exporting modular encoder features for streaming and deliberation ASR. CoRR abs/2304.00173 (2023)
- [i78] Cal Peyser, Michael Picheny, Kyunghyun Cho, Rohit Prabhavalkar, W. Ronny Huang, Tara N. Sainath:
A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale. CoRR abs/2304.11053 (2023)
- [i77] Qiujia Li, Bo Li, Dongseong Hwang, Tara N. Sainath, Pedro Moreno Mengibar:
Modular Domain Adaptation for Conformer-Based Streaming ASR. CoRR abs/2305.13408 (2023)
- [i76] Ke Hu, Bo Li, Tara N. Sainath, Yu Zhang, Françoise Beaufays:
Mixture-of-Expert Conformer for Streaming Multilingual ASR. CoRR abs/2305.15663 (2023)
- [i75] W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-Yiin Chang, Tara N. Sainath:
Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR. CoRR abs/2305.18419 (2023)
- [i74] Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath:
How to Estimate Model Transferability of Pre-Trained Speech Models? CoRR abs/2306.01015 (2023)
- [i73] Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara N. Sainath, Johan Schalkwyk, Matthew Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirovic, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, Neil Zeghidour, Yu Zhang, Zhishuai Zhang, Lukas Zilka, Christian Havnø Frank:
AudioPaLM: A Large Language Model That Can Speak and Listen. CoRR abs/2306.12925 (2023)
- [i72] Cal Peyser, Zhong Meng, Ke Hu, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho:
Improving Joint Speech-Text Representations Without Alignment. CoRR abs/2308.06125 (2023)
- [i71] Shaan Bijwadia, Shuo-Yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath:
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models. CoRR abs/2308.07395 (2023)
- [i70] Haozhe Shan, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara N. Sainath:
Augmenting conformers with structured state space models for online speech recognition. CoRR abs/2309.08551 (2023)
- [i69] Shefali Garg, Zhouyuan Huo, Khe Chai Sim, Suzan Schwartz, Mason Chua, Alëna Aksënova, Tsendsuren Munkhdalai, Levi King, Darryl Wright, Zion Mengesha, Dongseong Hwang, Tara N. Sainath, Françoise Beaufays, Pedro Moreno Mengibar:
Improving Speech Recognition for African American English With Audio Classification. CoRR abs/2309.09996 (2023)
- [i68] Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara N. Sainath, Pedro Moreno Mengibar:
Massive End-to-end Models for Short Search Queries. CoRR abs/2309.12963 (2023)
- [i67] Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng, Ding Zhao, Tara N. Sainath, Pedro Moreno Mengibar:
Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm. CoRR abs/2310.00178 (2023)
- [i66] Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Shivani Agrawal, Zhonglin Han, Jian Li, Amir Yazdanbakhsh:
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models. CoRR abs/2312.08553 (2023)
- [i65] Guru Prakash Arumugam, Shuo-Yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, Quan Wang, Shaan Bijwadia:
Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers. CoRR abs/2312.11123 (2023)
- 2022
- [j10] Hung-Yi Lee, Shinji Watanabe, Karen Livescu, Abdelrahman Mohamed, Tara N. Sainath:
Editorial Editorial of Special Issue on Self-Supervised Learning for Speech and Audio Processing. IEEE J. Sel. Top. Signal Process. 16(6): 1174-1178 (2022)
- [j9] Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe:
Self-Supervised Speech Representation Learning: A Review. IEEE J. Sel. Top. Signal Process. 16(6): 1179-1210 (2022)
- [j8] Yu Zhang, Daniel S. Park, Wei Han, James Qin, Anmol Gulati, Joel Shor, Aren Jansen, Yuanzhong Xu, Yanping Huang, Shibo Wang, Zongwei Zhou, Bo Li, Min Ma, William Chan, Jiahui Yu, Yongqiang Wang, Liangliang Cao, Khe Chai Sim, Bhuvana Ramabhadran, Tara N. Sainath, Françoise Beaufays, Zhifeng Chen, Quoc V. Le, Chung-Cheng Chiu, Ruoming Pang, Yonghui Wu:
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition. IEEE J. Sel. Top. Signal Process. 16(6): 1519-1532 (2022)
- [c151] Bo Li, Ruoming Pang, Yu Zhang, Tara N. Sainath, Trevor Strohman, Parisa Haghani, Yun Zhu, Brian Farris, Neeraj Gaur, Manasa Prasad:
Massively Multilingual ASR: A Lifelong Learning Solution. ICASSP 2022: 6397-6401
- [c150] Junwen Bai, Bo Li, Yu Zhang, Ankur Bapna, Nikhil Siddhartha, Khe Chai Sim, Tara N. Sainath:
Joint Unsupervised and Supervised Training for Multilingual ASR. ICASSP 2022: 6402-6406
- [c149] Weiran Wang, Ke Hu, Tara N. Sainath:
Deliberation of Streaming RNN-Transducer by Non-Autoregressive Decoding. ICASSP 2022: 7452-7456
- [c148] Ke Hu, Tara N. Sainath, Arun Narayanan, Ruoming Pang, Trevor Strohman:
Transducer-Based Streaming Deliberation for Cascaded Encoders. ICASSP 2022: 8107-8111
- [c147] Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han, Yonghui Wu, Yu Zhang:
Improving The Latency And Quality Of Cascaded Encoders. ICASSP 2022: 8112-8116
- [c146] Chao Zhang, Bo Li, Zhiyun Lu, Tara N. Sainath, Shuo-Yiin Chang:
Improving the Fusion of Acoustic and Text Representations in RNN-T. ICASSP 2022: 8117-8121
- [c145] W. Ronny Huang, Cal Peyser, Tara N. Sainath, Ruoming Pang, Trevor D. Strohman, Shankar Kumar:
Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition. INTERSPEECH 2022: 689-693
- [c144] Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach:
Improving Rare Word Recognition with LM-aware MWER Training. INTERSPEECH 2022: 1031-1035
- [c143] Weiran Wang, Ke Hu, Tara N. Sainath:
Streaming Align-Refine for Non-autoregressive Deliberation. INTERSPEECH 2022: 1696-1700
- [c142] Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman:
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes. INTERSPEECH 2022: 1706-1710
- [c141] Shuo-Yiin Chang, Bo Li, Tara N. Sainath, Chao Zhang, Trevor Strohman, Qiao Liang, Yanzhang He:
Turn-Taking Prediction for Natural Conversational Speech. INTERSPEECH 2022: 1821-1825
- [c140] Shuo-Yiin Chang, Guru Prakash, Zelin Wu, Tara N. Sainath, Bo Li, Qiao Liang, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman:
Streaming Intended Query Detection using E2E Modeling for Continued Conversation. INTERSPEECH 2022: 1826-1830
- [c139] Bo Li, Tara N. Sainath, Ruoming Pang, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani:
A Language Agnostic Multilingual Streaming On-Device ASR System. INTERSPEECH 2022: 3188-3192
- [c138] Chao Zhang, Bo Li, Tara N. Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-Yiin Chang, Parisa Haghani:
Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification. INTERSPEECH 2022: 3223-3227
- [c137] Cal Peyser, W. Ronny Huang, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho:
Towards Disentangled Speech Representations. INTERSPEECH 2022: 3603-3607
- [c136] Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang:
Improving Deliberation by Text-Only and Semi-Supervised Training. INTERSPEECH 2022: 4940-4944
- [c135] W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Tara N. Sainath, Rohit Prabhavalkar, Cal Peyser, Zhiyun Lu, Cyril Allauzen:
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR. INTERSPEECH 2022: 4995-4999
- [c134] Tara N. Sainath, Rohit Prabhavalkar, Ankur Bapna, Yu Zhang, Zhouyuan Huo, Zhehuai Chen, Bo Li, Weiran Wang, Trevor Strohman:
JOIST: A Joint Speech and Text Streaming Model for ASR. SLT 2022: 52-59
- [c133] Tsendsuren Munkhdalai, Zelin Wu, Golan Pundak, Khe Chai Sim, Jiayang Li, Pat Rondon, Tara N. Sainath:
NAM+: Towards Scalable End-to-End Contextual Biasing for Adaptive ASR. SLT 2022: 190-196
- [c132] Cal Peyser, W. Ronny Huang, Tara N. Sainath, Rohit Prabhavalkar, Michael Picheny, Kyunghyun Cho:
Dual Learning for Large Vocabulary On-Device ASR. SLT 2022: 245-251
- [c131] Shaan Bijwadia, Shuo-Yiin Chang, Bo Li, Tara N. Sainath, Chao Zhang, Yanzhang He:
Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems. SLT 2022: 310-316
- [c130] Ke Hu, Bo Li, Tara N. Sainath:
Scaling Up Deliberation For Multilingual ASR. SLT 2022: 771-776
- [c129] Sepand Mavandadi, Bo Li, Chao Zhang, Brian Farris, Tara N. Sainath, Trevor Strohman:
A Truly Multilingual First Pass and Monolingual Second Pass Streaming on-Device ASR System. SLT 2022: 838-845
- [i64] Chao Zhang, Bo Li, Zhiyun Lu, Tara N. Sainath, Shuo-Yiin Chang:
Improving the fusion of acoustic and text representations in RNN-T. CoRR abs/2201.10240 (2022)
- [i63] W. Ronny Huang, Cal Peyser, Tara N. Sainath, Ruoming Pang, Trevor Strohman, Shankar Kumar:
Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition. CoRR abs/2203.05008 (2022)
- [i62] Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman:
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes. CoRR abs/2204.06164 (2022)
- [i61] Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach:
Improving Rare Word Recognition with LM-aware MWER Training. CoRR abs/2204.07553 (2022)
- [i60] Weiran Wang, Ke Hu, Tara N. Sainath:
Streaming Align-Refine for Non-autoregressive Deliberation. CoRR abs/2204.07556 (2022)
- [i59] W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Rohit Prabhavalkar, Tara N. Sainath, Cyril Allauzen, Cal Peyser, Zhiyun Lu:
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR. CoRR abs/2204.10749 (2022)
- [i58] Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe:
Self-Supervised Speech Representation Learning: A Review. CoRR abs/2205.10643 (2022)
- [i57] Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang:
Improving Deliberation by Text-Only and Semi-Supervised Training. CoRR abs/2206.14716 (2022)
- [i56] Cal Peyser, W. Ronny Huang, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho:
Towards Disentangled Speech Representations. CoRR abs/2208.13191 (2022)
- [i55] Shuo-Yiin Chang, Bo Li, Tara N. Sainath, Chao Zhang, Trevor Strohman, Qiao Liang, Yanzhang He:
Turn-Taking Prediction for Natural Conversational Speech. CoRR abs/2208.13321 (2022)
- [i54] Shuo-Yiin Chang, Guru Prakash, Zelin Wu, Qiao Liang, Tara N. Sainath, Bo Li, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman:
Streaming Intended Query Detection using E2E Modeling for Continued Conversation. CoRR abs/2208.13322 (2022)
- [i53] Bo Li, Tara N. Sainath, Ruoming Pang, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani:
A Language Agnostic Multilingual Streaming On-Device ASR System. CoRR abs/2208.13916 (2022)
- [i52] Chao Zhang, Bo Li, Tara N. Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-Yiin Chang, Parisa Haghani:
Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification. CoRR abs/2209.06058 (2022)
- [i51] Ke Hu, Bo Li, Tara N. Sainath:
Scaling Up Deliberation for Multilingual ASR. CoRR abs/2210.05785 (2022)
- [i50] Tara N. Sainath, Rohit Prabhavalkar, Ankur Bapna, Yu Zhang, Zhouyuan Huo, Zhehuai Chen, Bo Li, Weiran Wang, Trevor Strohman:
JOIST: A Joint Speech and Text Streaming Model For ASR. CoRR abs/2210.07353 (2022)
- [i49] Shaan Bijwadia, Shuo-Yiin Chang, Bo Li, Tara N. Sainath, Chao Zhang, Yanzhang He:
Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems. CoRR abs/2211.00786 (2022)
- [i48]