John R. Hershey
Person information
- affiliation: Mitsubishi Electric Research Laboratories (MERL), Cambridge, USA
- affiliation: IBM T. J. Watson Research Center, New York, USA
- affiliation: University of California San Diego, Department of Cognitive Science
2020 – today
- 2025
- [j12]Simon Leglaive, Matthieu Fraticelli, Hend Elghazaly, Léonie Borne, Mostafa Sadeghi, Scott Wisdom, Manuel Pariente, John R. Hershey, Daniel Pressnitzer, Jon P. Barker:
Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge. Comput. Speech Lang. 89: 101685 (2025)
- 2024
- [c110]Mark Hamilton, Andrew Zisserman, John R. Hershey, William T. Freeman:
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language. CVPR 2024: 13117-13127 - [c109]Cong Han, Kevin W. Wilson, Scott Wisdom, John R. Hershey:
Unsupervised Multi-Channel Separation And Adaptation. ICASSP 2024: 721-725 - [i42]Simon Leglaive, Matthieu Fraticelli, Hend Elghazaly, Léonie Borne, Mostafa Sadeghi, Scott Wisdom, Manuel Pariente, John R. Hershey, Daniel Pressnitzer, Jon P. Barker:
Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge. CoRR abs/2402.01413 (2024) - [i41]Mark Hamilton, Andrew Zisserman, John R. Hershey, William T. Freeman:
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language. CoRR abs/2406.05629 (2024) - [i40]Artem Dementyev, Chandan K. A. Reddy, Scott Wisdom, Navin Chatlani, John R. Hershey, Richard F. Lyon:
Towards sub-millisecond latency real-time speech enhancement models on hearables. CoRR abs/2409.18239 (2024)
- 2023
- [c108]Pradyumna Reddy, Scott Wisdom, Klaus Greff, John R. Hershey, Thomas Kipf:
Audioslots: A Slot-Centric Generative Model For Audio Separation. ICASSP Workshops 2023: 1-5 - [c107]Hakan Erdogan, Scott Wisdom, Xuankai Chang, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey:
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition. INTERSPEECH 2023: 3462-3466 - [i39]Pradyumna Reddy, Scott Wisdom, Klaus Greff, John R. Hershey, Thomas Kipf:
AudioSlots: A slot-centric generative model for audio separation. CoRR abs/2305.05591 (2023) - [i38]Cong Han, Kevin W. Wilson, Scott Wisdom, John R. Hershey:
Unsupervised Multi-channel Separation and Adaptation. CoRR abs/2305.11151 (2023) - [i37]Simon Leglaive, Léonie Borne, Efthymios Tzinis, Mostafa Sadeghi, Matthieu Fraticelli, Scott Wisdom, Manuel Pariente, Daniel Pressnitzer, John R. Hershey:
The CHiME-7 UDASE task: Unsupervised domain adaptation for conversational speech enhancement. CoRR abs/2307.03533 (2023) - [i36]Hakan Erdogan, Scott Wisdom, Xuankai Chang, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey:
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition. CoRR abs/2308.10415 (2023)
- 2022
- [c106]Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey:
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation. ECCV (37) 2022: 368-385 - [c105]Tom Denton, Scott Wisdom, John R. Hershey:
Improving Bird Classification with Unsupervised Sound Separation. ICASSP 2022: 636-640 - [c104]Aswin Sivaraman, Scott Wisdom, Hakan Erdogan, John R. Hershey:
Adapting Speech Separation to Real-World Meetings using Mixture Invariant Training. ICASSP 2022: 686-690 - [c103]Hannah Muckenhirn, Aleksandr Safin, Hakan Erdogan, Felix de Chaumont Quitry, Marco Tagliasacchi, Scott Wisdom, John R. Hershey:
CycleGAN-based Unpaired Speech Dereverberation. INTERSPEECH 2022: 196-200 - [c102]Katharine Patterson, Kevin W. Wilson, Scott Wisdom, John R. Hershey:
Distance-Based Sound Separation. INTERSPEECH 2022: 901-905 - [i35]Hannah Muckenhirn, Aleksandr Safin, Hakan Erdogan, Felix de Chaumont Quitry, Marco Tagliasacchi, Scott Wisdom, John R. Hershey:
CycleGAN-Based Unpaired Speech Dereverberation. CoRR abs/2203.15652 (2022) - [i34]Katharine Patterson, Kevin W. Wilson, Scott Wisdom, John R. Hershey:
Distance-Based Sound Separation. CoRR abs/2207.00562 (2022) - [i33]Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey:
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation. CoRR abs/2207.10141 (2022)
- 2021
- [c101]Scott Wisdom, Hakan Erdogan, Daniel P. W. Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John R. Hershey:
What's all the Fuss about Free Universal Sound Separation Data? ICASSP 2021: 186-190 - [c100]Nicolas Turpault, Romain Serizel, Scott Wisdom, Hakan Erdogan, John R. Hershey, Eduardo Fonseca, Prem Seetharaman, Justin Salamon:
Sound Event Detection and Separation: A Benchmark on Desed Synthetic Soundscapes. ICASSP 2021: 840-844 - [c99]Soumi Maiti, Hakan Erdogan, Kevin W. Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey:
End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings. ICASSP 2021: 7183-7187 - [c98]Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Dan Ellis, John R. Hershey:
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds. ICLR 2021 - [c97]Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen:
Continuous Speech Separation Using Speaker Inventory for Long Recording. Interspeech 2021: 3036-3040 - [c96]Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey:
Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis. SLT 2021: 897-904 - [c95]Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin W. Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey:
Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement. SLT 2021: 905-911 - [c94]Scott Wisdom, Aren Jansen, Ron J. Weiss, Hakan Erdogan, John R. Hershey:
Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation. WASPAA 2021: 51-55 - [c93]Yuma Koizumi, Shigeki Karita, Scott Wisdom, Hakan Erdogan, John R. Hershey, Llion Jones, Michiel Bacchiani:
DF-Conformer: Integrated Architecture of Conv-Tasnet and Conformer Using Linear Complexity Self-Attention for Speech Enhancement. WASPAA 2021: 161-165 - [c92]Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra:
Self-Supervised Learning from Automatically Separated Sound Scenes. WASPAA 2021: 251-255 - [i32]Soumi Maiti, Hakan Erdogan, Kevin W. Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey:
End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings. CoRR abs/2105.02096 (2021) - [i31]Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra:
Self-Supervised Learning from Automatically Separated Sound Scenes. CoRR abs/2105.02132 (2021) - [i30]Scott Wisdom, Aren Jansen, Ron J. Weiss, Hakan Erdogan, John R. Hershey:
Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation. CoRR abs/2106.00847 (2021) - [i29]Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey:
Improving On-Screen Sound Separation for Open Domain Videos with Audio-Visual Self-attention. CoRR abs/2106.09669 (2021) - [i28]Yuma Koizumi, Shigeki Karita, Scott Wisdom, Hakan Erdogan, John R. Hershey, Llion Jones, Michiel Bacchiani:
DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement. CoRR abs/2106.15813 (2021) - [i27]Aswin Sivaraman, Scott Wisdom, Hakan Erdogan, John R. Hershey:
Adapting Speech Separation to Real-World Meetings Using Mixture Invariant Training. CoRR abs/2110.10739 (2021)
- 2020
- [c91]Nicolas Turpault, Scott Wisdom, Hakan Erdogan, John R. Hershey, Romain Serizel, Eduardo Fonseca, Prem Seetharaman, Justin Salamon:
Improving Sound Event Detection in Domestic Environments using Sound Separation. DCASE 2020: 205-209 - [c90]Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis:
Improving Universal Sound Separation Using Sound Classification. ICASSP 2020: 96-100 - [c89]Scott Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron J. Weiss, Kevin W. Wilson, John R. Hershey:
Unsupervised Sound Separation Using Mixture Invariant Training. NeurIPS 2020 - [i26]Scott Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron J. Weiss, Kevin W. Wilson, John R. Hershey:
Unsupervised Sound Separation Using Mixtures of Mixtures. CoRR abs/2006.12701 (2020) - [i25]Nicolas Turpault, Scott Wisdom, Hakan Erdogan, John R. Hershey, Romain Serizel, Eduardo Fonseca, Prem Seetharaman, Justin Salamon:
Improving Sound Event Detection In Domestic Environments Using Sound Separation. CoRR abs/2007.03932 (2020) - [i24]Nicolas Turpault, Romain Serizel, Scott Wisdom, Hakan Erdogan, John R. Hershey, Eduardo Fonseca, Prem Seetharaman, Justin Salamon:
Sound Event Detection and Separation: a Benchmark on Desed Synthetic Soundscapes. CoRR abs/2011.00801 (2020) - [i23]Scott Wisdom, Hakan Erdogan, Daniel P. W. Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John R. Hershey:
What's All the FUSS About Free Universal Sound Separation Data? CoRR abs/2011.00803 (2020) - [i22]Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey:
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds. CoRR abs/2011.01143 (2020) - [i21]Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Mao-Kui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey:
Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis. CoRR abs/2011.02014 (2020) - [i20]Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen:
Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording. CoRR abs/2012.09727 (2020)
2010 – 2019
- 2019
- [j11]Takaaki Hori, Wen Wang, Yusuke Koji, Chiori Hori, Bret Harsham, John R. Hershey:
Adversarial training and decoding strategies for end-to-end neural conversation models. Comput. Speech Lang. 54: 122-139 (2019) - [j10]Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy M. Sarroff, John R. Hershey:
Phasebook and Friends: Leveraging Discrete Representations for Source Separation. IEEE J. Sel. Top. Signal Process. 13(2): 370-382 (2019) - [c88]Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy M. Sarroff, John R. Hershey:
The Phasebook: Building Complex Masks via Discrete Representations for Source Separation. ICASSP 2019: 66-70 - [c87]Jonathan Le Roux, Scott Wisdom, Hakan Erdogan, John R. Hershey:
SDR - Half-baked or Well Done? ICASSP 2019: 626-630 - [c86]Scott Wisdom, John R. Hershey, Kevin W. Wilson, Jeremy Thorpe, Michael Chinen, Brian Patton, Rif A. Saurous:
Differentiable Consistency Constraints for Improved Deep Speech Enhancement. ICASSP 2019: 900-904 - [c85]Quan Wang, Hannah Muckenhirn, Kevin W. Wilson, Prashant Sridhar, Zelin Wu, John R. Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio López-Moreno:
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking. INTERSPEECH 2019: 2728-2732 - [c84]Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
End-to-End Multilingual Multi-Speaker Speech Recognition. INTERSPEECH 2019: 3755-3759 - [c83]Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, Kevin W. Wilson, Jonathan Le Roux, John R. Hershey:
Universal Sound Separation. WASPAA 2019: 175-179 - [i19]Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, Kevin W. Wilson, Jonathan Le Roux, John R. Hershey:
Universal Sound Separation. CoRR abs/1905.03330 (2019) - [i18]Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis:
Improving Universal Sound Separation Using Sound Classification. CoRR abs/1911.07951 (2019) - [i17]Zhong-Qiu Wang, Scott Wisdom, Kevin W. Wilson, John R. Hershey:
Alternating Between Spectral and Spatial Estimation for Speech Separation and Enhancement. CoRR abs/1911.07953 (2019)
- 2018
- [c82]Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
A Purely End-to-End System for Multi-speaker Speech Recognition. ACL (1) 2018: 2620-2630 - [c81]Zhong-Qiu Wang, Jonathan Le Roux, John R. Hershey:
Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation. ICASSP 2018: 1-5 - [c80]Zhong-Qiu Wang, Jonathan Le Roux, John R. Hershey:
Alternative Objective Functions for Deep Clustering. ICASSP 2018: 686-690 - [c79]Shane Settle, Jonathan Le Roux, Takaaki Hori, Shinji Watanabe, John R. Hershey:
End-to-End Multi-Speaker Speech Recognition. ICASSP 2018: 4819-4823 - [c78]Hiroshi Seki, Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, John R. Hershey:
An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech. ICASSP 2018: 4919-4923 - [c77]Tsubasa Ochiai, Shinji Watanabe, Shigeru Katagiri, Takaaki Hori, John R. Hershey:
Speaker Adaptation for Multichannel End-to-End Speech Recognition. ICASSP 2018: 6707-6711 - [c76]Zhong-Qiu Wang, Jonathan Le Roux, DeLiang Wang, John R. Hershey:
End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction. INTERSPEECH 2018: 2708-2712 - [c75]Kevin W. Wilson, Michael Chinen, Jeremy Thorpe, Brian Patton, John R. Hershey, Rif A. Saurous, Jan Skoglund, Richard F. Lyon:
Exploring Tradeoffs in Models for Low-Latency Speech Enhancement. IWAENC 2018: 366-370 - [i16]Zhong-Qiu Wang, Jonathan Le Roux, DeLiang Wang, John R. Hershey:
End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction. CoRR abs/1804.10204 (2018) - [i15]Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
A Purely End-to-end System for Multi-speaker Speech Recognition. CoRR abs/1805.05826 (2018) - [i14]Jonathan Le Roux, Gordon Wichern, Shinji Watanabe, Andy M. Sarroff, John R. Hershey:
Phasebook and Friends: Leveraging Discrete Representations for Source Separation. CoRR abs/1810.01395 (2018) - [i13]Quan Wang, Hannah Muckenhirn, Kevin W. Wilson, Prashant Sridhar, Zelin Wu, John R. Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio López-Moreno:
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking. CoRR abs/1810.04826 (2018) - [i12]Jonathan Le Roux, Scott Wisdom, Hakan Erdogan, John R. Hershey:
SDR - half-baked or well done? CoRR abs/1811.02508 (2018) - [i11]Kevin W. Wilson, Michael Chinen, Jeremy Thorpe, Brian Patton, John R. Hershey, Rif A. Saurous, Jan Skoglund, Richard F. Lyon:
Exploring Tradeoffs in Models for Low-latency Speech Enhancement. CoRR abs/1811.07030 (2018) - [i10]Scott Wisdom, John R. Hershey, Kevin W. Wilson, Jeremy Thorpe, Michael Chinen, Brian Patton, Rif A. Saurous:
Differentiable Consistency Constraints for Improved Deep Speech Enhancement. CoRR abs/1811.08521 (2018)
- 2017
- [j9]Takaaki Hori, Zhuo Chen, Hakan Erdogan, John R. Hershey, Jonathan Le Roux, Vikramjit Mitra, Shinji Watanabe:
Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend. Comput. Speech Lang. 46: 401-418 (2017) - [j8]Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
Prior-based Binary Masking and Discriminative Methods for Reverberant and Noisy Speech Recognition Using Distant Stereo Microphones. J. Inf. Process. 25: 407-416 (2017) - [j7]Shinji Watanabe, Takaaki Hori, Suyoun Kim, John R. Hershey, Tomoki Hayashi:
Hybrid CTC/Attention Architecture for End-to-End Speech Recognition. IEEE J. Sel. Top. Signal Process. 11(8): 1240-1253 (2017) - [j6]Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey, Xiong Xiao:
Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming. IEEE J. Sel. Top. Signal Process. 11(8): 1274-1288 (2017) - [c74]Takaaki Hori, Shinji Watanabe, John R. Hershey:
Joint CTC/attention decoding for end-to-end speech recognition. ACL (1) 2017: 518-529 - [c73]Shinji Watanabe, Takaaki Hori, John R. Hershey:
Language independent end-to-end architecture for joint language identification and speech recognition. ASRU 2017: 265-271 - [c72]Takaaki Hori, Shinji Watanabe, John R. Hershey:
Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition. ASRU 2017: 287-293 - [c71]Chiori Hori, Takaaki Hori, Tim K. Marks, John R. Hershey:
Early and late integration of audio features for automatic video description. ASRU 2017: 430-436 - [c70]Yi Luo, Zhuo Chen, John R. Hershey, Jonathan Le Roux, Nima Mesgarani:
Deep clustering and conventional networks for music separation: Stronger together. ICASSP 2017: 61-65 - [c69]Zhong Meng, Shinji Watanabe, John R. Hershey, Hakan Erdogan:
Deep long short-term memory adaptive beamforming networks for multichannel robust speech recognition. ICASSP 2017: 271-275 - [c68]Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, John R. Hershey:
Student-teacher network learning with enhanced features. ICASSP 2017: 5275-5279 - [c67]Chiori Hori, Takaaki Hori, Teng-Yok Lee, Ziming Zhang, Bret Harsham, John R. Hershey, Tim K. Marks, Kazuhiro Sumi:
Attention-Based Multimodal Fusion for Video Description. ICCV 2017: 4203-4212 - [c66]Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey:
Multichannel End-to-end Speech Recognition. ICML 2017: 2632-2641 - [p6]Shinji Watanabe, Marc Delcroix, Florian Metze, John R. Hershey:
Preliminaries. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 3-17 - [p5]Xiong Xiao, Shinji Watanabe, Hakan Erdogan, Michael I. Mandel, Liang Lu, John R. Hershey, Michael L. Seltzer, Guoguo Chen, Yu Zhang, Dong Yu:
Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 79-104 - [p4]John R. Hershey, Jonathan Le Roux, Shinji Watanabe, Scott Wisdom, Zhuo Chen, Yusuf Ziya Isik:
Novel Deep Architectures in Speech Processing. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 135-164 - [p3]Hakan Erdogan, John R. Hershey, Shinji Watanabe, Jonathan Le Roux:
Deep Recurrent Networks for Separation and Recognition of Single-Channel Speech in Nonstationary Background Audio. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 165-186 - [p2]Shinji Watanabe, Takaaki Hori, Yajie Miao, Marc Delcroix, Florian Metze, John R. Hershey:
Toolkits for Robust Speech Processing. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 369-382 - [e1]Shinji Watanabe, Marc Delcroix, Florian Metze, John R. Hershey:
New Era for Robust Speech Recognition, Exploiting Deep Learning. Springer 2017, ISBN 978-3-319-64679-4 - [i9]Chiori Hori, Takaaki Hori, Teng-Yok Lee, Kazuhiro Sumi, John R. Hershey, Tim K. Marks:
Attention-Based Multimodal Fusion for Video Description. CoRR abs/1701.03126 (2017) - [i8]Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey:
Multichannel End-to-end Speech Recognition. CoRR abs/1703.04783 (2017) - [i7]Zhong Meng, Shinji Watanabe, John R. Hershey, Hakan Erdogan:
Deep Long Short-Term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition. CoRR abs/1711.08016 (2017)
- 2016
- [c65]John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe:
Deep clustering: Discriminative embeddings for segmentation and separation. ICASSP 2016: 31-35 - [c64]Scott Wisdom, John R. Hershey, Jonathan Le Roux, Shinji Watanabe:
Deep unfolding for multichannel source separation. ICASSP 2016: 121-125 - [c63]Xiong Xiao, Shinji Watanabe, Hakan Erdogan, Liang Lu, John R. Hershey, Michael L. Seltzer, Guoguo Chen, Yu Zhang, Michael I. Mandel, Dong Yu:
Deep beamforming networks for multi-channel speech recognition. ICASSP 2016: 5745-5749 - [c62]Takaaki Hori, Chiori Hori, Shinji Watanabe, John R. Hershey:
Minimum word error training of long short-term memory recurrent neural network language models for speech recognition. ICASSP 2016: 5990-5994 - [c61]Chiori Hori, Shinji Watanabe, Takaaki Hori, Bret A. Harsham, John R. Hershey, Yusuke Koji, Youichi Fujii, Yuki Furumoto:
Driver confusion status detection using recurrent neural networks. ICME 2016: 1-6 - [c60]Yusuf Ziya Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey:
Single-Channel Multi-Speaker Separation Using Deep Clustering. INTERSPEECH 2016: 545-549 - [c59]Hakan Erdogan, John R. Hershey, Shinji Watanabe, Michael I. Mandel, Jonathan Le Roux:
Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks. INTERSPEECH 2016: 1981-1985 - [c58]Chiori Hori, Takaaki Hori, Shinji Watanabe, John R. Hershey:
Context-Sensitive and Role-Dependent Spoken Language Understanding Using Bidirectional and Attention LSTMs. INTERSPEECH 2016: 3236-3240 - [c57]Scott Wisdom, Thomas Powers, John R. Hershey, Jonathan Le Roux, Les E. Atlas:
Full-Capacity Unitary Recurrent Neural Networks. NIPS 2016: 4880-4888 - [c56]Takaaki Hori, Hai Wang, Chiori Hori, Shinji Watanabe, Bret Harsham, Jonathan Le Roux, John R. Hershey, Yusuke Koji, Yi Jing, Zhaocheng Zhu, Takeyuki Aikawa:
Dialog state tracking with attention-based sequence-to-sequence learning. SLT 2016: 552-558 - [i6]Oncel Tuzel, Yuichi Taguchi, John R. Hershey:
Global-Local Face Upsampling Network. CoRR abs/1603.07235 (2016) - [i5]Yusuf Ziya Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey:
Single-Channel Multi-Speaker Separation using Deep Clustering. CoRR abs/1607.02173 (2016) - [i4]Scott Wisdom, Thomas Powers, John R. Hershey, Jonathan Le Roux, Les E. Atlas:
Full-Capacity Unitary Recurrent Neural Networks. CoRR abs/1611.00035 (2016) - [i3]Yi Luo, Zhuo Chen, John R. Hershey, Jonathan Le Roux, Nima Mesgarani:
Deep Clustering and Conventional Networks for Music Separation: Stronger Together. CoRR abs/1611.06265 (2016)
- 2015
- [c55]Takaaki Hori, Zhuo Chen, Hakan Erdogan, John R. Hershey, Jonathan Le Roux, Vikramjit Mitra, Shinji Watanabe:
The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition. ASRU 2015: 475-481 - [c54]Felix Weninger, Hakan Erdogan, Shinji Watanabe, Emmanuel Vincent, Jonathan Le Roux, John R. Hershey, Björn W. Schuller:
Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR. LVA/ICA 2015: 91-99 - [c53]Jonathan Le Roux, John R. Hershey, Felix Weninger:
Deep NMF for speech separation. ICASSP 2015: 66-70 - [c52]Hakan Erdogan, John R. Hershey, Shinji Watanabe, Jonathan Le Roux:
Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks. ICASSP 2015: 708-712 - [c51]Jonathan Le Roux, Emmanuel Vincent, John R. Hershey, Daniel P. W. Ellis:
Micbots: Collecting large realistic datasets for speech and audio research using mobile robots. ICASSP 2015: 5635-5639 - [c50]Zhuo Chen, Shinji Watanabe, Hakan Erdogan, John R. Hershey:
Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks. INTERSPEECH 2015: 3274-3278 - [c49]Ahmed Hussen Abdelaziz, Shinji Watanabe, John R. Hershey, Emmanuel Vincent, Dorothea Kolossa:
Uncertainty propagation through deep neural networks. INTERSPEECH 2015: 3561-3565 - [i2]John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe:
Deep clustering: Discriminative embeddings for segmentation and separation. CoRR abs/1508.04306 (2015)
- 2014
- [c48]Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
Sequence discriminative training for low-rank deep neural networks. GlobalSIP 2014: 572-576 - [c47]Felix Weninger, John R. Hershey, Jonathan Le Roux, Björn W. Schuller:
Discriminatively trained recurrent neural networks for single-channel speech separation. GlobalSIP 2014: 577-581 - [c46]Hao Tang, Shinji Watanabe, Tim K. Marks, John R. Hershey:
Log-linear dialog manager. ICASSP 2014: 4092-4096 - [c45]Umut Simsekli, Jonathan Le Roux, John R. Hershey:
Non-negative source-filter dynamical system for speech enhancement. ICASSP 2014: 6206-6210 - [c44]Shinji Watanabe, John R. Hershey, Tim K. Marks, Youichi Fujii, Yusuke Koji:
Cost-level integration of statistical and rule-based dialog managers. INTERSPEECH 2014: 323-327 - [c43]Felix Weninger, Jonathan Le Roux, John R. Hershey, Shinji Watanabe:
Discriminative NMF and its application to single-channel source separation. INTERSPEECH 2014: 865-869 - [c42]Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
Sequential maximum mutual information linear discriminant analysis for speech recognition. INTERSPEECH 2014: 2415-2419 - [i1]John R. Hershey, Jonathan Le Roux, Felix Weninger:
Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures. CoRR abs/1409.2574 (2014)
- 2013
- [c41]Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
A generalized discriminative training framework for system combination. ASRU 2013: 43-48 - [c40]Cédric Févotte, Jonathan Le Roux, John R. Hershey:
Non-negative dynamical system with application to speech and audio. ICASSP 2013: 3158-3162 - [c39]Jonathan Le Roux, Petros T. Boufounos, Kang Kang, John R. Hershey:
Source localization in reverberant environments using sparse optimization. ICASSP 2013: 4310-4314 - [c38]Yuuki Tachioka, Shinji Watanabe, John R. Hershey:
Effectiveness of discriminative training and feature transformation for reverberated and noisy speech. ICASSP 2013: 6935-6939 - [c37]Shinji Watanabe, John R. Hershey:
Stereo-based feature enhancement using dictionary learning. ICASSP 2013: 7073-7077 - [c36]Koichiro Yoshino, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
Statistical Dialogue Management using Intention Dependency Graph. IJCNLP 2013: 962-966 - [c35]Jonathan Le Roux, Shinji Watanabe, John R. Hershey:
Ensemble learning for speech enhancement. WASPAA 2013: 1-4 - [c34]Umut Simsekli, Jonathan Le Roux, John R. Hershey:
Hierarchical and coupled non-negative dynamical systems with application to audio modeling. WASPAA 2013: 1-4
- 2012
- [j5]Xiaodong Cui, Jian Xue, Xin Chen, Peder A. Olsen, Pierre L. Dognin, Upendra V. Chaudhari, John R. Hershey, Bowen Zhou:
Hidden Markov Acoustic Modeling With Bootstrap and Restructuring for Low-Resourced Languages. IEEE Trans. Speech Audio Process. 20(8): 2252-2264 (2012) - [c33]Jonathan Le Roux, John R. Hershey:
Indirect model-based speech enhancement. ICASSP 2012: 4045-4048 - [p1]John R. Hershey, Steven J. Rennie, Jonathan Le Roux:
Factorial Models for Noise Robust Speech Recognition. Techniques for Noise Robustness in Automatic Speech Recognition 2012: 311-345
- 2011
- [c32]Xin Chen, Xiaodong Cui, Jian Xue, Peder A. Olsen, John R. Hershey, Bowen Zhou, Yunxin Zhao:
Clustering of bootstrapped acoustic model with full covariance. ICASSP 2011: 4496-4499 - [c31]Xiaodong Cui, Xin Chen, Jian Xue, Peder A. Olsen, John R. Hershey, Bowen Zhou:
Acoustic Modeling with Bootstrap and Restructuring Based on Full Covariance. INTERSPEECH 2011: 1697-1700 - [c30]Yuichi Taguchi, Tim K. Marks, John R. Hershey:
Entropy-based motion selection for touch-based registration using Rao-Blackwellized particle filtering. IROS 2011: 4690-4697
- 2010
- [j4]Martin Cooke, John R. Hershey, Steven J. Rennie:
Monaural speech separation and recognition challenge. Comput. Speech Lang. 24(1): 1-15 (2010) - [j3]John R. Hershey, Steven J. Rennie, Peder A. Olsen, Trausti T. Kristjansson:
Super-human multi-talker speech recognition: A graphical modeling approach. Comput. Speech Lang. 24(1): 45-66 (2010) - [j2]Tim K. Marks, John R. Hershey, Javier R. Movellan:
Tracking Motion, Deformation, and Texture Using Conditionally Gaussian Processes. IEEE Trans. Pattern Anal. Mach. Intell. 32(2): 348-363 (2010) - [j1]Steven J. Rennie, John R. Hershey, Peder A. Olsen:
Single-Channel Multitalker Speech Recognition. IEEE Signal Process. Mag. 27(6): 66-80 (2010) - [c29]Pierre L. Dognin, John R. Hershey, Vaibhava Goel, Peder A. Olsen:
Restructuring exponential family mixture models. INTERSPEECH 2010: 62-65 - [c28]John R. Hershey, Peder A. Olsen, Steven J. Rennie:
Signal interaction and the devil function. INTERSPEECH 2010: 334-337 - [c27]Peder A. Olsen, Vaibhava Goel, Charles A. Micchelli, John R. Hershey:
Modeling posterior probabilities using the linear exponential family. INTERSPEECH 2010: 2994-2997
2000 – 2009
- 2009
- [c26]Steven J. Rennie, John R. Hershey, Peder A. Olsen:
Hierarchical variational loopy belief propagation for multi-talker speech recognition. ASRU 2009: 176-181 - [c25]Pierre L. Dognin, Vaibhava Goel, John R. Hershey, Peder A. Olsen:
A fast, accurate approximation to log likelihood of Gaussian mixture models. ICASSP 2009: 3817-3820 - [c24]Steven J. Rennie, John R. Hershey, Peder A. Olsen:
Single-channel speech separation and recognition using loopy belief propagation. ICASSP 2009: 3845-3848 - [c23]Pierre L. Dognin, John R. Hershey, Vaibhava Goel, Peder A. Olsen:
Refactoring acoustic models using variational density approximation. ICASSP 2009: 4473-4476 - [c22]Pierre L. Dognin, John R. Hershey, Vaibhava Goel, Peder A. Olsen:
Refactoring acoustic models using variational expectation-maximization. INTERSPEECH 2009: 212-215 - [c21]Steven J. Rennie, John R. Hershey, Peder A. Olsen:
Variational loopy belief propagation for multi-talker speech recognition. INTERSPEECH 2009: 1331-1334
- 2008
- [c20]Steven J. Rennie, John R. Hershey, Peder A. Olsen:
Efficient model-based speech separation and denoising using non-negative subspace analysis. ICASSP 2008: 1833-1836 - [c19]Jia-Yu Chen, John R. Hershey, Peder A. Olsen, Emmanuel Yashchin:
Accelerated Monte Carlo for Kullback-Leibler divergence between Gaussian mixture models. ICASSP 2008: 4553-4556 - [c18]John R. Hershey, Peder A. Olsen:
Variational Bhattacharyya divergence for hidden Markov models. ICASSP 2008: 4557-4560 - [c17]Binit Mohanty, John R. Hershey, Peder A. Olsen, Suleyman Serdar Kozat, Vaibhava Goel:
Optimizing speech recognition grammars using a measure of similarity between hidden Markov models. ICASSP 2008: 4953-4956
- 2007
- [c16]John R. Hershey, Peder A. Olsen, Steven J. Rennie:
Variational Kullback-Leibler divergence for Hidden Markov models. ASRU 2007: 323-328 - [c15]John R. Hershey, Peder A. Olsen:
Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models. ICASSP (4) 2007: 317-320 - [c14]Peder A. Olsen, John R. Hershey:
Bhattacharyya error and divergence using variational importance sampling. INTERSPEECH 2007: 46-49 - [c13]Jia-Yu Chen, Peder A. Olsen, John R. Hershey:
Word confusability - measuring hidden Markov model similarity. INTERSPEECH 2007: 2089-2092
- 2006
- [c12]Trausti T. Kristjansson, John R. Hershey, Peder A. Olsen, Steven J. Rennie, Ramesh A. Gopinath:
Super-human multi-talker speech recognition: the IBM 2006 speech separation challenge system. INTERSPEECH 2006 - [c11]Steven J. Rennie, Peder A. Olsen, John R. Hershey, Trausti T. Kristjansson:
The Iroquois model: using temporal dynamics to separate speakers. SAPA@INTERSPEECH 2006: 24-30 - [c10]John R. Hershey, Trausti T. Kristjansson, Steven J. Rennie, Peder A. Olsen:
Single Channel Speech Separation Using Factorial Dynamics. NIPS 2006: 593-600
- 2005
- [b1]John R. Hershey:
Perceptual inference in generative models. University of California, San Diego, USA, 2005 - 2004
- [c9]Tim K. Marks, John R. Hershey, J. Cooper Roddey, Javier R. Movellan:
3D Tracking of Morphable Objects Using Conditionally Gaussian Nonlinear Filters. CVPR Workshops 2004: 190 - [c8]Trausti T. Kristjansson, Hagai Attias, John R. Hershey:
Stereo Based 3D Tracking and Scene Learning, Employing Particle Filtering within EM. ECCV (4) 2004: 546-559 - [c7]John R. Hershey, Hagai Attias, Nebojsa Jojic, Trausti T. Kristjansson:
Audio-visual graphical models for speech processing. ICASSP (5) 2004: 649-652 - [c6]Trausti T. Kristjansson, Hagai Attias, John R. Hershey:
Single microphone source separation using high resolution signal reconstruction. ICASSP (2) 2004: 817-820 - [c5]John R. Hershey, Trausti T. Kristjansson, Zhengyou Zhang:
Model-based fusion of bone and air sensors for speech enhancement and robust speech recognition. SAPA@INTERSPEECH 2004: 139 - [c4]Tim K. Marks, John R. Hershey, J. Cooper Roddey, Javier R. Movellan:
Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters. NIPS 2004: 889-896 - 2001
- [c3]John R. Hershey, Michael Casey:
Audio-Visual Sound Separation Via Hidden Markov Models. NIPS 2001: 1173-1180 - 2000
- [c2]Irina F. Gorodnitsky, John R. Hershey:
A Low-Level Cortical Perception Model with Applications to Image Analysis. ICIP 2000: 308-311
1990 – 1999
- 1999
- [c1]John R. Hershey, Javier R. Movellan:
Audio Vision: Using Audio-Visual Synchrony to Locate Sounds. NIPS 1999: 813-819
last updated on 2024-10-18 20:30 CEST by the dblp team
all metadata released as open data under CC0 1.0 license