


default search action
Hiroshi Sato 0002
Person information
- affiliation: NTT Corporation, NTT Media Intelligence Laboratories, Japan
Other persons with the same name
- Hiroshi Sato — disambiguation page
- Hiroshi Sato 0001
(aka: Hiroshi Satoh 0001) — National Defense Academy, Department of Computer Science, Yokosuka, Japan
Refine list

refinements active!
zoomed in on ?? of ?? records
view refined list in
export refined list as
2020 – today
- 2026
[j3]Naoyuki Kamo, Naohiro Tawara, Atsushi Ando, Takatomo Kano, Hiroshi Sato, Rintaro Ikeshita, Takafumi Moriya, Shota Horiguchi, Kohei Matsuura, Atsunori Ogawa, Alexis Plaquet
, Takanori Ashihara, Tsubasa Ochiai, Masato Mimura, Marc Delcroix, Tomohiro Nakatani, Taichi Asami, Shoko Araki:
Microphone array geometry-independent multi-talker distant ASR: NTT system for DASR task of the CHiME-8 challenge. Comput. Speech Lang. 95: 101820 (2026)- 2025
[c35]Shota Horiguchi, Takafumi Moriya, Atsushi Ando, Takanori Ashihara, Hiroshi Sato, Naohiro Tawara, Marc Delcroix:
Guided Speaker Embedding. ICASSP 2025: 1-5
[c34]Takafumi Moriya, Shota Horiguchi, Marc Delcroix, Ryo Masumura, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Masato Mimura:
Alignment-Free Training for Transducer-based Multi-Talker ASR. ICASSP 2025: 1-5
[c33]Takafumi Moriya, Masato Mimura, Kiyoaki Matsui, Hiroshi Sato, Kohei Matsuura:
Attention-Free Dual-Mode ASR with Latency-Controlled Selective State Spaces. INTERSPEECH 2025
[c32]Keigo Wakayama, Tomoko Kawase, Takafumi Moriya, Marc Delcroix, Hiroshi Sato, Tsubasa Ochiai, Masahiro Yasuda, Shoko Araki:
Real-time TSE demonstration via SoundBeam with KD. INTERSPEECH 2025
[i19]Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Ryo Masumura:
Generic Speech Enhancement with Self-Supervised Representation Space Loss. CoRR abs/2507.07631 (2025)- 2024
[j2]Tsubasa Ochiai
, Kazuma Iwamoto, Marc Delcroix
, Rintaro Ikeshita
, Hiroshi Sato, Shoko Araki
, Shigeru Katagiri
:
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance. IEEE ACM Trans. Audio Speech Lang. Process. 32: 3589-3602 (2024)
[c31]Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri:
How Does End-To-End Speech Recognition Training Impact Speech Enhancement Artifacts? ICASSP 2024: 11031-11035
[c30]Kenichi Fujita, Hiroshi Sato, Takanori Ashihara, Hiroki Kanagawa, Marc Delcroix, Takafumi Moriya, Yusuke Ijima:
Noise-Robust Zero-Shot Text-to-Speech Synthesis Conditioned on Self-Supervised Speech-Representation Model with Adapters. ICASSP 2024: 11471-11475
[c29]Takafumi Moriya, Takanori Ashihara, Masato Mimura, Hiroshi Sato, Kohei Matsuura, Ryo Masumura, Taichi Asami:
Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding. INTERSPEECH 2024
[c28]Hiroshi Sato, Takafumi Moriya, Masato Mimura, Shota Horiguchi, Tsubasa Ochiai, Takanori Ashihara, Atsushi Ando, Kentaro Shinayama, Marc Delcroix:
SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling. INTERSPEECH 2024
[c27]Takanori Ashihara, Takafumi Moriya, Shota Horiguchi, Junyi Peng
, Tsubasa Ochiai, Marc Delcroix, Kohei Matsuura, Hiroshi Sato:
Investigation of Speaker Representation for Target-Speaker Speech Processing. SLT 2024: 423-430
[c26]Shota Horiguchi, Atsushi Ando, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Naohiro Tawara, Marc Delcroix:
Recursive Attentive Pooling For Extracting Speaker Embeddings From Multi-Speaker Recordings. SLT 2024: 1201-1208
[i18]Kenichi Fujita, Hiroshi Sato, Takanori Ashihara, Hiroki Kanagawa, Marc Delcroix, Takafumi Moriya, Yusuke Ijima:
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters. CoRR abs/2401.05111 (2024)
[i17]Tsubasa Ochiai, Kazuma Iwamoto, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri:
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance. CoRR abs/2404.14860 (2024)
[i16]Hiroshi Sato, Takafumi Moriya, Masato Mimura, Shota Horiguchi, Tsubasa Ochiai, Takanori Ashihara, Atsushi Ando, Kentaro Shinayama, Marc Delcroix:
SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling. CoRR abs/2407.01857 (2024)
[i15]Shota Horiguchi, Atsushi Ando, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Naohiro Tawara, Marc Delcroix:
Recursive Attentive Pooling for Extracting Speaker Embeddings from Multi-Speaker Recordings. CoRR abs/2408.17142 (2024)
[i14]Takafumi Moriya, Shota Horiguchi, Marc Delcroix, Ryo Masumura, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Masato Mimura:
Alignment-Free Training for Transducer-based Multi-Talker ASR. CoRR abs/2409.20301 (2024)
[i13]Takafumi Moriya, Takanori Ashihara, Masato Mimura, Hiroshi Sato, Kohei Matsuura, Ryo Masumura, Taichi Asami:
Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding. CoRR abs/2409.20313 (2024)
[i12]Takanori Ashihara, Takafumi Moriya, Shota Horiguchi, Junyi Peng, Tsubasa Ochiai, Marc Delcroix, Kohei Matsuura, Hiroshi Sato:
Investigation of Speaker Representation for Target-Speaker Speech Processing. CoRR abs/2410.11243 (2024)
[i11]Shota Horiguchi, Takafumi Moriya, Atsushi Ando, Takanori Ashihara, Hiroshi Sato, Naohiro Tawara, Marc Delcroix:
Guided Speaker Embedding. CoRR abs/2410.12182 (2024)- 2023
[j1]Takafumi Moriya
, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki
:
Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection. IEEE Access 11: 13906-13917 (2023)
[c25]Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura:
Improving Scheduled Sampling for Neural Transducer-Based ASR. ICASSP 2023: 1-5
[c24]Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Hiroshi Sato, Taiga Yamane, Takanori Ashihara, Kohei Matsuura, Takafumi Moriya:
Leveraging Language Embeddings for Cross-Lingual Self-Supervised Speech Representation Learning. ICASSP 2023: 1-5
[c23]Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo:
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss. INTERSPEECH 2023: 854-858
[c22]Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami:
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data. INTERSPEECH 2023: 899-903
[c21]Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida
, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando:
End-to-End Joint Target and Non-Target Speakers ASR. INTERSPEECH 2023: 2903-2907
[i10]Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo:
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss. CoRR abs/2305.14723 (2023)
[i9]Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando:
End-to-End Joint Target and Non-Target Speakers ASR. CoRR abs/2306.02273 (2023)- 2022
[c20]Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita
, Naoyuki Kamo, Takafumi Moriya:
Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition. ICASSP 2022: 6287-6291
[c19]Takafumi Moriya, Takanori Ashihara, Atsushi Ando, Hiroshi Sato, Tomohiro Tanaka, Kohei Matsuura, Ryo Masumura, Marc Delcroix, Takahiro Shinozaki:
Hybrid RNN-T/Attention-Based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration. ICASSP 2022: 8282-8286
[c18]Atsushi Ando, Yumiko Murata, Ryo Masumura, Satoshi Suzuki, Naoki Makishima, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato:
Customer Satisfaction Estimation Using Unsupervised Representation Learning with Multi-Format Prediction Loss. ICASSP 2022: 8497-8501
[c17]Marc Delcroix, Keisuke Kinoshita
, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani:
Listen only to me! How well can target speech extraction handle false alarms? INTERSPEECH 2022: 216-220
[c16]Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita
, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura:
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations. INTERSPEECH 2022: 996-1000
[c15]Tomohiro Tanaka, Ryo Masumura, Hiroshi Sato, Mana Ihori, Kohei Matsuura, Takanori Ashihara, Takafumi Moriya:
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks. INTERSPEECH 2022: 1066-1070
[c14]Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki:
Streaming Target-Speaker ASR with Neural Transducer. INTERSPEECH 2022: 2673-2677
[c13]Ryo Masumura, Yoshihiro Yamazaki, Saki Mizuno, Naoki Makishima, Mana Ihori, Mihiro Uchida
, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Shota Orihashi, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando:
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training. INTERSPEECH 2022: 3218-3222
[c12]Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri:
How bad are artifacts?: Analyzing the impact of speech enhancement errors on ASR. INTERSPEECH 2022: 5418-5422
[c11]Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato:
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis. SLT 2022: 739-746
[i8]Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya:
Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition. CoRR abs/2201.03881 (2022)
[i7]Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri:
How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR. CoRR abs/2201.06685 (2022)
[i6]Marc Delcroix, Keisuke Kinoshita
, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani:
Listen only to me! How well can target speech extraction handle false alarms? CoRR abs/2204.04811 (2022)
[i5]Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita
, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura:
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations. CoRR abs/2206.08174 (2022)
[i4]Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takahiro Shinozaki:
Streaming Target-Speaker ASR with Neural Transducer. CoRR abs/2209.04175 (2022)
[i3]Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato:
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis. CoRR abs/2210.15937 (2022)- 2021
[c10]Takafumi Moriya, Takanori Ashihara, Tomohiro Tanaka, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Yusuke Ijima, Ryo Masumura, Yusuke Shinohara:
Simpleflat: A Simple Whole-Network Pre-Training Approach for RNN Transducer-Based End-to-End Speech Recognition. ICASSP 2021: 5664-5668
[c9]Atsushi Ando, Ryo Masumura, Hiroshi Sato, Takafumi Moriya, Takanori Ashihara, Yusuke Ijima, Tomoki Toda:
Speech Emotion Recognition Based on Listener Adaptive Models. ICASSP 2021: 6274-6278
[c8]Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix
, Keisuke Kinoshita
, Takafumi Moriya, Naoyuki Kamo:
Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition. Interspeech 2021: 1149-1153
[c7]Takafumi Moriya, Tomohiro Tanaka, Takanori Ashihara, Tsubasa Ochiai, Hiroshi Sato, Atsushi Ando, Ryo Masumura, Marc Delcroix
, Taichi Asami:
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture. Interspeech 2021: 1787-1791
[c6]Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita
, Marc Delcroix
, Tomohiro Nakatani, Shoko Araki:
Multimodal Attention Fusion for Target Speaker Extraction. SLT 2021: 778-784
[i2]Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki:
Multimodal Attention Fusion for Target Speaker Extraction. CoRR abs/2102.01326 (2021)
[i1]Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoyuki Kamo:
Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition. CoRR abs/2106.00949 (2021)- 2020
[c5]Takafumi Moriya, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara:
Distilling Attention Weights for CTC-Based ASR Systems. ICASSP 2020: 6894-6898
[c4]Takafumi Moriya, Tsubasa Ochiai, Shigeki Karita, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, Marc Delcroix
:
Self-Distillation for Improving CTC-Transformer-Based ASR Systems. INTERSPEECH 2020: 546-550
2010 – 2019
- 2019
[c3]Hiroshi Sato, Takafumi Moriya, Yusuke Shinohara, Ryo Masumura, Takaaki Fukutomi, Kiyoaki Matsui, Takanori Ashihara, Yoshikazu Yamaguchi, Yushi Aono:
Revisiting Dynamic Adjustment of Language Model Scaling Factor for Automatic Speech Recognition. APSIPA 2019: 186-191
[c2]Ryo Masumura, Hiroshi Sato, Tomohiro Tanaka, Takafumi Moriya, Yusuke Ijima, Takanobu Oba:
End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders. INTERSPEECH 2019: 1606-1610
[c1]Takanori Ashihara, Yusuke Shinohara, Hiroshi Sato, Takafumi Moriya, Kiyoaki Matsui, Takaaki Fukutomi, Yoshikazu Yamaguchi, Yushi Aono:
Neural Whispered Speech Detection with Imbalanced Learning. INTERSPEECH 2019: 3352-3356
Coauthor Index

manage site settings
To protect your privacy, all features that rely on external API calls from your browser are turned off by default. You need to opt-in for them to become active. All settings here will be stored as cookies with your web browser. For more information see our F.A.Q.
Unpaywalled article links
Add open access links from
to the list of external document links (if available).
Privacy notice: By enabling the option above, your browser will contact the API of unpaywall.org to load hyperlinks to open access articles. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Unpaywall privacy policy.
Archived links via Wayback Machine
For web page which are no longer available, try to retrieve content from the
of the Internet Archive (if available).
Privacy notice: By enabling the option above, your browser will contact the API of archive.org to check for archived content of web pages that are no longer available. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Internet Archive privacy policy.
Reference lists
Add a list of references from
,
, and
to record detail pages.
load references from crossref.org and opencitations.net
Privacy notice: By enabling the option above, your browser will contact the APIs of crossref.org, opencitations.net, and semanticscholar.org to load article reference information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Crossref privacy policy and the OpenCitations privacy policy, as well as the AI2 Privacy Policy covering Semantic Scholar.
Citation data
Add a list of citing articles from
and
to record detail pages.
load citations from opencitations.net
Privacy notice: By enabling the option above, your browser will contact the API of opencitations.net and semanticscholar.org to load citation information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the OpenCitations privacy policy as well as the AI2 Privacy Policy covering Semantic Scholar.
OpenAlex data
Load additional information about publications from
.
Privacy notice: By enabling the option above, your browser will contact the API of openalex.org to load additional information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the information given by OpenAlex.
last updated on 2026-02-13 01:44 CET by the dblp team
all metadata released as open data under CC0 1.0 license
see also: Terms of Use | Privacy Policy | Imprint


Google
Google Scholar
Semantic Scholar
Internet Archive Scholar
CiteSeerX
ORCID







