default search action
Soumi Maiti
Person information
Refine list
refinements active!
zoomed in on ?? of ?? records
view refined list in
export refined list as
2020 – today
- 2024
- [j1]Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1829-1844 (2024) - [c22]Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro:
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens. ICASSP 2024: 7970-7974 - [c21]Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang:
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study. ICASSP 2024: 11481-11485 - [c20]Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-Weon Jung, Xuankai Chang, Shinji Watanabe:
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks. ICASSP 2024: 13326-13330 - [i21]Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, Shinji Watanabe, Hiroshi Saruwatari:
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics. CoRR abs/2401.16812 (2024) - [i20]Yihan Wu, Soumi Maiti, Yifan Peng, Wangyou Zhang, Chenda Li, Yuyue Wang, Xihua Wang, Shinji Watanabe, Ruihua Song:
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition. CoRR abs/2401.18045 (2024) - [i19]Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro:
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages. CoRR abs/2402.16021 (2024) - [i18]William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe:
Towards Robust Speech Representation Learning for Thousands of Languages. CoRR abs/2407.00837 (2024) - 2023
- [c19]Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polak, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe:
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit. ACL (demo) 2023: 400-411 - [c18]William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:
Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning. ASRU 2023: 1-8 - [c17]Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-Weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data. ASRU 2023: 1-8 - [c16]William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe:
Improving Massively Multilingual ASR with Auxiliary CTC Objectives. ICASSP 2023: 1-5 - [c15]Junwei Huang, Karthik Ganesan, Soumi Maiti, Young Min Kim, Xuankai Chang, Paul Liang, Shinji Watanabe:
FindAdaptNet: Find and Insert Adapters by Learned Layer Importance. ICASSP 2023: 1-5 - [c14]Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe:
Speechlmscore: Evaluating Speech Generation Using Speech Language Model. ICASSP 2023: 1-5 - [c13]Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. IJCAI 2023: 5179-5187 - [c12]William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe:
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. INTERSPEECH 2023: 4404-4408 - [c11]Brian Yan, Jiatong Shi, Soumi Maiti, William Chen, Xinjian Li, Yifan Peng, Siddhant Arora, Shinji Watanabe:
CMU's IWSLT 2023 Simultaneous Speech Translation System. IWSLT@ACL 2023: 235-240 - [i17]Massa Baali, Tomoki Hayashi, Hamdy Mubarak, Soumi Maiti, Shinji Watanabe, Wassim El-Hajj, Ahmed Ali:
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study. CoRR abs/2301.09099 (2023) - [i16]Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. CoRR abs/2301.12596 (2023) - [i15]William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe:
Improving Massively Multilingual ASR With Auxiliary CTC Objectives. CoRR abs/2302.12829 (2023) - [i14]William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe:
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. CoRR abs/2306.06672 (2023) - [i13]Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, Shinji Watanabe:
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks. CoRR abs/2309.07937 (2023) - [i12]Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro:
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens. CoRR abs/2309.08531 (2023) - [i11]Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data. CoRR abs/2309.13876 (2023) - [i10]William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning. CoRR abs/2309.15317 (2023) - [i9]Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang:
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study. CoRR abs/2309.15800 (2023) - [i8]Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh:
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech. CoRR abs/2310.00706 (2023) - 2022
- [c10]Yooncheol Ju, Ilhwan Kim, Hongsun Yang, Ji-Hoon Kim, Byeongyeol Kim, Soumi Maiti, Shinji Watanabe:
TriniTTS: Pitch-controllable End-to-end TTS without External Aligner. INTERSPEECH 2022: 16-20 - [c9]Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu:
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. SLT 2022: 480-487 - [i7]Yushi Ueda, Soumi Maiti, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu:
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. CoRR abs/2203.17068 (2022) - [i6]Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe:
SpeechLMScore: Evaluating speech generation using speech language model. CoRR abs/2212.04559 (2022) - 2021
- [b1]Soumi Maiti:
Speech Enhancement Using Speech Synthesis Techniques. City University of New York, USA, 2021 - [c8]Soumi Maiti, Hakan Erdogan, Kevin W. Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey:
End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings. ICASSP 2021: 7183-7187 - [i5]Soumi Maiti, Hakan Erdogan, Kevin W. Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey:
End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings. CoRR abs/2105.02096 (2021) - 2020
- [c7]Soumi Maiti, Michael I. Mandel:
Speaker Independence of Neural Vocoders and Their Effect on Parametric Resynthesis Speech Enhancement. ICASSP 2020: 206-210 - [c6]Soumi Maiti, Erik Marchi, Alistair Conkie:
Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data. ICASSP 2020: 7624-7628 - [i4]Soumi Maiti, Erik Marchi, Alistair Conkie:
Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data. CoRR abs/2004.04972 (2020)
2010 – 2019
- 2019
- [c5]Soumi Maiti, Michael I. Mandel:
Speech Denoising by Parametric Resynthesis. ICASSP 2019: 6995-6999 - [c4]Soumi Maiti, Michael I. Mandel:
Parametric Resynthesis With Neural Vocoders. WASPAA 2019: 303-307 - [i3]Soumi Maiti, Michael I. Mandel:
Speech denoising by parametric resynthesis. CoRR abs/1904.01537 (2019) - [i2]Soumi Maiti, Michael I. Mandel:
Parametric Resynthesis with neural vocoders. CoRR abs/1906.06762 (2019) - [i1]Soumi Maiti, Michael I. Mandel:
Speaker independence of neural vocoders and their effect on parametric resynthesis speech enhancement. CoRR abs/1911.06266 (2019) - 2018
- [c3]Soumi Maiti, Joey Ching, Michael I. Mandel:
Large Vocabulary Concatenative Resynthesis. INTERSPEECH 2018: 1190-1194 - 2017
- [c2]Soumi Maiti, Michael I. Mandel:
Concatenative Resynthesis Using Twin Networks. INTERSPEECH 2017: 3647-3651 - [c1]Svetlana Stoyanchev, Soumi Maiti, Srinivas Bangalore:
Predicting Interaction Quality in Customer Service Dialogs. IWSDS 2017: 149-159
Coauthor Index
manage site settings
To protect your privacy, all features that rely on external API calls from your browser are turned off by default. You need to opt-in for them to become active. All settings here will be stored as cookies with your web browser. For more information see our F.A.Q.
Unpaywalled article links
Add open access links from to the list of external document links (if available).
Privacy notice: By enabling the option above, your browser will contact the API of unpaywall.org to load hyperlinks to open access articles. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Unpaywall privacy policy.
Archived links via Wayback Machine
For web page which are no longer available, try to retrieve content from the of the Internet Archive (if available).
Privacy notice: By enabling the option above, your browser will contact the API of archive.org to check for archived content of web pages that are no longer available. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Internet Archive privacy policy.
Reference lists
Add a list of references from , , and to record detail pages.
load references from crossref.org and opencitations.net
Privacy notice: By enabling the option above, your browser will contact the APIs of crossref.org, opencitations.net, and semanticscholar.org to load article reference information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Crossref privacy policy and the OpenCitations privacy policy, as well as the AI2 Privacy Policy covering Semantic Scholar.
Citation data
Add a list of citing articles from and to record detail pages.
load citations from opencitations.net
Privacy notice: By enabling the option above, your browser will contact the API of opencitations.net and semanticscholar.org to load citation information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the OpenCitations privacy policy as well as the AI2 Privacy Policy covering Semantic Scholar.
OpenAlex data
Load additional information about publications from .
Privacy notice: By enabling the option above, your browser will contact the API of openalex.org to load additional information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the information given by OpenAlex.
last updated on 2024-08-11 01:04 CEST by the dblp team
all metadata released as open data under CC0 1.0 license
see also: Terms of Use | Privacy Policy | Imprint