default search action

combined dblp search
author search
venue search
publication search

ask others

Soumi Maiti

> Home > Persons

Person information

Refine list

refinements active!

zoomed in on ?? of ?? records

view refined list in

export refined list as

showing all ?? records

2020 – today

see FAQ

What is the meaning of the colors in the publication lists?

2024
[j1]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/taslp/SaekiMLWTS24
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/taslp/SaekiMLWTS24
Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1829-1844 (2024)
[c22]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/KimCMY0R24
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/KimCMY0R24
Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro:
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens. ICASSP 2024: 7970-7974
[c21]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/ChangYCJLMSST0F24
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/ChangYCJLMSST0F24
Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang:
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study. ICASSP 2024: 11481-11485
[c20]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/MaitiPCJC024
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/MaitiPCJC024
Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-Weon Jung, Xuankai Chang, Shinji Watanabe:
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks. ICASSP 2024: 13326-13330
[i21]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2401-16812
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2401-16812
Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, Shinji Watanabe, Hiroshi Saruwatari:
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics. CoRR abs/2401.16812 (2024)
[i20]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2401-18045
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2401-18045
Yihan Wu, Soumi Maiti, Yifan Peng, Wangyou Zhang, Chenda Li, Yuyue Wang, Xihua Wang, Shinji Watanabe, Ruihua Song:
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition. CoRR abs/2401.18045 (2024)
[i19]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2402-16021
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2402-16021
Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro:
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages. CoRR abs/2402.16021 (2024)
[i18]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2407-00837
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2407-00837
William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe:
Towards Robust Speech Representation Learning for Thousands of Languages. CoRR abs/2407.00837 (2024)
2023
[c19]
- view
  authority control:
- export record
  dblp key:
  - conf/acl/YanS0IPDPFBHZNH23
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/acl/YanS0IPDPFBHZNH23
Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polak, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe:
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit. ACL (demo) 2023: 400-411
[c18]
- view
  authority control:
- export record
  dblp key:
  - conf/asru/ChenSYBZPCMW23
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/asru/ChenSYBZPCMW23
William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:
Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning. ASRU 2023: 1-8
[c17]
- view
  authority control:
- export record
  dblp key:
  - conf/asru/PengTYBCLSACSZSSJMW23
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/asru/PengTYBCLSACSZSSJMW23
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-Weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data. ASRU 2023: 1-8
[c16]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/ChenYSPMW23
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/ChenYSPMW23
William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe:
Improving Massively Multilingual ASR with Auxiliary CTC Objectives. ICASSP 2023: 1-5
[c15]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/HuangGMKCLW23
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/HuangGMKCLW23
Junwei Huang, Karthik Ganesan, Soumi Maiti, Young Min Kim, Xuankai Chang, Paul Liang, Shinji Watanabe:
FindAdaptNet: Find and Insert Adapters by Learned Layer Importance. ICASSP 2023: 1-5
[c14]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/MaitiPSW23
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/MaitiPSW23
Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe:
Speechlmscore: Evaluating Speech Generation Using Speech Language Model. ICASSP 2023: 1-5
[c13]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/ijcai/SaekiML0TS23
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/ijcai/SaekiML0TS23
Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. IJCAI 2023: 5179-5187
[c12]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/ChenCPNM023
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/ChenCPNM023
William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe:
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. INTERSPEECH 2023: 4404-4408
[c11]
- view
  authority control:
- export record
  dblp key:
  - conf/iwslt/YanSMCLPA023
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/iwslt/YanSMCLPA023
Brian Yan, Jiatong Shi, Soumi Maiti, William Chen, Xinjian Li, Yifan Peng, Siddhant Arora, Shinji Watanabe:
CMU's IWSLT 2023 Simultaneous Speech Translation System. IWSLT@ACL 2023: 235-240
[i17]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2301-09099
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2301-09099
Massa Baali, Tomoki Hayashi, Hamdy Mubarak, Soumi Maiti, Shinji Watanabe, Wassim El-Hajj, Ahmed Ali:
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study. CoRR abs/2301.09099 (2023)
[i16]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2301-12596
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2301-12596
Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. CoRR abs/2301.12596 (2023)
[i15]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2302-12829
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2302-12829
William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe:
Improving Massively Multilingual ASR With Auxiliary CTC Objectives. CoRR abs/2302.12829 (2023)
[i14]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2306-06672
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2306-06672
William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe:
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. CoRR abs/2306.06672 (2023)
[i13]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2309-07937
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2309-07937
Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, Shinji Watanabe:
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks. CoRR abs/2309.07937 (2023)
[i12]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2309-08531
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2309-08531
Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro:
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens. CoRR abs/2309.08531 (2023)
[i11]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2309-13876
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2309-13876
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data. CoRR abs/2309.13876 (2023)
[i10]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2309-15317
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2309-15317
William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning. CoRR abs/2309.15317 (2023)
[i9]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2309-15800
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2309-15800
Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang:
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study. CoRR abs/2309.15800 (2023)
[i8]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2310-00706
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2310-00706
Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh:
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech. CoRR abs/2310.00706 (2023)
2022
[c10]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/JuKYKKM022
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/JuKYKKM022
Yooncheol Ju, Ilhwan Kim, Hongsun Yang, Ji-Hoon Kim, Byeongyeol Kim, Soumi Maiti, Shinji Watanabe:
TriniTTS: Pitch-controllable End-to-end TTS without External Aligner. INTERSPEECH 2022: 16-20
[c9]
- view
  authority control:
- export record
  dblp key:
  - conf/slt/MaitiUWZYZX22
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/slt/MaitiUWZYZX22
Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu:
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. SLT 2022: 480-487
[i7]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2203-17068
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2203-17068
Yushi Ueda, Soumi Maiti, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu:
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. CoRR abs/2203.17068 (2022)
[i6]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - journals/corr/abs-2212-04559
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2212-04559
Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe:
SpeechLMScore: Evaluating speech generation using speech language model. CoRR abs/2212.04559 (2022)
2021
[b1]
- view
  - electronic edition @ cuny.edu
  - no references & citations available
- export record
  dblp key:
  - phd/us/Maiti21
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/phd/us/Maiti21
Soumi Maiti:
Speech Enhancement Using Speech Synthesis Techniques. City University of New York, USA, 2021
[c8]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/MaitiEWW0H21
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/MaitiEWW0H21
Soumi Maiti, Hakan Erdogan, Kevin W. Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey:
End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings. ICASSP 2021: 7183-7187
[i5]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2105-02096
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2105-02096
Soumi Maiti, Hakan Erdogan, Kevin W. Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey:
End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings. CoRR abs/2105.02096 (2021)
2020
[c7]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/MaitiM20
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/MaitiM20
Soumi Maiti, Michael I. Mandel:
Speaker Independence of Neural Vocoders and Their Effect on Parametric Resynthesis Speech Enhancement. ICASSP 2020: 206-210
[c6]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/MaitiMC20
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/MaitiMC20
Soumi Maiti, Erik Marchi, Alistair Conkie:
Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data. ICASSP 2020: 7624-7628
[i4]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2004-04972
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2004-04972
Soumi Maiti, Erik Marchi, Alistair Conkie:
Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data. CoRR abs/2004.04972 (2020)

2010 – 2019

see FAQ

What is the meaning of the colors in the publication lists?

2019
[c5]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/MaitiM19
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/MaitiM19
Soumi Maiti, Michael I. Mandel:
Speech Denoising by Parametric Resynthesis. ICASSP 2019: 6995-6999
[c4]
- view
  authority control:
- export record
  dblp key:
  - conf/waspaa/MaitiM19
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/waspaa/MaitiM19
Soumi Maiti, Michael I. Mandel:
Parametric Resynthesis With Neural Vocoders. WASPAA 2019: 303-307
[i3]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-1904-01537
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-1904-01537
Soumi Maiti, Michael I. Mandel:
Speech denoising by parametric resynthesis. CoRR abs/1904.01537 (2019)
[i2]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-1906-06762
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-1906-06762
Soumi Maiti, Michael I. Mandel:
Parametric Resynthesis with neural vocoders. CoRR abs/1906.06762 (2019)
[i1]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-1911-06266
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-1911-06266
Soumi Maiti, Michael I. Mandel:
Speaker independence of neural vocoders and their effect on parametric resynthesis speech enhancement. CoRR abs/1911.06266 (2019)
2018
[c3]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/MaitiCM18
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/MaitiCM18
Soumi Maiti, Joey Ching, Michael I. Mandel:
Large Vocabulary Concatenative Resynthesis. INTERSPEECH 2018: 1190-1194
2017
[c2]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/MaitiM17
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/MaitiM17
Soumi Maiti, Michael I. Mandel:
Concatenative Resynthesis Using Twin Networks. INTERSPEECH 2017: 3647-3651
[c1]
- view
  authority control:
- export record
  dblp key:
  - conf/iwsds/StoyanchevMB17
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/iwsds/StoyanchevMB17
Svetlana Stoyanchev, Soumi Maiti, Srinivas Bangalore:
Predicting Interaction Quality in Customer Service Dialogs. IWSDS 2017: 149-159

Coauthor Index

see FAQ

manage site settings

To protect your privacy, all features that rely on external API calls from your browser are turned off by default. You need to opt-in for them to become active. All settings here will be stored as cookies with your web browser. For more information see our F.A.Q.