Puyuan Peng

2020 – today

2024
[c13]
persistent URL: https://dblp.org/rec/conf/acl/Peng00MH24
Puyuan Peng, Po-Yao Huang, Shang-Wen Li, Abdelrahman Mohamed, David Harwath:
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild. ACL (1) 2024: 12442-12462
[c12]
persistent URL: https://dblp.org/rec/conf/icassp/WangSCBPLWH24
Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-Yi Lee, Hsin-Min Wang, David Harwath:
SpeechCLIP+: Self-Supervised Multi-Task Representation Learning for Speech via CLIP and Speech-Image Data. ICASSP Workshops 2024: 465-469
[c11]
persistent URL: https://dblp.org/rec/conf/icassp/FangYSPWBLH24
Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-Yi Lee, David Harwath:
Integrating Self-Supervised Speech Model with Pseudo Word-Level Targets from Visually-Grounded Speech Model. ICASSP Workshops 2024: 645-649
[c10]
persistent URL: https://dblp.org/rec/conf/icassp/TsengBCCLLPSWW024
Yuan Tseng, Layne Berry, Yiting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Poyao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Abdelrahman Mohamed, Chi-Luen Feng, Hung-Yi Lee:
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models. ICASSP 2024: 6890-6894
[c9]
persistent URL: https://dblp.org/rec/conf/icml/ZhengPM0CH24
Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath:
BAT: Learning to Reason about Spatial Sounds with Large Language Models. ICML 2024
[i16]
persistent URL: https://dblp.org/rec/journals/corr/abs-2402-01591
Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath:
BAT: Learning to Reason about Spatial Sounds with Large Language Models. CoRR abs/2402.01591 (2024)
[i15]
persistent URL: https://dblp.org/rec/journals/corr/abs-2402-05819
Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-yi Lee, David Harwath:
Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model. CoRR abs/2402.05819 (2024)
[i14]
persistent URL: https://dblp.org/rec/journals/corr/abs-2402-06959
Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-yi Lee, Hsin-Min Wang, David Harwath:
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data. CoRR abs/2402.06959 (2024)
[i13]
persistent URL: https://dblp.org/rec/journals/corr/abs-2403-16973
Puyuan Peng, Po-Yao Huang, Daniel Li, Abdelrahman Mohamed, David Harwath:
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild. CoRR abs/2403.16973 (2024)
[i12]
persistent URL: https://dblp.org/rec/journals/corr/abs-2406-09272
Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman:
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos. CoRR abs/2406.09272 (2024)
2023
[c8]
persistent URL: https://dblp.org/rec/conf/asru/LaiSPKGCCBCHZLG23
Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David D. Cox, David Harwath, Yang Zhang, Karen Livescu, James R. Glass:
Audio-Visual Neural Syntax Acquisition. ASRU 2023: 1-8
[c7]
persistent URL: https://dblp.org/rec/conf/interspeech/Peng0RMH23
Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath:
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model. INTERSPEECH 2023: 391-395
[c6]
persistent URL: https://dblp.org/rec/conf/interspeech/PengY0H23
Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath:
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. INTERSPEECH 2023: 396-400
[c5]
persistent URL: https://dblp.org/rec/conf/interspeech/HoriPHLOJCJRR23
Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux:
Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos. INTERSPEECH 2023: 4663-4667
[i11]
persistent URL: https://dblp.org/rec/journals/corr/abs-2305-11095
Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath:
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. CoRR abs/2305.11095 (2023)
[i10]
persistent URL: https://dblp.org/rec/journals/corr/abs-2305-11435
Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath:
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model. CoRR abs/2305.11435 (2023)
[i9]
persistent URL: https://dblp.org/rec/journals/corr/abs-2306-15644
Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux:
Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos. CoRR abs/2306.15644 (2023)
[i8]
persistent URL: https://dblp.org/rec/journals/corr/abs-2309-10787
Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee:
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models. CoRR abs/2309.10787 (2023)
[i7]
persistent URL: https://dblp.org/rec/journals/corr/abs-2310-07654
Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David D. Cox, David Harwath, Yang Zhang, Karen Livescu, James R. Glass:
Audio-Visual Neural Syntax Acquisition. CoRR abs/2310.07654 (2023)
2022
[c4]
persistent URL: https://dblp.org/rec/conf/icassp/PengH22
Puyuan Peng, David Harwath:
Fast-Slow Transformer for Visually Grounding Speech. ICASSP 2022: 7727-7731
[c3]
persistent URL: https://dblp.org/rec/conf/interspeech/BaadePH22
Alan Baade, Puyuan Peng, David Harwath:
MAE-AST: Masked Autoencoding Audio Spectrogram Transformer. INTERSPEECH 2022: 2438-2442
[c2]
persistent URL: https://dblp.org/rec/conf/interspeech/PengH22
Puyuan Peng, David Harwath:
Word Discovery in Visually Grounded, Self-Supervised Speech Models. INTERSPEECH 2022: 2823-2827
[c1]
persistent URL: https://dblp.org/rec/conf/tl4nlp/DiwanPM22
Anuj Diwan, Puyuan Peng, Raymond J. Mooney:
Zero-shot Video Moment Retrieval With Off-the-Shelf Models. TL4NLP 2022: 10-21
[i6]
persistent URL: https://dblp.org/rec/journals/corr/abs-2202-03543
Puyuan Peng, David Harwath:
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling. CoRR abs/2202.03543 (2022)
[i5]
persistent URL: https://dblp.org/rec/journals/corr/abs-2203-15081
Puyuan Peng, David Harwath:
Word Discovery in Visually Grounded, Self-Supervised Speech Models. CoRR abs/2203.15081 (2022)
[i4]
persistent URL: https://dblp.org/rec/journals/corr/abs-2203-16691
Alan Baade, Puyuan Peng, David Harwath:
MAE-AST: Masked Autoencoding Audio Spectrogram Transformer. CoRR abs/2203.16691 (2022)
[i3]
persistent URL: https://dblp.org/rec/journals/corr/abs-2211-02178
Anuj Diwan, Puyuan Peng, Raymond J. Mooney:
Zero-shot Video Moment Retrieval With Off-the-Shelf Models. CoRR abs/2211.02178 (2022)
2021
[i2]
persistent URL: https://dblp.org/rec/journals/corr/abs-2109-08186
Puyuan Peng, David Harwath:
Fast-Slow Transformer for Visually Grounding Speech. CoRR abs/2109.08186 (2021)
2020
[i1]
persistent URL: https://dblp.org/rec/journals/corr/abs-2012-02221
Puyuan Peng, Herman Kamper, Karen Livescu:
A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings. CoRR abs/2012.02221 (2020)