


Xiaoda Yang
2020 – today
- 2025
[c20]Yuhang Ma, Wenting Xu, Chaoyi Zhao, Keqiang Sun, Qinfeng Jin, Xiaoda Yang, Zeng Zhao, Changjie Fan, Zhipeng Hu:
Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection. AAAI 2025: 6027-6035
[c19]Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao:
CART: A Generative Cross-Modal Retrieval Framework With Coarse-To-Fine Semantic Modeling. ACL (1) 2025: 15120-15133
[c18]Jialong Zuo, Shengpeng Ji, Minghui Fang, Mingze Li, Ziyue Jiang, Xize Cheng, Xiaoda Yang, Feiyang Chen, Xinyu Duan, Zhou Zhao:
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching. ACL (1) 2025: 16203-16217
[c17]Wenrui Liu, Jionghao Bai, Xize Cheng, Jialong Zuo, Ziyue Jiang, Shengpeng Ji, Minghui Fang, Xiaoda Yang, Qian Yang, Zhou Zhao:
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation. COLING 2025: 10293-10297
[c16]Jiaqi Duan, Xiaoda Yang, Kaixuan Luan, Hongshun Qiu, Weicai Yan, Xueyi Zhang, Youliang Zhang, Zhaoyang Li, Donglin Huang, Junyu Lu, Ziyue Jiang, Xifeng Yang:
BrainLoc: Brain Signal-Based Object Detection with Multi-modal Alignment. EMNLP (Findings) 2025: 21652-21662
[c15]Dongjie Fu, Xize Cheng, Linjun Li, Xiaoda Yang, Lujia Yang, Tao Jin:
PACHAT: Persona-Aware Speech Assistant for Multi-party Dialogue. EMNLP 2025: 29325-29342
[c14]Xize Cheng, Ruofan Hu, Xiaoda Yang, Jingyu Lu, Dongjie Fu, Zehan Wang, Shengpeng Ji, Rongjie Huang, Boyang Zhang, Tao Jin, Zhou Zhao:
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words? ICLR 2025
[c13]Shengpeng Ji, Ziyue Jiang, Wen Wang, Yifu Chen, Minghui Fang, Jialong Zuo, Qian Yang, Xize Cheng, Zehan Wang, Ruiqi Li, Ziang Zhang, Xiaoda Yang, Rongjie Huang, Yidi Jiang, Qian Chen, Siqi Zheng, Zhou Zhao:
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling. ICLR 2025
[c12]Weicai Yan, Wang Lin, Zirun Guo, Ye Wang, Fangming Feng, Xiaoda Yang, Zehan Wang, Tao Jin:
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision. ICLR 2025
[c11]Minghui Fang, Shengpeng Ji, Jialong Zuo, Xize Cheng, Wenrui Liu, Xiaoda Yang, Ruofan Hu, Jieming Zhu, Zhou Zhao:
GTA: Towards Generative Text-To-Audio Retrieval via Multi-Scale Tokenizer. INTERSPEECH 2025
[c10]Ruofan Hu, Yan Xia, Minjie Hong, Jieming Zhu, Bo Chen, Xiaoda Yang, Minghui Fang, Tao Jin:
Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval. INTERSPEECH 2025
[c9]Kaixuan Luan, Xiaoda Yang, Shile Cai, Ruofan Hu, Minghui Fang, Wenrui Liu, Jialong Zuo, Jiaqi Duan, Yuhang Ma, Junyu Lu:
MelRe: Vision-Based Mel-Spectrogram Restoration. INTERSPEECH 2025
[c8]Xiaoda Yang, Xize Cheng, Minghui Fang, Hongshun Qiu, Yuhang Ma, Junyu Lu, Jiaqi Duan, Sihang Cai, Zehan Wang, Ruofan Hu, Dongjie Fu, Zhou Zhao, Tao Jin:
Multimodal Conditional Retrieval with High Controllability. KDD (2) 2025: 3577-3585
[c7]Sijing Li, Tianwei Lin, Lingshuai Lin, Wenqiao Zhang, Jiang Liu, Xiaoda Yang, Juncheng Li, Yucheng He, Xiaohui Song, Jun Xiao, Yueting Zhuang, Beng Chin Ooi:
EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model. ACM Multimedia 2025: 3893-3902
[c6]Wenrui Liu, Qian Chen, Wen Wang, Guanrou Yang, Weiqin Li, Minghui Fang, Jialong Zuo, Xiaoda Yang, Tao Jin, Jin Xu, Zemin Liu, Yafeng Chen, Jionghao Bai, Zhifang Guo:
Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation. ACM Multimedia 2025: 10632-10641
[c5]Xueyi Zhang, Peiyin Zhu, Jinping Sui, Xiaoda Yang, Jiahe Tian, Mingrui Lao, Siqi Cai, Yanming Guo, Jun Tang:
Choose Your Expert: Uncertainty-Guided Expert Selection for Continual Deepfake Detection. ACM Multimedia 2025: 11502-11511
[c4]Minjie Hong, Yan Xia, Zehan Wang, Jieming Zhu, Ye Wang, Sihang Cai, Xiaoda Yang, Quanyu Dai, Zhenhua Dong, Zhimeng Zhang, Zhou Zhao:
EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration. WWW 2025: 2754-2762
[i15]Xize Cheng, Dongjie Fu, Xiaoda Yang, Minghui Fang, Ruofan Hu, Jingyu Lu, Jionghao Bai, Zehan Wang, Shengpeng Ji, Rongjie Huang, Linjun Li, Yu Chen, Tao Jin, Zhou Zhao:
OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios. CoRR abs/2501.01384 (2025)
[i14]Minjie Hong, Yan Xia, Zehan Wang, Jieming Zhu, Ye Wang, Sihang Cai, Xiaoda Yang, Quanyu Dai, Zhenhua Dong, Zhimeng Zhang, Zhou Zhao:
EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration. CoRR abs/2502.14735 (2025)
[i13]Ziyue Jiang, Yi Ren, Ruiqi Li, Shengpeng Ji, Zhenhui Ye, Chen Zhang, Jionghao Bai, Xiaoda Yang, Jialong Zuo, Yu Zhang, Rui Liu, Xiang Yin, Zhou Zhao:
Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis. CoRR abs/2502.18924 (2025)
[i12]Xiaoda Yang, Junyu Lu, Hongshun Qiu, Sijing Li, Hao Li, Shengpeng Ji, Xudong Tang, Jiayang Xu, Jiaqi Duan, Ziyue Jiang, Cong Lin, Sihang Cai, Zejian Xie, Zhuoyang Song, Songxin Zhang:
Astrea: A MOE-based Visual Understanding Model with Progressive Alignment. CoRR abs/2503.09445 (2025)
[i11]Xiaoda Yang, Jiayang Xu, Kaixuan Luan, Xinyu Zhan, Hongshun Qiu, Shijun Shi, Hao Li, Shuai Yang, Li Zhang, Checheng Yu, Cewu Lu, Lixin Yang:
OmniCam: Unified Multimodal Video Generation via Camera Control. CoRR abs/2504.02312 (2025)
[i10]Sijing Li, Tianwei Lin, Lingshuai Lin, Wenqiao Zhang, Jiang Liu, Xiaoda Yang, Juncheng Li, Yucheng He, Xiaohui Song, Jun Xiao, Yueting Zhuang, Beng Chin Ooi:
EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model. CoRR abs/2504.13650 (2025)
[i9]Weicai Yan, Wang Lin, Zirun Guo, Ye Wang, Fangming Feng, Xiaoda Yang, Zehan Wang, Tao Jin:
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision. CoRR abs/2504.21423 (2025)
[i8]Jialong Zuo, Shengpeng Ji, Minghui Fang, Mingze Li, Ziyue Jiang, Xize Cheng, Xiaoda Yang, Feiyang Chen, Xinyu Duan, Zhou Zhao:
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching. CoRR abs/2506.01014 (2025)
[i7]Ruofan Hu, Yan Xia, Minjie Hong, Jieming Zhu, Bo Chen, Xiaoda Yang, Minghui Fang, Tao Jin:
Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval. CoRR abs/2506.14445 (2025)
[i6]Hao Li, Shuai Yang, Yilun Chen, Yang Tian, Xiaoda Yang, Xinyi Chen, Hanqing Wang, Tai Wang, Feng Zhao, Dahua Lin, Jiangmiao Pang:
CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation. CoRR abs/2506.19816 (2025)
[i5]Donglin Huang, Yongyuan Li, Tianhang Liu, Junming Huang, Xiaoda Yang, Chi Wang, Weiwei Xu:
VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework. CoRR abs/2510.10269 (2025)
[i4]Shijun Shi, Jing Xu, Zhihang Li, Chunli Peng, Xiaoda Yang, Lijing Lu, Kai Hu, Jiangning Zhang:
One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer. CoRR abs/2511.22940 (2025)
- 2024
[c3]Xiaoda Yang, Xize Cheng, Jiaqi Duan, Hongshun Qiu, Minjie Hong, Minghui Fang, Shengpeng Ji, Jialong Zuo, Zhiqing Hong, Zhimeng Zhang, Tao Jin:
AudioVSR: Enhancing Video Speech Recognition with Audio Data. EMNLP 2024: 15352-15361
[c2]Dongjie Fu, Xize Cheng, Xiaoda Yang, Hanting Wang, Zhou Zhao, Tao Jin:
Boosting Speech Recognition Robustness to Modality-Distortion with Contrast-Augmented Prompts. ACM Multimedia 2024: 3838-3847
[c1]Xiaoda Yang, Xize Cheng, Dongjie Fu, Minghui Fang, Jialong Zuo, Shengpeng Ji, Zhou Zhao, Tao Jin:
SyncTalklip: Highly Synchronized Lip-Readable Speaker Generation with Multi-Task Learning. ACM Multimedia 2024: 8149-8158
[i3]Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao:
ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling. CoRR abs/2406.17507 (2024)
[i2]Shengpeng Ji, Ziyue Jiang, Xize Cheng, Yifu Chen, Minghui Fang, Jialong Zuo, Qian Yang, Ruiqi Li, Ziang Zhang, Xiaoda Yang, Rongjie Huang, Yidi Jiang, Qian Chen, Siqi Zheng, Wen Wang, Zhou Zhao:
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling. CoRR abs/2408.16532 (2024)
[i1]Shengpeng Ji, Yifu Chen, Minghui Fang, Jialong Zuo, Jingyu Lu, Hanting Wang, Ziyue Jiang, Long Zhou, Shujie Liu, Xize Cheng, Xiaoda Yang, Zehan Wang, Qian Yang, Jian Li, Yidi Jiang, Jingzhen He, Yunfei Chu, Jin Xu, Zhou Zhao:
WavChat: A Survey of Spoken Dialogue Models. CoRR abs/2411.13577 (2024)
last updated on 2026-03-05 23:49 CET by the dblp team
all metadata released as open data under CC0 1.0 license







