


Fengyun Rao
2020 – today
- 2025
[c15]Xinli Yue, Jianhui Sun, Junda Lu, Liangchao Yao, Fan Xia, Tianyi Wang, Fengyun Rao, Jing Lyu, Yuetang Deng:
Instruction-augmented Multimodal Alignment for Image-Text and Element Matching. CVPR Workshops 2025: 1379-1388
[c14]Zitang Zhou, Ke Mei, Yu Lu, Tianyi Wang, Fengyun Rao:
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization. CVPR 2025: 3152-3162
[c13]Jian Yang, Dacheng Yin, Yizhou Zhou, Fengyun Rao, Wei Zhai, Yang Cao, Zheng-Jun Zha:
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling. CVPR 2025: 7974-7985
[c12]Yongliang Wu, Xinting Hu, Yuyang Sun, Yizhou Zhou, Wenbo Zhu, Fengyun Rao, Bernt Schiele, Xu Yang:
Number it: Temporal Grounding Videos like Flipping Manga. CVPR 2025: 13754-13765
[c11]Cong Chen, Mingyu Liu, Chenchen Jing, Yizhou Zhou, Fengyun Rao, Hao Chen, Bo Zhang, Chunhua Shen:
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training. ICLR 2025
[i30]Zitang Zhou, Ke Mei, Yu Lu, Tianyi Wang, Fengyun Rao:
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization. CoRR abs/2503.01725 (2025)
[i29]Cong Chen, Mingyu Liu, Chenchen Jing, Yizhou Zhou, Fengyun Rao, Hao Chen, Bo Zhang, Chunhua Shen:
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training. CoRR abs/2503.06486 (2025)
[i28]Yi Yang, Xiaoxuan He, Hongkun Pan, Xiyan Jiang, Yan Deng, Xingtao Yang, Haoyu Lu, Dacheng Yin, Fengyun Rao, Minfeng Zhu, Bo Zhang, Wei Chen:
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization. CoRR abs/2503.10615 (2025)
[i27]Zitian Wang, Yue Liao, Kang Rong, Fengyun Rao, Yibo Yang, Si Liu:
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs. CoRR abs/2503.20309 (2025)
[i26]Yucheng Suo, Fan Ma, Linchao Zhu, Tianyi Wang, Fengyun Rao, Yi Yang:
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment. CoRR abs/2503.20472 (2025)
[i25]Xinli Yue, Jianhui Sun, Junda Lu, Liangchao Yao, Fan Xia, Tianyi Wang, Fengyun Rao, Jing Lyu, Yuetang Deng:
Instruction-augmented Multimodal Alignment for Image-Text and Element Matching. CoRR abs/2504.12018 (2025)
[i24]Yunzhu Zhang, Yu Lu, Tianyi Wang, Fengyun Rao, Yi Yang, Linchao Zhu:
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding. CoRR abs/2506.00993 (2025)
[i23]Jie Yang, Feipeng Ma, Zitian Wang, Dacheng Yin, Kang Rong, Fengyun Rao, Ruimao Zhang:
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning. CoRR abs/2506.07905 (2025)
[i22]Zhixiang Wei, Guangting Wang, Xiaoxiao Ma, Ke Mei, Huaian Chen, Yi Jin, Fengyun Rao:
HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models. CoRR abs/2507.22431 (2025)
[i21]Xiaoxuan He, Siming Fu, Yuke Zhao, Wanli Li, Jian Yang, Dacheng Yin, Fengyun Rao, Bo Zhang:
TempFlow-GRPO: When Timing Matters for GRPO in Flow Models. CoRR abs/2508.04324 (2025)
[i20]Changli Tang, Qinfan Xiao, Ke Mei, Tianyi Wang, Fengyun Rao, Chao Zhang:
WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM. CoRR abs/2509.21990 (2025)
[i19]Jian Yang, Dacheng Yin, Xiaoxuan He, Yong Liu, Fengyun Rao, Jing Lyu, Wei Zhai, Yang Cao, Zheng-Jun Zha:
WeMMU: Enhanced Bridging of Vision-Language Models and Diffusion Models via Noisy Query Tokens. CoRR abs/2512.02536 (2025)
[i18]Shenghao Fu, Yukun Su, Fengyun Rao, Jing Lyu, Xiaohua Xie, Wei-Shi Zheng:
WeDetect: Fast Open-Vocabulary Object Detection as Retrieval. CoRR abs/2512.12309 (2025)
[i17]Tao Zhang, Ziqi Zhang, Zongyang Ma, Yuxin Chen, Bing Li, Chunfeng Yuan, Guangting Wang, Fengyun Rao, Ying Shan, Weiming Hu:
MMhops-R1: Multimodal Multi-hop Reasoning. CoRR abs/2512.13573 (2025)
- 2024
[c10]Feipeng Ma, Yizhou Zhou, Fengyun Rao, Yueyi Zhang, Xiaoyan Sun:
Image Captioning with Multi-Context Synthetic Data. AAAI 2024: 4089-4097
[c9]Yukun Su, Yiwen Cao, Jingliang Deng, Fengyun Rao, Qingyao Wu:
Spatial-Semantic Collaborative Cropping for User Generated Content. AAAI 2024: 4988-4997
[c8]Liang Xu, Yizhou Zhou, Yichao Yan, Xin Jin, Wenhan Zhu, Fengyun Rao, Xiaokang Yang, Wenjun Zeng:
ReGenNet: Towards Human Action-Reaction Synthesis. CVPR 2024: 1759-1769
[c7]Feipeng Ma, Yizhou Zhou, Yueyi Zhang, Siying Wu, Zheyu Zhang, Zilong He, Fengyun Rao, Xiaoyan Sun:
Task Navigator: Decomposing Complex Tasks for Multimodal Large Language Models. CVPR Workshops 2024: 2248-2257
[c6]Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu, Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, Yunhui Liu, Wenjun Zeng, Xiaokang Yang:
Inter-X: Towards Versatile Human-Human Interaction Analysis. CVPR 2024: 22260-22271
[c5]Feipeng Ma, Hongwei Xue, Yizhou Zhou, Guangting Wang, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun:
Visual Perception by Large Language Model's Weights. NeurIPS 2024
[i16]Yukun Su, Yiwen Cao, Jingliang Deng, Fengyun Rao, Qingyao Wu:
Spatial-Semantic Collaborative Cropping for User Generated Content. CoRR abs/2401.08086 (2024)
[i15]Liang Xu, Yizhou Zhou, Yichao Yan, Xin Jin, Wenhan Zhu, Fengyun Rao, Xiaokang Yang, Wenjun Zeng:
ReGenNet: Towards Human Action-Reaction Synthesis. CoRR abs/2403.11882 (2024)
[i14]Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun:
Multi-Modal Generative Embedding Model. CoRR abs/2405.19333 (2024)
[i13]Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun:
Visual Perception by Large Language Model's Weights. CoRR abs/2405.20339 (2024)
[i12]Feipeng Ma, Yizhou Zhou, Hebei Li, Zilong He, Siying Wu, Fengyun Rao, Yueyi Zhang, Xiaoyan Sun:
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model. CoRR abs/2408.11795 (2024)
[i11]Xinli Yue, Jianhui Sun, Liangchao Yao, Fan Xia, Yuetang Deng, Tianyi Wang, Lei Li, Fengyun Rao, Jing Lv, Qian Wang, Lingchen Zhao:
Revisiting Video Quality Assessment from the Perspective of Generalization. CoRR abs/2409.14847 (2024)
[i10]Xinli Yue, Jianhui Sun, Han Kong, Liangchao Yao, Tianyi Wang, Lei Li, Fengyun Rao, Jing Lv, Fan Xia, Yuetang Deng, Qian Wang, Lingchen Zhao:
Advancing Video Quality Assessment for AIGC. CoRR abs/2409.14888 (2024)
[i9]Jian Yang, Dacheng Yin, Yizhou Zhou, Fengyun Rao, Wei Zhai, Yang Cao, Zheng-Jun Zha:
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling. CoRR abs/2410.10798 (2024)
[i8]Yongliang Wu, Xinting Hu, Yuyang Sun, Yizhou Zhou, Wenbo Zhu, Fengyun Rao, Bernt Schiele, Xu Yang:
Number it: Temporal Grounding Videos like Flipping Manga. CoRR abs/2411.10332 (2024)
- 2023
[i7]Tianyi Wang, Feipeng Ma, Zhenhua Liu, Fengyun Rao:
A Dual-level Detection Method for Video Copy Detection. CoRR abs/2305.12361 (2023)
[i6]Zhenhua Liu, Feipeng Ma, Tianyi Wang, Fengyun Rao:
A Similarity Alignment Model for Video Copy Segment Matching. CoRR abs/2305.15679 (2023)
[i5]Feipeng Ma, Yizhou Zhou, Fengyun Rao, Yueyi Zhang, Xiaoyan Sun:
Text-Only Image Captioning with Multi-Context Data Generation. CoRR abs/2305.18072 (2023)
[i4]Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu, Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, Yunhui Liu, Wenjun Zeng, Xiaokang Yang:
Inter-X: Towards Versatile Human-Human Interaction Analysis. CoRR abs/2312.16051 (2023)
- 2022
[c4]Zhaoyang Zeng, Yongsheng Luo, Zhenhua Liu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen:
Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation. CVPR 2022: 3128-3137
[c3]Lu Qi, Jason Kuen, Zhe Lin, Jiuxiang Gu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen, Ming-Hsuan Yang, Jiaya Jia:
CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation. ECCV (31) 2022: 59-77
- 2021
[c2]Mingkang Tang, Zhanyu Wang, Zhenhua Liu, Fengyun Rao, Dian Li, Xiu Li:
CLIP4Caption: CLIP for Video Caption. ACM Multimedia 2021: 4858-4862
[i3]Mingkang Tang, Zhanyu Wang, Zhaoyang Zeng, Fengyun Rao, Dian Li:
CLIP4Caption ++: Multi-CLIP for Video Caption. CoRR abs/2110.05204 (2021)
[i2]Mingkang Tang, Zhanyu Wang, Zhenhua Liu, Fengyun Rao, Dian Li, Xiu Li:
CLIP4Caption: CLIP for Video Caption. CoRR abs/2110.06615 (2021)
[i1]Lu Qi, Jason Kuen, Zhe Lin, Jiuxiang Gu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen, Jiaya Jia:
CaSP: Class-agnostic Semi-Supervised Pretraining for Detection and Segmentation. CoRR abs/2112.04966 (2021)
2010 – 2019
- 2019
[c1]Zehui Dai, Wei Dai, Zhenhua Liu, Fengyun Rao, Huajie Chen, Guangpeng Zhang, Yadong Ding, Jiyang Liu:
Multi-Task Multi-Head Attention Memory Network for Fine-Grained Sentiment Analysis. NLPCC (1) 2019: 609-620
last updated on 2026-01-28 02:15 CET by the dblp team
all metadata released as open data under CC0 1.0 license







