


Zhengyuan Yang
2020 – today
- 2024
- [j7]Chunyuan Li, Zhe Gan, Zhengyuan Yang, Jianwei Yang, Linjie Li, Lijuan Wang, Jianfeng Gao:
Multimodal Foundation Models: From Specialists to General-Purpose Assistants. Found. Trends Comput. Graph. Vis. 16(1-2): 1-214 (2024) - [j6]Shengxi Li, Xuelong Li, Leonardo Chiariglione, Jiebo Luo, Wenwu Wang, Zhengyuan Yang, Danilo P. Mandic, Hamido Fujita:
Introduction to the Special Issue on AI-Generated Content for Multimedia. IEEE Trans. Circuits Syst. Video Technol. 34(8): 6809-6813 (2024) - [c42]Changsheng Lv, Mengshi Qi, Xia Li, Zhengyuan Yang, Huadong Ma:
SGFormer: Semantic Graph Transformer for Point Cloud-Based 3D Scene Graph Generation. AAAI 2024: 4035-4043 - [c41]Jaemin Cho, Linjie Li, Zhengyuan Yang, Zhe Gan, Lijuan Wang, Mohit Bansal:
Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation. CVPR Workshops 2024: 5280-5289 - [c40]Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang:
Disco: Disentangled Control for Realistic Human Dance Generation. CVPR 2024: 9326-9336 - [c39]Zichen Miao, Jiang Wang, Ze Wang, Zhengyuan Yang, Lijuan Wang, Qiang Qiu, Zicheng Liu:
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning. CVPR 2024: 10844-10853 - [c38]Chaoyi Zhang, Kevin Lin, Zhengyuan Yang, Jianfeng Wang, Linjie Li, Chung-Ching Lin, Zicheng Liu, Lijuan Wang:
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning. CVPR 2024: 13647-13657 - [c37]Jielin Qiu, Jiacheng Zhu, William Han, Aditesh Kumar, Karthik Mittal, Claire Jin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Ding Zhao, Bo Li, Lijuan Wang:
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos. CVPR 2024: 21909-21921 - [c36]Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, David S. Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang:
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation. ECCV (15) 2024: 134-152 - [c35]Zhengyuan Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang:
Idea2Img: Iterative Self-refinement with GPT-4V for Automatic Image Design and Generation. ECCV (38) 2024: 167-184 - [c34]Jialian Wu, Jianfeng Wang, Zhengyuan Yang, Zhe Gan, Zicheng Liu, Junsong Yuan, Lijuan Wang:
GRiT: A Generative Region-to-Text Transformer for Object Understanding. ECCV (80) 2024: 207-224 - [c33]Zecheng Tang, Chenfei Wu, Zekai Zhang, Minheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, Nan Duan:
StrokeNUWA - Tokenizing Strokes for Vector Graphic Synthesis. ICML 2024 - [c32]Weihao Yu, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang:
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities. ICML 2024 - [c31]Jie An, Zhengyuan Yang, Jianfeng Wang, Linjie Li, Zicheng Liu, Lijuan Wang, Jiebo Luo:
Bring Metric Functions into Diffusion Models. IJCAI 2024: 578-586 - [c30]Jie An, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Zicheng Liu, Lijuan Wang, Jiebo Luo:
OpenLEAF: A Novel Benchmark for Open-Domain Interleaved Image-Text Generation. ACM Multimedia 2024: 11137-11145 - [c29]Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, David S. Doermann, Junsong Yuan, Lijuan Wang:
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation. NeurIPS 2024 - [c28]Kevin Qinghong Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou:
VideoGUI: A Benchmark for GUI Automation from Instructional Videos. NeurIPS 2024 - [c27]Xueyan Zou, Linjie Li, Jianfeng Wang, Jianwei Yang, Mingyu Ding, Junyi Wei, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang:
Interfacing Foundation Models' Embeddings. NeurIPS 2024 - [i69]Alex Jinpeng Wang, Linjie Li, Kevin Qinghong Lin, Jianfeng Wang, Kevin Lin, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou:
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training. CoRR abs/2401.00849 (2024) - [i68]Jie An, Zhengyuan Yang, Jianfeng Wang, Linjie Li, Zicheng Liu, Lijuan Wang, Jiebo Luo:
Bring Metric Functions into Diffusion Models. CoRR abs/2401.02414 (2024) - [i67]Zecheng Tang, Chenfei Wu, Zekai Zhang, Mingheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, Nan Duan:
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis. CoRR abs/2401.17093 (2024) - [i66]Chenglei Si, Yanzhe Zhang, Zhengyuan Yang, Ruibo Liu, Diyi Yang:
Design2Code: How Far Are We From Automating Front-End Engineering? CoRR abs/2403.03163 (2024) - [i65]Jielin Qiu, William Han, Winfred Wang, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Christos Faloutsos, Lei Li, Lijuan Wang:
Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition. CoRR abs/2403.12339 (2024) - [i64]An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Julian J. McAuley, Jianfeng Gao, Lijuan Wang:
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs. CoRR abs/2404.16375 (2024) - [i63]Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, David S. Doermann, Junsong Yuan, Lijuan Wang:
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation. CoRR abs/2406.06890 (2024) - [i62]Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang:
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos. CoRR abs/2406.08407 (2024) - [i61]Kevin Qinghong Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou:
VideoGUI: A Benchmark for GUI Automation from Instructional Videos. CoRR abs/2406.10227 (2024) - [i60]Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, David S. Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang:
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation. CoRR abs/2407.10937 (2024) - [i59]Weihao Yu, Zhengyuan Yang, Linfeng Ren, Linjie Li, Jianfeng Wang, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang, Xinchao Wang:
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities. CoRR abs/2408.00765 (2024) - [i58]Minheng Ni, Chenfei Wu, Huaying Yuan, Zhengyuan Yang, Ming Gong, Lijuan Wang, Zicheng Liu, Wangmeng Zuo, Nan Duan:
AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition. CoRR abs/2408.11564 (2024) - [i57]Zichen Miao, Zhengyuan Yang, Kevin Lin, Ze Wang, Zicheng Liu, Lijuan Wang, Qiang Qiu:
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization. CoRR abs/2410.03190 (2024) - [i56]Hang Hua, Yunlong Tang, Ziyun Zeng, Liangliang Cao, Zhengyuan Yang, Hangfeng He, Chenliang Xu, Jiebo Luo:
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models. CoRR abs/2410.09733 (2024) - [i55]Kaizhi Zheng, Xiaotong Chen, Xuehai He, Jing Gu, Linjie Li, Zhengyuan Yang, Kevin Lin, Jianfeng Wang, Lijuan Wang, Xin Eric Wang:
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing. CoRR abs/2410.12836 (2024) - [i54]Yining Hong, Beide Liu, Maxine Wu, Yuanhao Zhai, Kai-Wei Chang, Lingjie Li, Kevin Lin, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, Yingnian Wu, Lijuan Wang:
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation. CoRR abs/2410.23277 (2024) - [i53]Yuyang Zhao, Chung-Ching Lin, Kevin Lin, Zhiwen Yan, Linjie Li, Zhengyuan Yang, Jianfeng Wang, Gim Hee Lee, Lijuan Wang:
GenXD: Generating Any 3D and 4D Scenes. CoRR abs/2411.02319 (2024) - [i52]Qin Liu, Jianfeng Wang, Zhengyuan Yang, Linjie Li, Kevin Lin, Marc Niethammer, Lijuan Wang:
LiVOS: Light Video Object Segmentation with Gated Linear Matching. CoRR abs/2411.02818 (2024) - [i51]Taewook Kim, Ze Wang, Zhengyuan Yang, Jiang Wang, Lijuan Wang, Zicheng Liu, Qiang Qiu:
Conditional Text-to-Image Generation with Reference Guidance. CoRR abs/2411.16713 (2024) - [i50]Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Weixian Lei, Lijuan Wang, Mike Zheng Shou:
ShowUI: One Vision-Language-Action Model for GUI Visual Agent. CoRR abs/2411.17465 (2024) - [i49]Xiyao Wang, Zhengyuan Yang, Linjie Li, Hongjin Lu, Yuancheng Xu, Chung-Ching Lin, Kevin Lin, Furong Huang, Lijuan Wang:
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension. CoRR abs/2412.03704 (2024) - [i48]Jitesh Jain, Zhengyuan Yang, Humphrey Shi, Jianfeng Gao, Jianwei Yang:
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation. CoRR abs/2412.09585 (2024) - 2023
- [j5]Jiajun Deng, Zhengyuan Yang, Daqing Liu, Tianlang Chen, Wengang Zhou, Yanyong Zhang, Houqiang Li, Wanli Ouyang:
TransVG++: End-to-End Visual Grounding With Language Conditioned Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell. 45(11): 13636-13652 (2023) - [c26]Shengming Yin, Chenfei Wu, Huan Yang, Jianfeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Ming Gong, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan:
NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation. ACL (1) 2023: 1309-1320 - [c25]Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang:
ReCo: Region-Controlled Text-to-Image Generation. CVPR 2023: 14246-14255 - [c24]Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo:
PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3. ICCV 2023: 2951-2963 - [c23]Tan Wang, Kevin Lin, Linjie Li, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang:
Equivariant Similarity for Vision-Language Foundation Models. ICCV 2023: 11964-11974 - [c22]Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jordan L. Boyd-Graber, Lijuan Wang:
Prompting GPT-3 To Be Reliable. ICLR 2023 - [c21]Xiaodong Wang, Chenfei Wu, Shengming Yin, Minheng Ni, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Fan Yang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan:
Learning 3D Photography Videos via Self-supervised Diffusion on Single Images. IJCAI 2023: 1506-1514 - [i47]Xiaodong Wang, Chenfei Wu, Shengming Yin, Minheng Ni, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Fan Yang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan:
Learning 3D Photography Videos via Self-supervised Diffusion on Single Images. CoRR abs/2302.10781 (2023) - [i46]Changsheng Lv, Mengshi Qi, Xia Li, Zhengyuan Yang, Huadong Ma:
Revisiting Transformer for Point Cloud-based 3D Scene Graph Generation. CoRR abs/2303.11048 (2023) - [i45]Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Ehsan Azarnasab, Faisal Ahmed, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang:
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action. CoRR abs/2303.11381 (2023) - [i44]Shengming Yin, Chenfei Wu, Huan Yang, Jianfeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan:
NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation. CoRR abs/2303.12346 (2023) - [i43]Tan Wang, Kevin Lin, Linjie Li, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang:
Equivariant Similarity for Vision-Language Foundation Models. CoRR abs/2303.14465 (2023) - [i42]Jaemin Cho, Linjie Li, Zhengyuan Yang, Zhe Gan, Lijuan Wang, Mohit Bansal:
Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation. CoRR abs/2304.06671 (2023) - [i41]Jielin Qiu, Jiacheng Zhu, William Han, Aditesh Kumar, Karthik Mittal, Claire Jin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Bo Li, Ding Zhao, Lijuan Wang:
MultiSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos. CoRR abs/2306.04216 (2023) - [i40]Tan Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang:
DisCo: Disentangled Control for Referring Human Dance Generation in Real World. CoRR abs/2307.00040 (2023) - [i39]Xin Yuan, Linjie Li, Jianfeng Wang, Zhengyuan Yang, Kevin Lin, Zicheng Liu, Lijuan Wang:
Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models. CoRR abs/2307.14648 (2023) - [i38]Weihao Yu, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang:
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities. CoRR abs/2308.02490 (2023) - [i37]Chunyuan Li, Zhe Gan, Zhengyuan Yang, Jianwei Yang, Linjie Li, Lijuan Wang, Jianfeng Gao:
Multimodal Foundation Models: From Specialists to General-Purpose Assistants. CoRR abs/2309.10020 (2023) - [i36]Zhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu, Lijuan Wang:
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision). CoRR abs/2309.17421 (2023) - [i35]Jie An, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Zicheng Liu, Lijuan Wang, Jiebo Luo:
OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation. CoRR abs/2310.07749 (2023) - [i34]Zhengyuan Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang:
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation. CoRR abs/2310.08541 (2023) - [i33]Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Lijuan Wang:
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design. CoRR abs/2310.15144 (2023) - [i32]Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, Jianfeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang:
MM-VID: Advancing Video Understanding with GPT-4V(ision). CoRR abs/2310.19773 (2023) - [i31]Hanjia Lyu, Jinfa Huang, Daoan Zhang, Yongsheng Yu, Xinyi Mou, Jinsheng Pan, Zhengyuan Yang, Zhongyu Wei, Jiebo Luo:
GPT-4V(ision) as A Social Media Analysis Engine. CoRR abs/2311.07547 (2023) - [i30]An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, Jianfeng Wang, Jianwei Yang, Yiwu Zhong, Julian J. McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang:
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation. CoRR abs/2311.07562 (2023) - [i29]Chaoyi Zhang, Kevin Lin, Zhengyuan Yang, Jianfeng Wang, Linjie Li, Chung-Ching Lin, Zicheng Liu, Lijuan Wang:
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning. CoRR abs/2311.17435 (2023) - [i28]Xueyan Zou, Linjie Li, Jianfeng Wang, Jianwei Yang, Mingyu Ding, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang:
Interfacing Foundation Models' Embeddings. CoRR abs/2312.07532 (2023) - [i27]Bingbing Wen, Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Bill Howe, Lijuan Wang:
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models. CoRR abs/2312.13503 (2023) - 2022
- [j4]Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang:
GIT: A Generative Image-to-text Transformer for Vision and Language. Trans. Mach. Learn. Res. 2022 (2022) - [c20]Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang:
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA. AAAI 2022: 3081-3089 - [c19]Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang:
Scaling Up Vision-Language Pretraining for Image Captioning. CVPR 2022: 17959-17968 - [c18]Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang:
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling. ECCV (36) 2022: 521-539 - [c17]Mengyang Song, Guoquan Jiang, Zhanqiang Huo, Zhengyuan Yang, Hongxu Zhang:
Apple Counting Network Before Fruit Thinning Period Based On Dilated Convolution. ICNCC 2022: 129-134 - [c16]Zhengyuan Yang, Jingen Liu, Jing Huang, Xiaodong He, Tao Mei, Chenliang Xu, Jiebo Luo:
Cross-modal Contrastive Distillation for Instructional Activity Anticipation. ICPR 2022: 5002-5009 - [i26]Zhengyuan Yang, Jingen Liu, Jing Huang, Xiaodong He, Tao Mei, Chenliang Xu, Jiebo Luo:
Cross-modal Contrastive Distillation for Instructional Activity Anticipation. CoRR abs/2201.06734 (2022) - [i25]Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang:
GIT: A Generative Image-to-text Transformer for Vision and Language. CoRR abs/2205.14100 (2022) - [i24]Jiajun Deng, Zhengyuan Yang, Daqing Liu, Tianlang Chen, Wengang Zhou, Yanyong Zhang, Houqiang Li, Wanli Ouyang:
TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer. CoRR abs/2206.06619 (2022) - [i23]Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jordan L. Boyd-Graber, Lijuan Wang:
Prompting GPT-3 To Be Reliable. CoRR abs/2210.09150 (2022) - [i22]Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo:
PromptCap: Prompt-Guided Task-Aware Image Captioning. CoRR abs/2211.09699 (2022) - [i21]Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang:
ReCo: Region-Controlled Text-to-Image Generation. CoRR abs/2211.15518 (2022) - [i20]Jialian Wu, Jianfeng Wang, Zhengyuan Yang, Zhe Gan, Zicheng Liu, Junsong Yuan, Lijuan Wang:
GRiT: A Generative Region-to-text Transformer for Object Understanding. CoRR abs/2212.00280 (2022) - 2021
- [j3]Zhengyuan Yang, Tushar Kumar, Tianlang Chen, Jingsong Su, Jiebo Luo:
Grounding-Tracking-Integration. IEEE Trans. Circuits Syst. Video Technol. 31(9): 3433-3443 (2021) - [c15]Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei Florêncio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo:
TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption. CVPR 2021: 8751-8761 - [c14]Liwei Wang, Jing Huang, Yin Li, Kun Xu, Zhengyuan Yang, Dong Yu:
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation. CVPR 2021: 14090-14100 - [c13]Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wengang Zhou, Houqiang Li:
TransVG: End-to-End Visual Grounding with Transformers. ICCV 2021: 1749-1759 - [c12]Zhengyuan Yang, Songyang Zhang, Liwei Wang, Jiebo Luo:
SAT: 2D Semantics Assisted Training for 3D Visual Grounding. ICCV 2021: 1836-1846 - [i19]Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wengang Zhou, Houqiang Li:
TransVG: End-to-End Visual Grounding with Transformers. CoRR abs/2104.08541 (2021) - [i18]Zhengyuan Yang, Songyang Zhang, Liwei Wang, Jiebo Luo:
SAT: 2D Semantics Assisted Training for 3D Visual Grounding. CoRR abs/2105.11450 (2021) - [i17]Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang:
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA. CoRR abs/2109.05014 (2021) - [i16]Jianfeng Wang, Xiaowei Hu, Zhe Gan, Zhengyuan Yang, Xiyang Dai, Zicheng Liu, Yumao Lu, Lijuan Wang:
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning. CoRR abs/2111.10023 (2021) - [i15]Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang:
Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling. CoRR abs/2111.12085 (2021) - [i14]Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang:
Scaling Up Vision-Language Pre-training for Image Captioning. CoRR abs/2111.12233 (2021) - 2020
- [c11]Yongjing Yin, Fandong Meng, Jinsong Su, Chulun Zhou, Zhengyuan Yang, Jie Zhou, Jiebo Luo:
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation. ACL 2020: 3025-3035 - [c10]Zhengyuan Yang, Tianlang Chen, Liwei Wang, Jiebo Luo:
Improving One-Stage Visual Grounding by Recursive Sub-query Construction. ECCV (14) 2020: 387-404 - [c9]Zhengyuan Yang, Yuncheng Li, Linjie Yang, Ning Zhang, Jiebo Luo:
Weakly Supervised Body Part Segmentation with Pose based Part Priors. ICPR 2020: 286-293 - [c8]Zhengyuan Yang, Amanda Kay, Yuncheng Li, Wendi Cross, Jiebo Luo:
Pose-based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation. ICPR 2020: 294-301 - [c7]Huan Lin, Fandong Meng, Jinsong Su, Yongjing Yin, Zhengyuan Yang, Yubin Ge, Jie Zhou, Jiebo Luo:
Dynamic Context-guided Capsule Network for Multimodal Machine Translation. ACM Multimedia 2020: 1320-1329 - [i13]Liwei Wang, Jing Huang, Yin Li, Kun Xu, Zhengyuan Yang, Dong Yu:
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation. CoRR abs/2007.01951 (2020) - [i12]Yongjing Yin, Fandong Meng, Jinsong Su, Chulun Zhou, Zhengyuan Yang, Jie Zhou, Jiebo Luo:
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation. CoRR abs/2007.08742 (2020) - [i11]Zhengyuan Yang, Tianlang Chen, Liwei Wang, Jiebo Luo:
Improving One-stage Visual Grounding by Recursive Sub-query Construction. CoRR abs/2008.01059 (2020) - [i10]Huan Lin, Fandong Meng, Jinsong Su, Yongjing Yin, Zhengyuan Yang, Yubin Ge, Jie Zhou, Jiebo Luo:
Dynamic Context-guided Capsule Network for Multimodal Machine Translation. CoRR abs/2009.02016 (2020) - [i9]Zhengyuan Yang, Amanda Kay, Yuncheng Li, Wendi Cross, Jiebo Luo:
Pose-based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation. CoRR abs/2011.00043 (2020) - [i8]Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei A. F. Florêncio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo:
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption. CoRR abs/2012.04638 (2020)
2010 – 2019
- 2019
- [j2]Zhengyuan Yang, Yuncheng Li, Jianchao Yang, Jiebo Luo:
Action Recognition With Spatio-Temporal Visual Attention on Skeleton Image Sequences. IEEE Trans. Circuits Syst. Video Technol. 29(8): 2405-2415 (2019) - [c6]Mengshi Qi, Weijian Li, Zhengyuan Yang, Yunhong Wang, Jiebo Luo:
Attentive Relational Networks for Mapping Images to Scene Graphs. CVPR 2019: 3957-3966 - [c5]Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, Jiebo Luo:
A Fast and Accurate One-Stage Approach to Visual Grounding. ICCV 2019: 4682-4692 - [c4]