- Sheng Shen, Zhewei Yao, Chunyuan Li, Trevor Darrell, Kurt Keutzer, Yuxiong He: Scaling Vision-Language Models with Sparse Mixture of Experts. EMNLP (Findings) 2023: 11329-11344
- Ting Liu, Chunyuan Li: Research on multi-factor quantitative investment strategy of SVM model based on machine learning. ICAICE 2023: 654-659
- Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianwei Yang, Lei Zhang: A Simple Framework for Open-Vocabulary Segmentation and Detection. ICCV 2023: 1020-1031
- Liangyu Chen, Bo Li, Sheng Shen, Jingkang Yang, Chunyuan Li, Kurt Keutzer, Trevor Darrell, Ziwei Liu: Large Language Models are Visual Reasoning Coordinators. NeurIPS 2023
- Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, Jianfeng Gao: LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day. NeurIPS 2023
- Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee: Visual Instruction Tuning. NeurIPS 2023
- Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, Yong Jae Lee: GLIGEN: Open-Set Grounded Text-to-Image Generation. CoRR abs/2301.07093 (2023)
- Haotian Liu, Kilho Son, Jianwei Yang, Ce Liu, Jianfeng Gao, Yong Jae Lee, Chunyuan Li: Learning Customized Visual Models with Retrieval-Augmented Knowledge. CoRR abs/2301.07094 (2023)
- Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang: Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection. CoRR abs/2303.05499 (2023)
- Sheng Shen, Zhewei Yao, Chunyuan Li, Trevor Darrell, Kurt Keutzer, Yuxiong He: Scaling Vision-Language Models with Sparse Mixture of Experts. CoRR abs/2303.07226 (2023)
- Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang: A Simple Framework for Open-Vocabulary Segmentation and Detection. CoRR abs/2303.08131 (2023)
- Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, Jianfeng Gao: Instruction Tuning with GPT-4. CoRR abs/2304.03277 (2023)
- Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee: Visual Instruction Tuning. CoRR abs/2304.08485 (2023)
- Jianyi Zhang, Saeed Vahidian, Martin Kuo, Chunyuan Li, Ruiyi Zhang, Guoyin Wang, Yiran Chen: Towards Building the Federated GPT: Federated Instruction Tuning. CoRR abs/2305.05644 (2023)
- Yuliang Liu, Zhang Li, Hongliang Li, Wenwen Yu, Mingxin Huang, Dezhi Peng, Mingyu Liu, Mingrui Chen, Chunyuan Li, Lianwen Jin, Xiang Bai: On the Hidden Mystery of OCR in Large Multimodal Models. CoRR abs/2305.07895 (2023)
- Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, Jianfeng Gao: LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day. CoRR abs/2306.00890 (2023)
- Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Fanyi Pu, Jingkang Yang, Chunyuan Li, Ziwei Liu: MIMIC-IT: Multi-Modal In-Context Instruction Tuning. CoRR abs/2306.05425 (2023)
- Chunyuan Li: Large Multimodal Models: Notes on CVPR 2023 Tutorial. CoRR abs/2306.14895 (2023)
- Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao: Semantic-SAM: Segment and Recognize Anything at Any Granularity. CoRR abs/2307.04767 (2023)
- Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, Ziwei Liu: Benchmarking and Analyzing Generative Data for Visual Recognition. CoRR abs/2307.13697 (2023)
- Yadong Lu, Chunyuan Li, Haotian Liu, Jianwei Yang, Jianfeng Gao, Yelong Shen: An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models. CoRR abs/2309.09958 (2023)
- Chunyuan Li, Zhe Gan, Zhengyuan Yang, Jianwei Yang, Linjie Li, Lijuan Wang, Jianfeng Gao: Multimodal Foundation Models: From Specialists to General-Purpose Assistants. CoRR abs/2309.10020 (2023)
- Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell: Aligning Large Multimodal Models with Factually Augmented RLHF. CoRR abs/2309.14525 (2023)
- Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao: MathVista: Evaluating Math Reasoning in Visual Contexts with GPT-4V, Bard, and Other Large Multimodal Models. CoRR abs/2310.02255 (2023)
- Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee: Improved Baselines with Visual Instruction Tuning. CoRR abs/2310.03744 (2023)
- Yu Gu, Jianwei Yang, Naoto Usuyama, Chunyuan Li, Sheng Zhang, Matthew P. Lungren, Jianfeng Gao, Hoifung Poon: BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys. CoRR abs/2310.10765 (2023)
- Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao: Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V. CoRR abs/2310.11441 (2023)
- Liangyu Chen, Bo Li, Sheng Shen, Jingkang Yang, Chunyuan Li, Kurt Keutzer, Trevor Darrell, Ziwei Liu: Large Language Models are Visual Reasoning Coordinators. CoRR abs/2310.15166 (2023)
- Wei-Ge Chen, Irina Spiridonova, Jianwei Yang, Jianfeng Gao, Chunyuan Li: LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing. CoRR abs/2311.00571 (2023)
- Zalan Fabian, Zhongqi Miao, Chunyuan Li, Yuanhan Zhang, Ziwei Liu, Andrés Hernández, Andrés Montes-Rojas, Rafael Escucha, Laura Siabatto, Andrés Link, Pablo Arbeláez, Rahul Dodhia, Juan Lavista Ferres: Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images. CoRR abs/2311.01064 (2023)