


Yaodong Yang 0001
Person information
- unicode name: 杨耀东
- affiliation: Peking University, Institute for AI, Beijing, China
- affiliation (former): King's College London, UK
- affiliation (former): Huawei Technologies, Noah's Ark Lab, UK
- affiliation (PhD): University College London, UK
Other persons with the same name
- Yaodong Yang
- Yaodong Yang 0002 — Chinese University of Hong Kong, Hong Kong (and 2 more)
- Yaodong Yang 0003 — University of Nebraska - Lincoln, USA
- Yaodong Yang 0004 — University of Science and Technology Beijing, Beijing, China
- Yaodong Yang 0005 — Hefei University of Technology, School of Mathematics, China
- Adam Yang — disambiguation page
- Adam X. Yang (aka: Adam Yang 0002) — University of Bristol, Department of Computer Science, UK
- Adam Yang 0003 — University of Maryland, Department of Computer Science, USA
2020 – today
- 2025
[j29]Jun Gao, Junlin Cui, Huijia Wu, Liuyu Xiang, Han Zhao, Xiangang Li, Meng Fang, Yaodong Yang, Zhaofeng He:
Can large language models independently complete tasks? A dynamic evaluation framework for multi-turn task planning and completion. Neurocomputing 638: 130135 (2025)
[j28]Yiran Geng, Jiaming Ji, Yuanpei Chen, Haoran Geng, Fangwei Zhong, Yaodong Yang:
ReDMan: reliable dexterous manipulation with safe reinforcement learning. Mach. Learn. 114(8): 189 (2025)
[j27]Mingxiao Feng, Yaodong Yang, Wengang Zhou, Houqiang Li:
TIMAR: Transition-informed representation for sample-efficient multi-agent reinforcement learning. Neural Networks 184: 107081 (2025)
[j26]Simin Li, Jun Guo, Jingqiao Xiu, Yuwei Zheng, Pu Feng, Xin Yu, Jiakai Wang, Aishan Liu, Yaodong Yang, Bo An, Wenjun Wu, Xianglong Liu:
Attacking cooperative multi-agent reinforcement learning by adversarial minority influence. Neural Networks 191: 107747 (2025)
[j25]Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, Yitao Liang:
JARVIS-1: Open-World Multi-Task Agents With Memory-Augmented Multimodal Language Models. IEEE Trans. Pattern Anal. Mach. Intell. 47(3): 1894-1907 (2025)
[j24]Hongsong Tang, Yingzhuo Liu, Letian Ni, Liuyu Xiang, Yaodong Yang, Ke Bi, Zhaofeng He:
Distributed Policy Space Response Oracles in Two-Player Zero-Sum Games. IEEE Trans. Neural Networks Learn. Syst. 36(6): 9893-9904 (2025)
[j23]Simin Li, Ruixiao Xu, Jingqiao Xiu, Yuwei Zheng, Pu Feng, Yuqing Ma, Bo An, Yaodong Yang, Xianglong Liu:
Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization. IEEE Trans. Neural Networks Learn. Syst. 36(10): 18118-18132 (2025)
[c101]Fengshuo Bai, Runze Liu, Yali Du, Ying Wen, Yaodong Yang:
RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors. AAAI 2025: 15453-15461
[c100]Wenzhe Fan, Zishun Yu, Chengdong Ma, Changye Li, Yaodong Yang, Xinhua Zhang:
Towards Efficient Collaboration via Graph Modeling in Reinforcement Learning. AAAI 2025: 16505-16513
[c99]Xiaoyuan Zhang, Xinyan Cai, Bo Liu, Weidong Huang, Song-Chun Zhu, Siyuan Qi, Yaodong Yang:
Differentiable Information Enhanced Model-Based Reinforcement Learning. AAAI 2025: 22605-22613
[c98]Hantao Lou, Jiaming Ji, Kaile Wang, Yaodong Yang:
Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction. AAAI 2025: 27500-27508
[c97]Jiayi Zhou, Jiaming Ji, Josef Dai, Yaodong Yang:
Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback. AAAI 2025: 27765-27773
[c96]Chi-Min Chan, Chunpu Xu, Junqi Zhu, Jiaming Ji, Donghai Hong, Pengcheng Wen, Chunyang Jiang, Zhen Ye, Yaodong Yang, Wei Xue, Sirui Han, Yike Guo:
Boosting Policy and Process Reward Models with Monte Carlo Tree Search in Open-Domain QA. ACL (Findings) 2025: 7433-7451
[c95]Chuxue Cao, Han Zhu, Jiaming Ji, Qichao Sun, Zhenghao Zhu, Yinyu Wu, Josef Dai, Yaodong Yang, Sirui Han, Yike Guo:
SafeLawBench: Towards Safe Alignment of Large Language Models. ACL (Findings) 2025: 14015-14048
[c94]Tianyi Alex Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang:
Reward Generalization in RLHF: A Topological Perspective. ACL (Findings) 2025: 15884-15930
[c93]Chengyi Ju, Weijie Shi, Chengzhong Liu, Jiaming Ji, Jipeng Zhang, Ruiyuan Zhang, Jiajie Xu, Yaodong Yang, Sirui Han, Yike Guo:
Benchmarking Multi-National Value Alignment for Large Language Models. ACL (Findings) 2025: 20042-20058
[c92]Jiaming Ji, Kaile Wang, Tianyi Alex Qiu, Boyuan Chen, Jiayi Zhou, Changye Li, Hantao Lou, Josef Dai, Yunhuai Liu, Yaodong Yang:
Language Models Resist Alignment: Evidence From Data Compression. ACL (1) 2025: 23411-23432
[c91]Jiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Alex Qiu, Jiayi Zhou, Kaile Wang, Boxun Li, Sirui Han, Yike Guo, Yaodong Yang:
PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference. ACL (1) 2025: 31983-32016
[c90]Zhaowei Zhang, Fengshuo Bai, Mingzhi Wang, Haoyang Ye, Chengdong Ma, Yaodong Yang:
Roadmap on Incentive Compatibility for AI Alignment and Governance in Sociotechnical Systems. AGI (2) 2025: 370-380
[c89]Zhaowei Zhang, Ceyao Zhang, Nian Liu, Siyuan Qi, Ziqi Rong, Song-Chun Zhu, Yaodong Yang:
Heterogeneous Value Alignment Evaluation for Large Language Models. AGI (2) 2025: 381-392
[c88]Qinghao Wang, Jinyang Jiang, Xiaotian Liu, Tao Ren, Yi Zheng, Cheng Zhang, Yaodong Yang, Yijie Peng:
A Unified Framework for Multi-Stage Decision Optimization with Deep Reinforcement Learning and Foundation Models. CASE 2025: 1975-1980
[c87]Juntao Dai, Taiye Chen, Yaodong Yang, Qian Zheng, Gang Pan:
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization. ICLR 2025
[c86]Siyuan Qi, Bangcheng Yang, Kailin Jiang, Xiaobo Wang, Jiaqi Li, Yifan Zhong, Yaodong Yang, Zilong Zheng:
In-Context Editing: Learning Knowledge from Self-Induced Distributions. ICLR 2025
[c85]Mingzhi Wang, Chengdong Ma, Qizhi Chen, Linjian Meng, Yang Han, Jiancong Xiao, Zhaowei Zhang, Jing Huo, Weijie J. Su, Yaodong Yang:
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment. ICLR 2025
[c84]Rui Ye, Jingyi Chai, Xiangrui Liu, Yaodong Yang, Yanfeng Wang, Siheng Chen:
Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models. ICLR 2025
[c83]Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang:
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs. ICLR 2025
[c82]Haojun Chen, Minghao Liu, Chengdong Ma, Xiaojian Ma, Zailin Ma, Huimin Wu, Yuanpei Chen, Yifan Zhong, Mingzhi Wang, Qing Li, Yaodong Yang:
Falcon: Fast Visuomotor Policies via Partial Denoising. ICML 2025
[c81]Hantao Lou, Changye Li, Jiaming Ji, Yaodong Yang:
SAE-V: Interpreting Multimodal Models for Enhanced Alignment. ICML 2025
[c80]Zhiyu Zhao, Chengdong Ma, Qirui Mi, Ning Yang, Xue Yan, Mengyue Yang, Haifeng Zhang, Jun Wang, Yaodong Yang:
Mean Field Correlated Imitation Learning. AAMAS 2025: 2364-2372
[c79]Zhixun Chen, Zijing Shi, Yaodong Yang, Meng Fang, Yali Du:
Hierarchical Multi-Agent Framework for Dynamic Macroeconomic Modelling Using Large Language Models. AAMAS 2025: 2460-2462
[c78]Qinghao Wang, Yaodong Yang:
Carbon Trading Supply Chain Management Based on Constrained Deep Reinforcement Learning. AAMAS 2025: 3083-3086
[c77]Jinzhou Li, Tianhao Wu, Jiyao Zhang, Zeyuan Chen, Haotian Jin, Mingdong Wu, Yujun Shen, Yaodong Yang, Hao Dong:
Adaptive Visuo-Tactile Fusion with Predictive Force Attention for Dexterous Manipulation. IROS 2025: 3232-3239
[i168]Dongge Wang, Xiang Yan, Zehao Dou, Wenhan Huang, Yaodong Yang, Xiaotie Deng:
Approximating N-Player Nash Equilibrium through Gradient Descent. CoRR abs/2501.03001 (2025)
[i167]Hantao Lou, Jiaming Ji, Kaile Wang, Yaodong Yang:
Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction. CoRR abs/2501.05336 (2025)
[i166]Haotian Xu, Xing Wu, Weinong Wang, Zhongzhi Li, Da Zheng, Boyuan Chen, Yi Hu, Shijia Kang, Jiaming Ji, Yingying Zhang, Zhijiang Guo, Yaodong Yang, Muhan Zhang, Debing Zhang:
RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems? CoRR abs/2501.11284 (2025)
[i165]Yan Yu, Wengang Zhou, Yaodong Yang, Wanxuan Lu, Yingyan Hou, Houqiang Li:
Model Evolution Framework with Genetic Algorithm for Multi-Task Reinforcement Learning. CoRR abs/2502.13569 (2025)
[i164]Hantao Lou, Changye Li, Jiaming Ji, Yaodong Yang:
SAE-V: Interpreting Multimodal Models for Enhanced Alignment. CoRR abs/2502.17514 (2025)
[i163]Fengshuo Bai, Yu Li, Jie Chu, Tawei Chou, Runchuan Zhu, Ying Wen, Yaodong Yang, Yuanpei Chen:
Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand. CoRR abs/2502.18423 (2025)
[i162]Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang:
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs. CoRR abs/2502.19148 (2025)
[i161]Yifan Zhong, Xuchuan Huang, Ruochong Li, Ceyao Zhang, Yitao Liang, Yaodong Yang, Yuanpei Chen:
DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping. CoRR abs/2502.20900 (2025)
[i160]Haojun Chen, Minghao Liu, Xiaojian Ma, Zailin Ma, Huimin Wu, Chengdong Ma, Yuanpei Chen, Yifan Zhong, Mingzhi Wang, Qing Li, Yaodong Yang:
Falcon: Fast Visuomotor Policies via Partial Denoising. CoRR abs/2503.00339 (2025)
[i159]Xiaoyuan Zhang, Xinyan Cai, Bo Liu, Weidong Huang, Song-Chun Zhu, Siyuan Qi, Yaodong Yang:
Differentiable Information Enhanced Model-Based Reinforcement Learning. CoRR abs/2503.01178 (2025)
[i158]Borong Zhang, Yuhao Zhang, Jiaming Ji, Yingshan Lei, Josef Dai, Yuanpei Chen, Yaodong Yang:
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning. CoRR abs/2503.03480 (2025)
[i157]Pengcheng Wen, Jiaming Ji, Chi-Min Chan, Juntao Dai, Donghai Hong, Yaodong Yang, Sirui Han, Yike Guo:
ThinkPatterns-21k: A Systematic Study on the Impact of Thinking Patterns in LLMs. CoRR abs/2503.12918 (2025)
[i156]Jiaming Ji, Xinyu Chen, Rui Pan, Han Zhu, Conghui Zhang, Jiahao Li, Donghai Hong, Boyuan Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Chi-Min Chan, Sirui Han, Yike Guo, Yaodong Yang:
Safe RLHF-V: Safe Reinforcement Learning from Human Feedback in Multimodal Large Language Models. CoRR abs/2503.17682 (2025)
[i155]Juntao Dai, Taiye Chen, Yaodong Yang, Qian Zheng, Gang Pan:
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization. CoRR abs/2503.18130 (2025)
[i154]Yuhan Wang, Yu Li, Yaodong Yang, Yuanpei Chen:
Dexterous Non-Prehensile Manipulation for Ungraspable Object via Extrinsic Dexterity. CoRR abs/2503.23120 (2025)
[i153]Weijie Shi, Chengyi Ju, Chengzhong Liu, Jiaming Ji, Jipeng Zhang, Ruiyuan Zhang, Jia Zhu, Jiajie Xu, Yaodong Yang, Sirui Han, Yike Guo:
Benchmarking Multi-National Value Alignment for Large Language Models. CoRR abs/2504.12911 (2025)
[i152]Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Junyuan Mao, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Chengwei Liu, Yifan Zhang, Qiankun Li, Chongye Guo, Yalan Qin, Yi Ding, Donghai Hong, Jiaming Ji, Xinfeng Li, Yifan Jiang, Dongxia Wang, Yihao Huang, Yufei Guo, Jen-tse Huang, Yanwei Yue, Wenke Huang, Guancheng Wan, Tianlin Li, Lei Bai, Jie Zhang, Qing Guo, Jingyi Wang, Tianlong Chen, Joey Tianyi Zhou, Xiaojun Jia, Weisong Sun, Cong Wu, Jing Chen, Xuming Hu, Yiming Li, Xiao Wang, Ningyu Zhang, Luu Anh Tuan, Guowen Xu, Tianwei Zhang, Xingjun Ma, Xiang Wang, Bo An, Jun Sun, Mohit Bansal, Shirui Pan, Yuval Elovici, Bhavya Kailkhura, Bo Li, Yaodong Yang, Hongwei Li, Wenyuan Xu, Yizhou Sun, Wei Wang, Qing Li, Ke Tang, Yu-Gang Jiang, Felix Juefei-Xu, Hui Xiong, Xiaofeng Wang, Shuicheng Yan, Dacheng Tao, Philip S. Yu, Qingsong Wen, Yang Liu:
A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment. CoRR abs/2504.15585 (2025)
[i151]Yi Zeng, Feifei Zhao, Yuwei Wang, Enmeng Lu, Yaodong Yang, Lei Wang, Chao Liu, Yitao Liang, Dongcheng Zhao, Bing Han, Haibo Tong, Yao Liang, Dongqi Liang, Kang Sun, Boyuan Chen, Jinyu Fan:
Redefining Superalignment: From Weak-to-Strong Alignment to Human-AI Co-Alignment to Sustainable Symbiotic Society. CoRR abs/2504.17404 (2025)
[i150]Chi-Min Chan, Chunpu Xu, Jiaming Ji, Zhen Ye, Pengcheng Wen, Chunyang Jiang, Yaodong Yang, Wei Xue, Sirui Han, Yike Guo:
J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge. CoRR abs/2505.11875 (2025)
[i149]Jinzhou Li, Tianhao Wu, Jiyao Zhang, Zeyuan Chen, Haotian Jin, Mingdong Wu, Yujun Shen, Yaodong Yang, Hao Dong:
Adaptive Visuo-Tactile Fusion with Predictive Force Attention for Dexterous Manipulation. CoRR abs/2505.13982 (2025)
[i148]Jiayi Zhou, Jiaming Ji, Boyuan Chen, Jiapeng Sun, Wenqi Chen, Donghai Hong, Sirui Han, Yike Guo, Yaodong Yang:
Generative RLHF-V: Learning Principles from Multi-modal Human Preference. CoRR abs/2505.18531 (2025)
[i147]Jiaming Ji, Wenqi Chen, Kaile Wang, Donghai Hong, Sitong Fang, Boyuan Chen, Jiayi Zhou, Juntao Dai, Sirui Han, Yike Guo, Yaodong Yang:
Mitigating Deceptive Alignment via Self-Monitoring. CoRR abs/2505.18807 (2025)
[i146]Zhaowei Zhang, Minghua Yi, Mengmeng Wang, Fengshuo Bai, Zilong Zheng, Yipeng Kang, Yaodong Yang:
EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding. CoRR abs/2505.19558 (2025)
[i145]Jiaming Ji, Sitong Fang, Wenjing Cao, Jiahao Li, Xuyao Wang, Juntao Dai, Chi-Min Chan, Sirui Han, Yike Guo, Yaodong Yang:
The Mirage of Multimodality: Where Truth is Tested and Honesty Unravels. CoRR abs/2505.20214 (2025)
[i144]Lijun Zhang, Lin Li, Yajie Qi, Huizhong Song, Yaodong Yang, Jun Wang, Wei Wei:
Risk-aware Direct Preference Optimization under Nested Risk Measure. CoRR abs/2505.20359 (2025)
[i143]Yuanfei Wang, Xinju Huang, Fangwei Zhong, Yaodong Yang, Yizhou Wang, Yuanpei Chen, Hao Dong:
From Strangers to Assistants: Fast Desire Alignment for Embodied Agent-User Adaptation. CoRR abs/2505.22503 (2025)
[i142]Boyuan Chen, Donghai Hong, Jiaming Ji, Jiacheng Zheng, Bowen Dong, Jiayi Zhou, Kaile Wang, Juntao Dai, Xuyao Wang, Wenqi Chen, Qirui Zheng, Wenxin Li, Sirui Han, Yike Guo, Yaodong Yang:
InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback. CoRR abs/2505.23950 (2025)
[i141]Fanqi Kong, Weiqin Zu, Xinyu Chen, Yaodong Yang, Song-Chun Zhu, Xue Feng:
SIV-Bench: A Video Benchmark for Social Interaction Understanding and Reasoning. CoRR abs/2506.05425 (2025)
[i140]Chuxue Cao, Han Zhu, Jiaming Ji, Qichao Sun, Zhenghao Zhu, Yinyu Wu, Juntao Dai, Yaodong Yang, Sirui Han, Yike Guo:
SafeLawBench: Towards Safe Alignment of Large Language Models. CoRR abs/2506.06636 (2025)
[i139]Guoxi Zhang, Jiawei Chen, Tianzhuo Yang, Jiaming Ji, Yaodong Yang, Juntao Dai:
A Game-Theoretic Negotiation Framework for Cross-Cultural Consensus in LLMs. CoRR abs/2506.13245 (2025)
[i138]Zeyuan Chen, Qiyang Yan, Yuanpei Chen, Tianhao Wu, Jiyao Zhang, Zihan Ding, Jinzhou Li, Yaodong Yang, Hao Dong:
ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes. CoRR abs/2506.14317 (2025)
[i137]Yifan Zhong, Fengshuo Bai, Shaofei Cai, Xuchuan Huang, Zhang Chen, Xiaowei Zhang, Yuanfei Wang, Shaoyang Guo, Tianrui Guan, Ka Nam Lui, Zhiquan Qi, Yitao Liang, Yuanpei Chen, Yaodong Yang:
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective. CoRR abs/2507.01925 (2025)
[i136]Chuanhao Yan, Fengdi Che, Xuhan Huang, Xu Xu, Xin Li, Yizhi Li, Xingwei Qu, Jingzhe Shi, Zhuangzhuang He, Chenghua Lin, Yaodong Yang, Binhang Yuan, Hang Zhao, Yu Qiao, Bowen Zhou, Jie Fu:
Re:Form - Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny. CoRR abs/2507.16331 (2025)
[i135]Yi Ma, Hongyao Tang, Chenjun Xiao, Yaodong Yang, Wei Wei, Jianye Hao, Jiye Liang:
Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies. CoRR abs/2508.03194 (2025)
[i134]Yuchu Jiang, Jian Zhao, Yuchen Yuan, Tianle Zhang, Yao Huang, Yanghao Zhang, Yan Wang, Yanshu Li, Xizhong Guo, Yusheng Zhao, Jun Zhang, Zhi Zhang, Xiaojian Lin, Yixiu Zou, Haoxuan Ma, Yuhu Shang, Yuzhi Hu, Keshu Cai, Ruochen Zhang, Boyuan Chen, Yilan Gao, Ziheng Jiao, Yi Qin, Shuangjun Du, Xiao Tong, Zhekun Liu, Yu Chen, Xuankun Rong, Rui Wang, Yejie Zheng, Zhaoxin Fan, Murat Sensoy, Hongyuan Zhang, Pan Zhou, Lei Jin, Hao Zhao, Xu Yang, Jiaojiao Zhao, Jianshu Li, Joey Tianyi Zhou, Zhi-Qi Cheng, Longtao Huang, Zhiyi Liu, Zheng Zhu, Jianan Li, Gang Wang, Qi Li, Xu-Yao Zhang, Yaodong Yang, Mang Ye, Wenqi Ren, Zhaofeng He, Hang Su, Rongrong Ni, Liping Jing, Xingxing Wei, Junliang Xing, Massimo Alioto, Shengmei Shen, Petia Radeva, Dacheng Tao, Ya-Qin Zhang, Shuicheng Yan, Chi Zhang, Zhongjiang He, Xuelong Li:
Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance. CoRR abs/2508.08789 (2025)
[i133]David Mguni, Yaqi Sun, Haojun Chen, Amir Darabi, Larry Olanrewaju Orimoloye, Yaodong Yang:
Fault Tolerant Multi-Agent Learning with Adversarial Budget Constraints. CoRR abs/2508.08800 (2025)
[i132]Yan Yu, Yaodong Yang, Zhengbo Lu, Chengdong Ma, Wengang Zhou, Houqiang Li:
Goal Discovery with Causal Capacity for Efficient Reinforcement Learning. CoRR abs/2508.09624 (2025)
[i131]Fanqi Kong, Xiaoyuan Zhang, Xinyu Chen, Yaodong Yang, Song-Chun Zhu, Xue Feng:
Enhancing LLM-Based Social Bot via an Adversarial Learning Framework. CoRR abs/2508.17711 (2025)
[i130]Simin Li, Yuwei Zheng, Zihao Mao, Linhao Wang, Ruixiao Xu, Chengdong Ma, Xin Yu, Yuqing Ma, Qi Dou, Xin Wang, Jie Luo, Bo An, Yaodong Yang, Weifeng Lv, Xianglong Liu:
Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning. CoRR abs/2509.15103 (2025)
[i129]Kefei Zhu, Fengshuo Bai, YuanHao Xiang, Yishuai Cai, Xinglin Chen, Ruochong Li, Xingtao Wang, Hao Dong, Yaodong Yang, Xiaopeng Fan, Yuanpei Chen:
DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation. CoRR abs/2509.23829 (2025)
[i128]Jianing Guo, Zhenhong Wu, Chang Tu, Yiyao Ma, Xiangqi Kong, Zhiqian Liu, Jiaming Ji, Shuning Zhang, Yuanpei Chen, Kai Chen, Qi Dou, Yaodong Yang, Xianglong Liu, Huijie Zhao, Weifeng Lv, Simin Li:
On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations. CoRR abs/2510.00037 (2025)
[i127]Simin Li, Zihao Mao, Hanxiao Li, Zonglei Jing, Zhuohang Bian, Jun Guo, Li Wang, Zhuoran Han, Ruixiao Xu, Xin Yu, Chengdong Ma, Yuqing Ma, Bo An, Yaodong Yang, Weifeng Lv, Xianglong Liu:
Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning. CoRR abs/2510.11824 (2025)
[i126]Han Zhu, Juntao Dai, Jiaming Ji, Haoran Li, Chengkun Cai, Pengcheng Wen, Chi-Min Chan, Boyuan Chen, Yaodong Yang, Sirui Han, Yike Guo:
SafeMT: Multi-turn Safety for Multimodal Language Models. CoRR abs/2510.12133 (2025)
[i125]Zailin Ma, Jiansheng Yang, Yaodong Yang:
On the Generalization Properties of Learning the Random Feature Models with Learnable Activation Functions. CoRR abs/2510.15327 (2025)
[i124]Xiaoyuan Zhang, Yizhe Huang, Chengdong Ma, Zhixun Chen, Long Ma, Yali Du, Song-Chun Zhu, Yaodong Yang, Xue Feng:
Social World Model-Augmented Mechanism Design Policy Learning. CoRR abs/2510.19270 (2025)
[i123]Xiaoyuan Zhang, Chengdong Ma, Yizhe Huang, Weidong Huang, Siyuan Qi, Song-Chun Zhu, Xue Feng, Yaodong Yang:
World Models Should Prioritize the Unification of Physical and Social Dynamics. CoRR abs/2510.21219 (2025)
[i122]Ruiyang Zhang, Jiahao Luo, Xiaoru Feng, Qiufan Pang, Yaodong Yang, Juntao Dai:
SafeEditor: Unified MLLM for Efficient Post-hoc T2I Safety Editing. CoRR abs/2510.24820 (2025)
[i121]Chengdong Ma, Théo Tao Zhaowei, Pengyu Li, Minghao Liu, Haojun Chen, Zihao Mao, Yuan Cheng, Yuan Qi, Yaodong Yang:
Finding Kissing Numbers with Game-theoretic Reinforcement Learning. CoRR abs/2511.13391 (2025)
[i120]Boyuan Chen, Sitong Fang, Jiaming Ji, Yanxu Zhu, Pengcheng Wen, Jinzhou Wu, Yingshui Tan, Boren Zheng, Mengying Yuan, Wenqi Chen, Donghai Hong, Alex Qiu, Xin Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Borong Zhang, Tianzhuo Yang, Saad Siddiqui, Isabella Duan, Yawen Duan, Brian Tse, Jen-Tse Huang, Kun Wang, Baihui Zheng, Jiaheng Liu, Jian Yang, Yiming Li, Wenting Chen, Dongrui Liu, Lukas Vierling, Zhiheng Xi, Haobo Fu, Wenxuan Wang, Jitao Sang, Zhengyan Shi, Chi-Min Chan, Eugenie Shi, Simin Li, Juncheng Li, Jian Yang, Wei Ji, Dong Li, Jinglin Yang, Jun Song, Yinpeng Dong, Jie Fu, Bo Zheng, Min Yang, Yike Guo, Philip Torr, Robert Trager, Yi Zeng, Zhongyuan Wang, Yaodong Yang, Tiejun Huang, Ya-Qin Zhang, Hongjiang Zhang, Andrew Yao:
AI Deception: Risks, Dynamics, and Controls. CoRR abs/2511.22619 (2025)
- 2024
[j22]Qinghao Wang, Yaodong Yang:
Carbon trading supply chain management based on constrained deep reinforcement learning. Auton. Agents Multi Agent Syst. 38(2): 38 (2024)
[j21]Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji, Yaodong Yang:
Heterogeneous-Agent Reinforcement Learning. J. Mach. Learn. Res. 25: 32:1-32:67 (2024)
[j20]Dongzi Wang, Fangwei Zhong, Minglong Li, Muning Wen, Yuanxi Peng, Teng Li, Adam Yang:
RoMAT: Role-based multi-agent transformer for generalizable heterogeneous cooperation. Neural Networks 174: 106129 (2024)
[j19]Jie Liu, Yinmin Zhang, Chuming Li, Yaodong Yang, Yu Liu, Wanli Ouyang:
Adaptive pessimism via target Q-value for offline reinforcement learning. Neural Networks 180: 106588 (2024)
[j18]Yuanpei Chen, Yiran Geng, Fangwei Zhong, Jiaming Ji, Jiechuang Jiang, Zongqing Lu, Hao Dong, Yaodong Yang:
Bi-DexHands: Towards Human-Level Bimanual Dexterous Manipulation. IEEE Trans. Pattern Anal. Mach. Intell. 46(5): 2804-2818 (2024)
[j17]Chenguang Wang, Zhouliang Yu, Stephen McAleer, Tianshu Yu, Yaodong Yang:
ASP: Learn a Universal Neural Solver! IEEE Trans. Pattern Anal. Mach. Intell. 46(6): 4102-4114 (2024)
[j16]Yuyang Li, Bo Liu, Yiran Geng, Puhao Li, Yaodong Yang, Yixin Zhu, Tengyu Liu, Siyuan Huang:
Grasp Multiple Objects With One Hand. IEEE Robotics Autom. Lett. 9(5): 4027-4034 (2024)
[j15]Yang Li, Fanglei Sun, Jingchen Hu, Chang Liu, Fan Wu, Kai Li, Ying Wen, Zheng Tian, Yaodong Yang, Jiangcheng Zhu, Zhifeng Chen, Jun Wang, Yang Yang:
Self-Supervised MAFENN for Classifying Low-Labeled Distorted Images Over Mobile Fading Channels. IEEE Trans. Mob. Comput. 23(8): 8077-8091 (2024)
[j14]Jie Liu, Yinmin Zhang, Chuming Li, Zhiyuan You, Zhanhui Zhou, Chao Yang, Yaodong Yang, Yu Liu, Wanli Ouyang:
MaskMA: Towards Zero-Shot Multi-Agent Decision Making with Mask-Based Collaborative Learning. Trans. Mach. Learn. Res. 2024 (2024)
[c76]Yinmin Zhang, Jie Liu, Chuming Li, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang:
A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning. AAAI 2024: 16908-16916
[c75]Sirui Chen, Zhaowei Zhang, Yaodong Yang, Yali Du:
STAS: Spatial-Temporal Return Decomposition for Solving Sparse Rewards Problems in Multi-agent Reinforcement Learning. AAAI 2024: 17337-17345
[c74]Ceyao Zhang, Kaijie Yang, Siyi Hu, Zihao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, Xiaojun Chang, Junge Zhang, Feng Yin, Yitao Liang, Yaodong Yang:
ProAgent: Building Proactive Cooperative Agents with Large Language Models. AAAI 2024: 17591-17599
[c73]Shaoting Feng, Qinya Li, Yaodong Yang, Fan Wu, Guihai Chen:
GIPUT: Maximizing Photo Coverage Efficiency for UAV Trajectory. APWeb/WAIM (1) 2024: 391-406
[c72]Le Cong Dinh, David Henry Mguni, Long Tran-Thanh, Jun Wang, Yaodong Yang:
A Summary of Online Markov Decision Processes with Non-oblivious Strategic Adversary. AAMAS 2024: 2830-2832
[c71]Yuanpei Chen, Chen Wang, Yaodong Yang, C. Karen Liu:
Object-Centric Dexterous Manipulation from Human Motion Data. CoRL 2024: 3828-3846
[c70]Qianxu Wang, Congyue Deng, Tyler Ga Wei Lum, Yuanpei Chen, Yaodong Yang, Jeannette Bohg, Yixin Zhu, Leonidas J. Guibas:
Neural Attention Field: Emerging Point Relevance in 3D Scenes for One-Shot Dexterous Grasping. CoRL 2024: 4495-4508
[c69]Jieming Cui, Tengyu Liu, Nian Liu, Yaodong Yang, Yixin Zhu, Siyuan Huang:
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents. CVPR 2024: 852-862
[c68]Weidong Huang, Jiaming Ji, Chunhe Xia, Borong Zhang, Yaodong Yang:
SafeDreamer: Safe Reinforcement Learning with World Models. ICLR 2024
[c67]Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang:
Safe RLHF: Safe Reinforcement Learning from Human Feedback. ICLR 2024
[c66]Simin Li, Jun Guo, Jingqiao Xiu, Ruixiao Xu, Xin Yu, Jiakai Wang, Aishan Liu, Yaodong Yang, Xianglong Liu:
Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game. ICLR 2024
[c65]Jiarong Liu, Yifan Zhong, Siyi Hu, Haobo Fu, Qiang Fu, Xiaojun Chang, Yaodong Yang:
Maximum Entropy Heterogeneous-Agent Reinforcement Learning. ICLR 2024
[c64]Siyuan Qi, Shuo Chen, Yexin Li, Xiangyu Kong, Junqi Wang, Bangcheng Yang, Pring Wong, Yifan Zhong, Xiaoyuan Zhang, Zhaowei Zhang, Nian Liu, Yaodong Yang, Song-Chun Zhu:
CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents. ICLR 2024
[c63]Juntao Dai, Yaodong Yang, Qian Zheng, Gang Pan:
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation. ICML 2024
[c62]Yizhe Huang, Anji Liu, Fanqi Kong, Yaodong Yang, Song-Chun Zhu, Xue Feng:
Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning. ICML 2024
[c61]Lirui Luo, Guoxi Zhang, Hongming Xu, Yaodong Yang, Cong Fang, Qing Li:
End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations. ICML 2024
[c60]Ruiqing Chen, Xiaoyuan Zhang, Yali Du, Yifan Zhong, Zheng Tian, Fanglei Sun, Yaodong Yang:
Off-Agent Trust Region Policy Optimization. IJCAI 2024: 3798-3806
[c59]Yue Zhang, Yaodong Yang, Zhenbo Lu, Wengang Zhou, Houqiang Li:
Remember the Past for Better Future: Memory-Augmented Offline RL. IJCNN 2024: 1-8
[c58]Juntao Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang:
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset. NeurIPS 2024
[c57]Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Tianyi Qiu, Juntao Dai, Yaodong Yang:
Aligner: Efficient Alignment by Learning to Correct. NeurIPS 2024
[c56]Tianyi Qiu, Yang Zhang, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang:
ProgressGym: Alignment with a Millennium of Moral Progress. NeurIPS 2024
[c55]Lijun Zhang, Lin Li, Wei Wei, Huizhong Song, Yaodong Yang, Jiye Liang:
Scalable Constrained Policy Optimization for Safe Multi-agent Reinforcement Learning. NeurIPS 2024
[c54]Yifan Zhong, Chengdong Ma, Xiaoyuan Zhang, Ziran Yang, Haojun Chen, Qingfu Zhang, Siyuan Qi, Yaodong Yang:
Panacea: Pareto Alignment via Preference Adaptation for LLMs. NeurIPS 2024
[i119]Siyuan Qi, Shuo Chen, Yexin Li, Xiangyu Kong, Junqi Wang, Bangcheng Yang, Pring Wong, Yifan Zhong, Xiaoyuan Zhang, Zhaowei Zhang, Nian Liu, Wei Wang, Yaodong Yang, Song-Chun Zhu:
CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents. CoRR abs/2401.10568 (2024)
[i118]Yifan Zhong, Chengdong Ma, Xiaoyuan Zhang, Ziran Yang, Haojun Chen, Qingfu Zhang, Siyuan Qi, Yaodong Yang:
Panacea: Pareto Alignment via Preference Adaptation for LLMs. CoRR abs/2402.02030 (2024)
[i117]Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Yaodong Yang:
Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction. CoRR abs/2402.02416 (2024)
[i116]Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Han Yang, Josef Dai, Xuehai Pan, Yaodong Yang:
Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective. CoRR abs/2402.10184 (2024)
[i115]Zhaowei Zhang, Fengshuo Bai, Mingzhi Wang, Haoyang Ye, Chengdong Ma, Yaodong Yang:
Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and Prospects. CoRR abs/2402.12907 (2024)
[i114]Naming Liu, Mingzhi Wang, Youzhi Zhang, Yaodong Yang, Bo An, Ying Wen:
Leveraging Team Correlation for Approximating Equilibrium in Two-Team Zero-Sum Games. CoRR abs/2403.00255 (2024)
[i113]Tianhao Wu, Yunchong Gan, Mingdong Wu, Jingbo Cheng, Yaodong Yang, Yixin Zhu, Hao Dong:
UniDexFPM: Universal Dexterous Functional Pre-grasp Manipulation Via Diffusion Policy. CoRR abs/2403.12421 (2024)
[i112]Lirui Luo, Guoxi Zhang, Hongming Xu, Yaodong Yang, Cong Fang, Qing Li:
INSIGHT: End-to-End Neuro-Symbolic Visual Reinforcement Learning with Language Explanations. CoRR abs/2403.12451 (2024)
[i111]Jieming Cui, Tengyu Liu, Nian Liu, Yaodong Yang, Yixin Zhu, Siyuan Huang:
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents. CoRR abs/2403.12835 (2024)
[i110]Zhiyu Zhao, Ning Yang, Xue Yan, Haifeng Zhang, Jun Wang, Yaodong Yang:
Correlated Mean Field Imitation Learning. CoRR abs/2404.09324 (2024)
[i109]Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia Cui, Ying Wen, Yaodong Yang, Bo Xu, Lei Han:
Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation. CoRR abs/2405.18688 (2024)
[i108]Fengshuo Bai, Mingzhi Wang, Zhaowei Zhang, Boyuan Chen, Yinda Xu, Ying Wen, Yaodong Yang:
Efficient Model-agnostic Alignment via Bayesian Persuasion. CoRR abs/2405.18718 (2024)
[i107]Jiesong Lian, Yucong Huang, Mingzhi Wang, Chengdong Ma, Yixue Hao, Ying Wen, Yaodong Yang:
Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles. CoRR abs/2405.21027 (2024)
[i106]Jiaming Ji, Kaile Wang, Tianyi Qiu, Boyuan Chen, Jiayi Zhou, Changye Li, Hantao Lou, Yaodong Yang:
Language Models Resist Alignment. CoRR abs/2406.06144 (2024)
[i105]Yizhe Huang, Anji Liu, Fanqi Kong, Yaodong Yang, Song-Chun Zhu, Xue Feng:
Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning. CoRR abs/2406.08002 (2024)
[i104]Rui Ye, Jingyi Chai, Xiangrui Liu, Yaodong Yang, Yanfeng Wang, Siheng Chen:
Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models. CoRR abs/2406.10630 (2024)
[i103]Siyuan Qi, Bangcheng Yang, Kailin Jiang, Xiaobo Wang, Jiaqi Li, Yifan Zhong, Yaodong Yang, Zilong Zheng:
In-Context Editing: Learning Knowledge from Self-Induced Distributions. CoRR abs/2406.11194 (2024)
[i102]Josef Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang:
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset. CoRR abs/2406.14477 (2024)
[i101]Jiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang:
PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models. CoRR abs/2406.15513 (2024)
[i100]Tianyi Qiu, Yang Zhang, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang:
ProgressGym: Alignment with a Millennium of Moral Progress. CoRR abs/2406.20087 (2024)
[i99]Ruize Zhang, Zelai Xu, Chengdong Ma, Chao Yu, Wei-Wei Tu, Shiyu Huang, Deheng Ye, Wenbo Ding, Yaodong Yang, Yu Wang:
A Survey on Self-play Methods in Reinforcement Learning. CoRR abs/2408.01072 (2024)
[i98]Jiayi Zhou, Jiaming Ji, Juntao Dai, Yaodong Yang:
Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback. CoRR abs/2409.00162 (2024)
[i97]Naming Liu, Mingzhi Wang, Xihuai Wang, Weinan Zhang, Yaodong Yang, Youzhi Zhang, Bo An, Ying Wen:
Computing Ex Ante Equilibrium in Heterogeneous Zero-Sum Team Games. CoRR abs/2410.01575 (2024)
[i96]Wenzhe Fan, Zishun Yu, Chengdong Ma, Changye Li, Yaodong Yang, Xinhua Zhang:
Towards Efficient Collaboration via Graph Modeling in Reinforcement Learning. CoRR abs/2410.15841 (2024)
[i95]Mingzhi Wang, Chengdong Ma, Qizhi Chen, Linjian Meng, Yang Han, Jiancong Xiao, Zhaowei Zhang, Jing Huo, Weijie J. Su, Yaodong Yang:
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Models Alignment. CoRR abs/2410.16714 (2024)
[i94]Qianxu Wang, Congyue Deng, Tyler Ga Wei Lum, Yuanpei Chen, Yaodong Yang, Jeannette Bohg, Yixin Zhu, Leonidas J. Guibas:
Neural Attention Field: Emerging Point Relevance in 3D Scenes for One-Shot Dexterous Grasping. CoRR abs/2410.23039 (2024)
[i93]Xiaohang Tang, Chiyuan Wang, Chengdong Ma, Ilija Bogunovic, Stephen McAleer, Yaodong Yang:
Sample-Efficient Regret-Minimizing Double Oracle in Extensive-Form Games. CoRR abs/2411.00954 (2024)
[i92]Yuanpei Chen, Chen Wang, Yaodong Yang, C. Karen Liu:
Object-Centric Dexterous Manipulation from Human Motion Data. CoRR abs/2411.04005 (2024)
[i91]Nian Liu, Libin Liu, Zilong Zhang, Zi Wang, Hongzhao Xie, Tengyu Liu, Xinyi Tong, Yaodong Yang, Zhaofeng He:
Learning Uniformly Distributed Embedding Clusters of Stylistic Skills for Physically Simulated Characters. CoRR abs/2411.06459 (2024)
[i90]Zailin Ma, Jiansheng Yang, Yaodong Yang:
Random Feature Models with Learnable Activation Functions. CoRR abs/2411.19468 (2024)
[i89]Fengshuo Bai, Runze Liu, Yali Du, Ying Wen, Yaodong Yang:
RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors. CoRR abs/2412.10713 (2024)
[i88]Juntao Dai, Yaodong Yang, Qian Zheng, Gang Pan:
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation. CoRR abs/2412.11138 (2024)
[i87]Jiaming Ji, Jiayi Zhou, Hantao Lou, Boyuan Chen, Donghai Hong, Xuyao Wang, Wenqi Chen, Kaile Wang, Rui Pan, Jiahao Li, Mohan Wang, Josef Dai, Tianyi Qiu, Hua Xu, Dong Li, Weipeng Chen, Jun Song, Bo Zheng, Yaodong Yang:
Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback. CoRR abs/2412.15838 (2024)
[i86]Haonan Li, Xudong Han, Zenan Zhai, Honglin Mu, Hao Wang, Zhenxuan Zhang, Yilin Geng, Shom Lin, Renxi Wang, Artem Shelmanov, Xiangyu Qi, Yuxia Wang, Donghai Hong, Youliang Yuan, Meng Chen, Haoqin Tu, Fajri Koto, Tatsuki Kuribayashi, Cong Zeng, Rishabh Bhardwaj, Bingchen Zhao, Yawen Duan, Yi Liu, Emad A. Alghamdi, Yaodong Yang, Yinpeng Dong, Soujanya Poria, Pengfei Liu, Zhengzhong Liu, Xuguang Ren, Eduard H. Hovy, Iryna Gurevych, Preslav Nakov, Monojit Choudhury, Timothy Baldwin:
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability. CoRR abs/2412.18551 (2024)
- 2023
[j13]Le Cong Dinh, David Henry Mguni, Long Tran-Thanh, Jun Wang, Yaodong Yang:
Online Markov decision processes with non-oblivious strategic adversary. Auton. Agents Multi Agent Syst. 37(1): 15 (2023)
[j12]Shangding Gu, Jakub Grudzien Kuba, Yuanpei Chen, Yali Du, Long Yang, Alois C. Knoll, Yaodong Yang:
Safe multi-agent reinforcement learning for multi-robot control. Artif. Intell. 319: 103905 (2023)
[j11]Muning Wen, Runji Lin, Hanjing Wang, Yaodong Yang, Ying Wen, Luo Mai, Jun Wang, Hai-Feng Zhang, Weinan Zhang:
Large sequence models for sequential decision-making: a survey. Frontiers Comput. Sci. 17(6): 176349 (2023)
[j10]Linghui Meng, Muning Wen, Chenyang Le, Xiyun Li, Dengpeng Xing, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Yaodong Yang, Bo Xu:
Offline Pre-trained Multi-agent Decision Transformer. Mach. Intell. Res. 20(2): 233-248 (2023)
[j9]Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Yong Yu, Jun Wang, Weinan Zhang:
MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning. J. Mach. Learn. Res. 24: 150:1-150:12 (2023)
[j8]Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Xiaodan Liang, Zhihui Li, Xiaojun Chang, Yaodong Yang:
MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library. J. Mach. Learn. Res. 24: 315:1-315:23 (2023)
[j7]Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang:
TorchOpt: An Efficient Library for Differentiable Optimization. J. Mach. Learn. Res. 24: 367:1-367:14 (2023)
[j6]Yang Li, Kun Xiong, Yingping Zhang, Jiangcheng Zhu, Stephen Marcus McAleer, Wei Pan, Jun Wang, Zonghong Dai, Yaodong Yang:
JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games. Trans. Mach. Learn. Res. 2023 (2023)
[c53]Chuming Li, Jie Liu, Yinmin Zhang, Yuhong Wei, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang:
ACE: Cooperative Multi-Agent Q-learning with Bidirectional Action-Dependency. AAAI 2023: 8536-8544
[c52]David Mguni, Taher Jafferjee, Jianhong Wang, Nicolas Perez Nieves, Wenbin Song, Feifei Tong, Matthew E. Taylor, Tianpei Yang, Zipeng Dai, Hui Chen, Jiangcheng Zhu, Kun Shao, Jun Wang, Yaodong Yang:
Learning to Shape Rewards Using a Game of Two Partners. AAAI 2023: 11604-11612
[c51]Pei Xu, Junge Zhang, Qiyue Yin, Chao Yu, Yaodong Yang, Kaiqi Huang:
Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks. AAAI 2023: 11717-11725
[c50]Zhijian Duan, Wenhan Huang, Dinghuai Zhang, Yali Du, Jun Wang, Yaodong Yang, Xiaotie Deng:
Is Nash Equilibrium Approximator Learnable? AAMAS 2023: 233-241
[c49]Binghao Huang, Yuanpei Chen, Tianyu Wang, Yuzhe Qin, Yaodong Yang, Nikolay Atanasov, Xiaolong Wang:
Dynamic Handover: Throw and Catch with Bimanual Hands. CoRL 2023: 1887-1902
[c48]Chuming Li, Ruonan Jia, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang:
Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning. ECAI 2023: 1381-1388
[c47]Weikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, He Wang:
UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning. ICCV 2023: 3868-3879
[c46]Shuang Wu, Jian Yao, Haobo Fu, Ye Tian, Chao Qian, Yaodong Yang, Qiang Fu, Wei Yang:
Quality-Similar Diversity via Population Based Reinforcement Learning. ICLR 2023
[c45]David Henry Mguni, Haojun Chen, Taher Jafferjee, Jianhong Wang, Longfei Yue, Xidong Feng, Stephen Marcus McAleer, Feifei Tong, Jun Wang, Yaodong Yang:
MANSA: Learning Fast and Slow in Multi-Agent Systems. ICML 2023: 24631-24658
[c44]Oliver Slumbers, David Henry Mguni, Stefano B. Blumberg, Stephen Marcus McAleer, Yaodong Yang, Jun Wang:
A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems. ICML 2023: 32059-32087
[c43]Xiaohang Tang, Le Cong Dinh, Stephen Marcus McAleer, Yaodong Yang:
Regret-Minimizing Double Oracle for Extensive-Form Games. ICML 2023: 33599-33615
[c42]Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai:
GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models. ICML 2023: 36380-36390
[c41]Yiran Geng, Boshi An, Haoran Geng, Yuanpei Chen, Yaodong Yang, Hao Dong:
RLAfford: End-to-End Affordance Learning for Robotic Manipulation. ICRA 2023: 5880-5886
[c40]Puhao Li, Tengyu Liu, Yuyang Li, Yiran Geng, Yixin Zhu, Yaodong Yang, Siyuan Huang:
GenDexGrasp: Generalizable Dexterous Grasping. ICRA 2023: 8068-8074
[c39]Jiaming Ji, Mickel Liu, Josef Dai, Xuehai Pan, Chi Zhang, Ce Bian, Boyuan Chen, Ruiyang Sun, Yizhou Wang, Yaodong Yang:
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset. NeurIPS 2023
[c38]Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Josef Dai, Yaodong Yang:
Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark. NeurIPS 2023
[c37]Stephen McAleer, Gabriele Farina, Gaoyue Zhou, Mingzhi Wang, Yaodong Yang, Tuomas Sandholm:
Team-PSRO for Learning Approximate TMECor in Large Team Games via Cooperative Reinforcement Learning. NeurIPS 2023
[c36]Mingyu Yang, Yaodong Yang, Zhenbo Lu, Wengang Zhou, Houqiang Li:
Hierarchical Multi-Agent Skill Discovery. NeurIPS 2023
[c35]Jian Yao, Weiming Liu, Haobo Fu, Yaodong Yang, Stephen McAleer, Qiang Fu, Wei Yang:
Policy Space Diversity for Non-Transitive Games. NeurIPS 2023
[c34]Youpeng Zhao, Yaodong Yang, Zhenbo Lu, Wengang Zhou, Houqiang Li:
Multi-Agent First Order Constrained Optimization in Policy Space. NeurIPS 2023
[c33]Huanzhou Zhu, Bo Zhao, Gang Chen, Weifeng Chen, Yijie Chen, Liang Shi, Yaodong Yang, Peter R. Pietzuch, Lei Chen:
MSRL: Distributed Reinforcement Learning with Dataflow Fragments. USENIX ATC 2023: 977-993
[i85]David Mguni, Taher Jafferjee, Haojun Chen, Jianhong Wang, Long Fei, Xidong Feng, Stephen McAleer, Feifei Tong, Jun Wang, Yaodong Yang:
MANSA: Learning Fast and Slow in Multi-Agent Systems. CoRR abs/2302.05910 (2023)
[i84]Shangding Gu, Alap Kshirsagar, Yali Du, Guang Chen, Yaodong Yang, Jan Peters, Alois C. Knoll:
A Human-Centered Safe Robot Reinforcement Learning Framework with Interactive Behaviors. CoRR abs/2302.13137 (2023)
[i83]Chenguang Wang, Zhouliang Yu, Stephen McAleer, Tianshu Yu, Yaodong Yang:
ASP: Learn a Universal Neural Solver! CoRR abs/2303.00466 (2023)
[i82]Weikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, He Wang:
UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning. CoRR abs/2304.00464 (2023)
[i81]Sirui Chen, Zhaowei Zhang, Yali Du, Yaodong Yang:
STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning. CoRR abs/2304.07520 (2023)
[i80]Yifan Zhong, Jakub Grudzien Kuba, Siyi Hu, Jiaming Ji, Yaodong Yang:
Heterogeneous-Agent Reinforcement Learning. CoRR abs/2304.09870 (2023)
[i79]Xiaohang Tang, Le Cong Dinh, Stephen Marcus McAleer, Yaodong Yang:
Regret-Minimizing Double Oracle for Extensive-Form Games. CoRR abs/2304.10498 (2023)
[i78]Jiaming Ji, Jiayi Zhou, Borong Zhang, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, Yaodong Yang:
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research. CoRR abs/2305.09304 (2023)
[i77]Simin Li, Jun Guo, Jingqiao Xiu, Xini Yu, Jiakai Wang, Aishan Liu, Yaodong Yang, Xianglong Liu:
Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game. CoRR abs/2305.12872 (2023)
[i76]Zhaowei Zhang, Nian Liu, Siyuan Qi, Ceyao Zhang, Ziqi Rong, Song-Chun Zhu, Shuguang Cui, Yaodong Yang:
Heterogeneous Value Evaluation for Large Language Models. CoRR abs/2305.17147 (2023)
[i75]Yonggang Jin, Chenxu Wang, Liuyu Xiang, Yaodong Yang, Jie Fu, Zhaofeng He:
Deep Reinforcement Learning with Multitask Episodic Memory Based on Task-Conditioned Hypernetwork. CoRR abs/2306.10698 (2023)
[i74]Jiarong Liu, Yifan Zhong, Siyi Hu, Haobo Fu, Qiang Fu, Xiaojun Chang, Yaodong Yang:
Maximum Entropy Heterogeneous-Agent Mirror Learning. CoRR abs/2306.10715 (2023)
[i73]Muning Wen, Runji Lin, Hanjing Wang, Yaodong Yang, Ying Wen, Luo Mai, Jun Wang, Haifeng Zhang, Weinan Zhang:
Large Sequence Models for Sequential Decision-Making: A Survey. CoRR abs/2306.13945 (2023)
[i72]Jian Yao, Weiming Liu, Haobo Fu, Yaodong Yang, Stephen McAleer, Qiang Fu, Wei Yang:
Policy Space Diversity for Non-Transitive Games. CoRR abs/2306.16884 (2023)
[i71]Jiaming Ji, Mickel Liu, Juntao Dai, Xuehai Pan, Chi Zhang, Ce Bian, Boyuan Zhang, Ruiyang Sun, Yizhou Wang, Yaodong Yang:
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset. CoRR abs/2307.04657 (2023)
[i70]Weidong Huang, Jiaming Ji, Borong Zhang, Chunhe Xia, Yaodong Yang:
Safe DreamerV3: Safe Reinforcement Learning with World Models. CoRR abs/2307.07176 (2023)
[i69]Chuming Li, Ruonan Jia, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang:
Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning. CoRR abs/2307.12933 (2023)
[i68]Yang Li, Kun Xiong, Yingping Zhang, Jiangcheng Zhu, Stephen McAleer, Wei Pan, Jun Wang, Zonghong Dai, Yaodong Yang:
JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games. CoRR abs/2308.04719 (2023)
[i67]Ceyao Zhang, Kaijie Yang, Siyi Hu, Zihao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, Xiaojun Chang, Junge Zhang, Feng Yin, Yitao Liang, Yaodong Yang:
ProAgent: Building Proactive Cooperative AI with Large Language Models. CoRR abs/2308.11339 (2023)
[i66]Jingbang Chen, Yian Wang, Xingwei Qu, Shuangjia Zheng, Yaodong Yang, Hao Dong, Jie Fu:
Mixup-Augmented Meta-Learning for Sample-Efficient Fine-Tuning of Protein Simulators. CoRR abs/2308.15116 (2023)
[i65]Binghao Huang, Yuanpei Chen, Tianyu Wang, Yuzhe Qin, Yaodong Yang, Nikolay Atanasov, Xiaolong Wang:
Dynamic Handover: Throw and Catch with Bimanual Hands. CoRR abs/2309.05655 (2023)
[i64]Chengdong Ma, Ziran Yang, Minquan Gao, Hai Ci, Jun Gao, Xuehai Pan, Yaodong Yang:
Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models. CoRR abs/2310.00322 (2023)
[i63]Zhaowei Zhang, Fengshuo Bai, Jun Gao, Yaodong Yang:
Measuring Value Understanding in Language Models through Discriminator-Critique Gap. CoRR abs/2310.00378 (2023)
[i62]Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai:
GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models. CoRR abs/2310.05205 (2023)
[i61]Simin Li, Ruixiao Xu, Jun Guo, Pu Feng, Jiakai Wang, Aishan Liu, Yaodong Yang, Xianglong Liu, Weifeng Lv:
MIR2: Towards Provably Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization. CoRR abs/2310.09833 (2023)
[i60]Jie Liu, Yinmin Zhang, Chuming Li, Chao Yang, Yaodong Yang, Yu Liu, Wanli Ouyang:
Masked Pretraining for Multi-Agent Decision Making. CoRR abs/2310.11846 (2023)
[i59]Jiaming Ji, Borong Zhang, Jiayi Zhou, Xuehai Pan, Weidong Huang, Ruiyang Sun, Yiran Geng, Yifan Zhong, Juntao Dai, Yaodong Yang:
Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark. CoRR abs/2310.12567 (2023)
[i58]Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang:
Safe RLHF: Safe Reinforcement Learning from Human Feedback. CoRR abs/2310.12773 (2023)
[i57]Yuyang Li, Bo Liu, Yiran Geng, Puhao Li, Yaodong Yang, Yixin Zhu, Tengyu Liu, Siyuan Huang:
Grasp Multiple Objects with One Hand. CoRR abs/2310.15599 (2023)
[i56]Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao:
AI Alignment: A Comprehensive Survey. CoRR abs/2310.19852 (2023)
[i55]Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, Yitao Liang:
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models. CoRR abs/2311.05997 (2023)
[i54]Yinmin Zhang, Jie Liu, Chuming Li, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang:
A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning. CoRR abs/2312.07685 (2023)
- 2022
[j5]Ricky Sanjaya, Jun Wang, Yaodong Yang:
Measuring the Non-Transitivity in Chess. Algorithms 15(5): 152 (2022)
[j4]Qingduo Zeng, Qiang Zhang, Shancun Liu, Yaodong Yang:
Illiquidity Comovement and Market Crisis. J. Syst. Sci. Complex. 35(5): 1863-1874 (2022)
[j3]Le Cong Dinh, Stephen Marcus McAleer, Zheng Tian, Nicolas Perez Nieves, Oliver Slumbers, David Henry Mguni, Jun Wang, Haitham Bou-Ammar, Yaodong Yang:
Online Double Oracle. Trans. Mach. Learn. Res. 2022 (2022)
[c32]Ying Wen, Hui Chen, Yaodong Yang, Minne Li, Zheng Tian, Xu Chen, Jun Wang:
A Game-Theoretic Approach to Multi-agent Trust Region Optimization. DAI 2022: 74-87
[c31]Jakub Grudzien Kuba, Ruiqing Chen, Muning Wen, Ying Wen, Fanglei Sun, Jun Wang, Yaodong Yang:
Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning. ICLR 2022
[c30]David Henry Mguni, Taher Jafferjee, Jianhong Wang, Nicolas Perez Nieves, Oliver Slumbers, Feifei Tong, Yang Li, Jiangcheng Zhu, Yaodong Yang, Jun Wang:
LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning. ICLR 2022
[c29]Yurong Chen, Xiaotie Deng, Chenchen Li, David Mguni, Jun Wang, Xiang Yan, Yaodong Yang:
On the Convergence of Fictitious Play: A Decomposition Approach. IJCAI 2022: 179-185
[c28]Yali Du, Chengdong Ma, Yuchen Liu, Runji Lin, Hao Dong, Jun Wang, Yaodong Yang:
Scalable Model-based Policy Optimization for Decentralized Networked Systems. IROS 2022: 9019-9026
[c27]Bo Liu, Xidong Feng, Jie Ren, Luo Mai, Rui Zhu, Haifeng Zhang, Jun Wang, Yaodong Yang:
A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning. NeurIPS 2022
[c26]Yuanpei Chen, Tianhao Wu, Shengjie Wang, Xidong Feng, Jiechuan Jiang, Zongqing Lu, Stephen McAleer, Hao Dong, Song-Chun Zhu, Yaodong Yang:
Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning. NeurIPS 2022
[c25]Runze Liu, Fengshuo Bai, Yali Du, Yaodong Yang:
Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning. NeurIPS 2022
[c24]Xuehai Pan, Mickel Liu, Fangwei Zhong, Yaodong Yang, Song-Chun Zhu, Yizhou Wang:
MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control. NeurIPS 2022
[c23]Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang:
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem. NeurIPS 2022
[c22]Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan:
Constrained Update Projection Approach to Safe Policy Optimization. NeurIPS 2022
[c21]Zhitao Zhu, Shijing Si, Jianzong Wang, Yaodong Yang, Jing Xiao:
Debias the Black-Box: A Fair Ranking Framework via Knowledge Distillation. WISE 2022: 395-405
[i53]Ming Zhou, Jingxiao Chen, Ying Wen, Weinan Zhang, Yaodong Yang, Yong Yu:
Efficient Policy Space Response Oracles. CoRR abs/2202.00633 (2022)
[i52]Juliusz Krysztof Ziomek, Jun Wang, Yaodong Yang:
Settling the Communication Complexity for Distributed Offline Reinforcement Learning. CoRR abs/2202.04862 (2022)
[i51]Zehao Dou, Jakub Grudzien Kuba, Yaodong Yang:
Understanding Value Decomposition Algorithms in Deep Cooperative Multi-Agent Reinforcement Learning. CoRR abs/2202.04868 (2022)
[i50]Yurong Chen, Xiaotie Deng, Chenchen Li, David Mguni, Jun Wang, Xiang Yan, Yaodong Yang:
On the Convergence of Fictitious Play: A Decomposition Approach. CoRR abs/2205.01469 (2022)
[i49]Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, Yaodong Yang, Alois C. Knoll:
A Review of Safe Reinforcement Learning: Methods, Theory and Applications. CoRR abs/2205.10330 (2022)
[i48]Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang:
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem. CoRR abs/2205.14953 (2022)
[i47]Oliver Slumbers, David Henry Mguni, Stephen McAleer, Jun Wang, Yaodong Yang:
Learning Risk-Averse Equilibria in Multi-Agent Systems. CoRR abs/2205.15434 (2022)
[i46]Yuanpei Chen, Yaodong Yang, Tianhao Wu, Shengjie Wang, Xidong Feng, Jiechuang Jiang, Stephen Marcus McAleer, Hao Dong, Zongqing Lu, Song-Chun Zhu:
Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning. CoRR abs/2206.08686 (2022)
[i45]Yali Du, Chengdong Ma, Yuchen Liu, Runji Lin, Hao Dong, Jun Wang, Yaodong Yang:
Fully Decentralized Model-based Policy Optimization for Networked Systems. CoRR abs/2207.06559 (2022)
[i44]Jakub Grudzien Kuba, Xidong Feng, Shiyao Ding, Hao Dong, Jun Wang, Yaodong Yang:
Heterogeneous-Agent Mirror Learning: A Continuum of Solutions to Cooperative MARL. CoRR abs/2208.01682 (2022)
[i43]Zhitao Zhu, Shijing Si, Jianzong Wang, Yaodong Yang, Jing Xiao:
Debias the Black-box: A Fair Ranking Framework via Knowledge Distillation. CoRR abs/2208.11628 (2022)
[i42]Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan:
Constrained Update Projection Approach to Safe Policy Optimization. CoRR abs/2209.07089 (2022)
[i41]Yiran Geng, Boshi An, Haoran Geng, Yuanpei Chen, Yaodong Yang, Hao Dong:
End-to-End Affordance Learning for Robotic Manipulation. CoRR abs/2209.12941 (2022)
[i40]Puhao Li, Tengyu Liu, Yuyang Li, Yiran Geng, Yixin Zhu, Yaodong Yang, Siyuan Huang:
GenDexGrasp: Generalizable Dexterous Grasping. CoRR abs/2210.00722 (2022)
[i39]Huanzhou Zhu, Bo Zhao, Gang Chen, Weifeng Chen, Yijie Chen, Liang Shi, Yaodong Yang, Peter R. Pietzuch, Lei Chen:
MSRL: Distributed Reinforcement Learning with Dataflow Fragments. CoRR abs/2210.00882 (2022)
[i38]Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Zhihui Li, Xiaodan Liang, Xiaojun Chang, Yaodong Yang:
MARLlib: Extending RLlib for Multi-agent Reinforcement Learning. CoRR abs/2210.13708 (2022)
[i37]Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang:
TorchOpt: An Efficient Library for Differentiable Optimization. CoRR abs/2211.06934 (2022)
[i36]Runji Lin, Ye Li, Xidong Feng, Zhaowei Zhang, Xian Hong Wu Fung, Haifeng Zhang, Jun Wang, Yali Du, Yaodong Yang:
Contextual Transformer for Offline Meta Reinforcement Learning. CoRR abs/2211.08016 (2022)
[i35]Chuming Li, Jie Liu, Yinmin Zhang, Yuhong Wei, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang:
ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency. CoRR abs/2211.16068 (2022)
- 2021
[b1]Yaodong Yang:
Many-agent reinforcement learning. University College London (University of London), UK, 2021
[c20]David Henry Mguni, Yutong Wu, Yali Du, Yaodong Yang, Ziyi Wang, Minne Li, Ying Wen, Joel Jennings, Jun Wang:
Learning in Nonzero-Sum Stochastic Games with Potentials. ICML 2021: 7688-7699
[c19]Nicolas Perez Nieves, Yaodong Yang, Oliver Slumbers, David Henry Mguni, Ying Wen, Jun Wang:
Modelling Behavioural Diversity for Learning in Open-Ended Games. ICML 2021: 8514-8524
[c18]Vittorio Caggiano, Guillaume Durandau, Huawei Wang, Alberto Silvio Chiappa, Alexander Mathis, Pablo Tano, Nisheet Patel, Alexandre Pouget, Pierre Schumacher, Georg Martius, Daniel F. B. Haeufle, Yiran Geng, Boshi An, Yifan Zhong, Jiaming Ji, Yuanpei Chen, Hao Dong, Yaodong Yang, Rahul Siripurapu, Luis Eduardo Ferro Diez, Michael Kopp, Vihang Patil, Sepp Hochreiter, Yuval Tassa, Josh Merel, Randy Schultheis, Seungmoon Song, Massimo Sartori, Vikash Kumar:
MyoChallenge 2022: Learning contact-rich manipulation using a musculoskeletal hand. NeurIPS (Competition and Demos) 2021: 233-250
[c17]Xiangyu Liu, Hangtian Jia, Ying Wen, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu, Yaodong Yang:
Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games. NeurIPS 2021: 941-952
[c16]Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang:
Neural Auto-Curricula in Two-Player Zero-Sum Games. NeurIPS 2021: 3504-3517
[c15]Jakub Grudzien Kuba, Muning Wen, Linghui Meng, Shangding Gu, Haifeng Zhang, David Mguni, Jun Wang, Yaodong Yang:
Settling the Variance of Multi-Agent Policy Gradients. NeurIPS 2021: 13458-13470
[i34]Le Cong Dinh, Yaodong Yang, Zheng Tian, Nicolas Perez Nieves, Oliver Slumbers, David Henry Mguni, Haitham Bou-Ammar, Jun Wang:
Online Double Oracle. CoRR abs/2103.07780 (2021)
[i33]Nicolas Perez Nieves, Yaodong Yang, Oliver Slumbers, David Henry Mguni, Jun Wang:
Modelling Behavioural Diversity for Learning in Open-Ended Games. CoRR abs/2103.07927 (2021)
[i32]David Mguni, Jianhong Wang, Taher Jafferjee, Nicolas Perez Nieves, Wenbin Song, Yaodong Yang, Feifei Tong, Hui Chen, Jiangcheng Zhu, Yali Du, Jun Wang:
Learning to Shape Rewards using a Game of Switching Controls. CoRR abs/2103.09159 (2021)
[i31]David Mguni, Yutong Wu, Yali Du, Yaodong Yang, Ziyi Wang, Minne Li, Ying Wen, Joel Jennings, Jun Wang:
Learning in Nonzero-Sum Stochastic Games with Potentials. CoRR abs/2103.09284 (2021)
[i30]Xidong Feng, Oliver Slumbers, Yaodong Yang, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang:
Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games. CoRR abs/2106.02745 (2021)
[i29]Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu:
Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games. CoRR abs/2106.04958 (2021)
[i28]Ying Wen, Hui Chen, Yaodong Yang, Zheng Tian, Minne Li, Xu Chen, Jun Wang:
A Game-Theoretic Approach to Multi-Agent Trust Region Optimization. CoRR abs/2106.06828 (2021)
[i27]Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Weinan Zhang, Jun Wang:
MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning. CoRR abs/2106.07551 (2021)
[i26]Jakub Grudzien Kuba, Muning Wen, Yaodong Yang, Linghui Meng, Shangding Gu, Haifeng Zhang, David Henry Mguni, Jun Wang:
Settling the Variance of Multi-Agent Policy Gradients. CoRR abs/2108.08612 (2021)
[i25]Xiaotie Deng, Yuhao Li, David Henry Mguni, Jun Wang, Yaodong Yang:
On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games. CoRR abs/2109.01795 (2021)
[i24]Yixin Wu, Rui Luo, Chen Zhang, Jun Wang, Yaodong Yang:
Revisiting the Characteristics of Stochastic Gradient Noise and Dynamics. CoRR abs/2109.09833 (2021)
[i23]Jakub Grudzien Kuba, Ruiqing Chen, Muning Wen, Ying Wen, Fanglei Sun, Jun Wang, Yaodong Yang:
Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning. CoRR abs/2109.11251 (2021)
[i22]Shangding Gu, Jakub Grudzien Kuba, Muning Wen, Ruiqing Chen, Ziyan Wang, Zheng Tian, Jun Wang, Alois C. Knoll, Yaodong Yang:
Multi-Agent Constrained Policy Optimisation. CoRR abs/2110.02793 (2021)
[i21]Le Cong Dinh, David Henry Mguni, Long Tran-Thanh, Jun Wang, Yaodong Yang:
Online Markov Decision Processes with Non-oblivious Strategic Adversary. CoRR abs/2110.03604 (2021)
[i20]Ricky Sanjaya, Jun Wang, Yaodong Yang:
Measuring the Non-Transitivity in Chess. CoRR abs/2110.11737 (2021)
[i19]David Mguni, Joel Jennings, Taher Jafferjee, Aivar Sootla, Yaodong Yang, Changmin Yu, Usman Islam, Ziyan Wang, Jun Wang:
DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention. CoRR abs/2110.14468 (2021)
[i18]Chenguang Wang, Yaodong Yang, Oliver Slumbers, Congying Han, Tiande Guo, Haifeng Zhang, Jun Wang:
A Game-Theoretic Approach for Improving Generalization Ability of TSP Solvers. CoRR abs/2110.15105 (2021)
[i17]David Henry Mguni, Taher Jafferjee, Jianhong Wang, Nicolas Perez Nieves, Oliver Slumbers, Feifei Tong, Yang Li, Jiangcheng Zhu, Yaodong Yang, Jun Wang:
LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning. CoRR abs/2112.02618 (2021)
[i16]Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xiyun Li, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Bo Xu:
Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks. CoRR abs/2112.02845 (2021)
[i15]Bo Liu, Xidong Feng, Haifeng Zhang, Jun Wang, Yaodong Yang:
Settling the Bias and Variance of Meta-Gradient Estimation for Meta-Reinforcement Learning. CoRR abs/2112.15400 (2021)
[i14]Xiaotie Deng, Yuhao Li, David Mguni, Jun Wang, Yaodong Yang:
On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games. Electron. Colloquium Comput. Complex. TR21 (2021)
- 2020
[j2]Alisa Kim, Yaodong Yang, Stefan Lessmann, Tiejun Ma, Ming-Chien Sung, Johnnie E. V. Johnson:
Can deep learning predict risky retail investors? A case study in financial risk behavior forecasting. Eur. J. Oper. Res. 283(1): 217-234 (2020)
[j1]Qiang Zhang, Chao Wang, Shancun Liu, Yaodong Yang:
Order Execution Probability and Order Queue in Limit Order Markets. J. Syst. Sci. Complex. 33(5): 1545-1557 (2020)
[c14]Haifeng Zhang, Weizhe Chen, Zeren Huang, Minne Li, Yaodong Yang, Weinan Zhang, Jun Wang:
Bi-Level Actor-Critic for Multi-Agent Coordination. AAAI 2020: 7325-7332
[c13]Yaodong Yang, Rasul Tutunov, Phu Sakulwongtana, Haitham Bou-Ammar:
α^α-Rank: Practically Scaling α-Rank through Stochastic Optimisation. AAMAS 2020: 1575-1583
[c12]Zhaoqing Peng, Junqi Jin, Lan Luo, Yaodong Yang, Rui Luo, Jun Wang, Weinan Zhang, Miao Xu, Chuan Yu, Tiejian Luo, Han Li, Jian Xu, Kun Gai:
Sequential Advertising Agent with Interpretable User Hidden Intents. AAMAS 2020: 1966-1968
[c11]Zhaoqing Peng, Junqi Jin, Lan Luo, Yaodong Yang, Rui Luo, Jun Wang, Weinan Zhang, Haiyang Xu, Miao Xu, Chuan Yu, Tiejian Luo, Han Li, Jian Xu, Kun Gai:
Learning to Infer User Hidden States for Online Sequential Advertising. CIKM 2020: 2677-2684
[c10]Yaodong Yang, Ying Wen, Jun Wang, Liheng Chen, Kun Shao, David Mguni, Weinan Zhang:
Multi-Agent Determinantal Q-Learning. ICML 2020: 10757-10766
[c9]Ying Wen, Yaodong Yang, Jun Wang:
Modelling Bounded Rationality in Multi-Agent Interactions by Generalized Recursive Reasoning. IJCAI 2020: 414-421
[c8]Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang:
Replica-Exchange Nosé-Hoover Dynamics for Bayesian Learning on Large Datasets. NeurIPS 2020
[i13]Yaodong Yang, Ying Wen, Liheng Chen, Jun Wang, Kun Shao, David Mguni, Weinan Zhang:
Multi-Agent Determinantal Q-Learning. CoRR abs/2006.01482 (2020)
[i12]Zhaoqing Peng, Junqi Jin, Lan Luo, Yaodong Yang, Rui Luo, Jun Wang, Weinan Zhang, Haiyang Xu, Miao Xu, Chuan Yu, Tiejian Luo, Han Li, Jian Xu, Kun Gai:
Learning to Infer User Hidden States for Online Sequential Advertising. CoRR abs/2009.01453 (2020)
[i11]Yaodong Yang, Jun Wang:
An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective. CoRR abs/2011.00583 (2020)
2010 – 2019
- 2019
[c7]Ming Zhou, Yong Chen, Ying Wen, Yaodong Yang, Yufeng Su, Weinan Zhang, Dell Zhang, Jun Wang:
Factorized Q-learning for large-scale multi-agent systems. DAI 2019: 7:1-7:7
[c6]Yaodong Yang, Rui Luo, Yuanyuan Liu:
Adversarial Variational Bayes Methods for Tweedie Compound Poisson Mixed Models. ICASSP 2019: 3377-3381
[c5]Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, Wei Pan:
Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning. ICLR (Poster) 2019
[c4]Minne Li, Zhiwei (Tony) Qin, Yan Jiao, Yaodong Yang, Jun Wang, Chenxi Wang, Guobin Wu, Jieping Ye:
Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning. WWW 2019: 983-994
[i10]Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, Wei Pan:
Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning. CoRR abs/1901.09207 (2019)
[i9]Ying Wen, Yaodong Yang, Rui Lu, Jun Wang:
Multi-Agent Generalized Recursive Reasoning. CoRR abs/1901.09216 (2019)
[i8]Minne Li, Zhiwei (Tony) Qin, Yan Jiao, Yaodong Yang, Zhichen Gong, Jun Wang, Chenxi Wang, Guobin Wu, Jieping Ye:
Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning. CoRR abs/1901.11454 (2019)
[i7]Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang:
Replica-exchange Nosé-Hoover dynamics for Bayesian learning on large datasets. CoRR abs/1905.12569 (2019)
[i6]Haifeng Zhang, Weizhe Chen, Zeren Huang, Minne Li, Yaodong Yang, Weinan Zhang, Jun Wang:
Bi-level Actor-Critic for Multi-agent Coordination. CoRR abs/1909.03510 (2019)
- 2018
[c3]Yaodong Yang, Lantao Yu, Yiwei Bai, Ying Wen, Weinan Zhang, Jun Wang:
A Study of AI Population Dynamics with Million-agent Reinforcement Learning. AAMAS 2018: 2133-2135
[c2]Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, Jun Wang:
Mean Field Multi-Agent Reinforcement Learning. ICML 2018: 5567-5576
[c1]Rui Luo, Jianhong Wang, Yaodong Yang, Jun Wang, Zhanxing Zhu:
Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning. NeurIPS 2018: 10696-10705
[i5]Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, Jun Wang:
Mean Field Multi-Agent Reinforcement Learning. CoRR abs/1802.05438 (2018)
[i4]Yong Chen, Ming Zhou, Ying Wen, Yaodong Yang, Yufeng Su, Weinan Zhang, Dell Zhang, Jun Wang, Han Liu:
Factorized Q-Learning for Large-Scale Multi-Agent Systems. CoRR abs/1809.03738 (2018)
[i3]Qiang Zhang, Rui Luo, Yaodong Yang, Yuanyuan Liu:
Benchmarking Deep Sequential Models on Volatility Predictions for Financial Time Series. CoRR abs/1811.03711 (2018)
- 2017
[i2]Peng Peng, Quan Yuan, Ying Wen, Yaodong Yang, Zhenkun Tang, Haitao Long, Jun Wang:
Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games. CoRR abs/1703.10069 (2017)
[i1]Yaodong Yang, Lantao Yu, Yiwei Bai, Jun Wang, Weinan Zhang, Ying Wen, Yong Yu:
An Empirical Study of AI Population Dynamics with Million-agent Reinforcement Learning. CoRR abs/1709.04511 (2017)

last updated on 2026-01-16 01:23 CET by the dblp team
all metadata released as open data under CC0 1.0 license