Rafael Rafailov
2020 – today
- 2024
- [c13] Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh: Contrastive Preference Learning: Learning from Human Feedback without Reinforcement Learning. ICLR 2024
- [c12] Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning: An Emulator for Fine-tuning Large Language Models using Small Language Models. ICLR 2024
- [c11] Charlotte Nicks, Eric Mitchell, Rafael Rafailov, Archit Sharma, Christopher D. Manning, Chelsea Finn, Stefano Ermon: Language Model Detectors Are Easily Optimized Against. ICLR 2024
- [c10] Victor Kolev, Rafael Rafailov, Kyle Hatch, Jiajun Wu, Chelsea Finn: Efficient imitation learning with conservative world models. L4DC 2024: 1777-1790
- [i24] Rafael Rafailov, Kyle Hatch, Victor Kolev, John D. Martin, Mariano Phielipp, Chelsea Finn: MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning. CoRR abs/2401.03306 (2024)
- [i23] Yiyang Zhou, Chenhang Cui, Rafael Rafailov, Chelsea Finn, Huaxiu Yao: Aligning Modalities in Vision Large Language Models via Preference Fine-tuning. CoRR abs/2402.11411 (2024)
- [i22] Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn: Disentangling Length from Quality in Direct Preference Optimization. CoRR abs/2403.19159 (2024)
- [i21] Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo: Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data. CoRR abs/2404.01413 (2024)
- [i20] Rafael Rafailov, Joey Hejna, Ryan Park, Chelsea Finn: From r to Q*: Your Language Model is Secretly a Q-Function. CoRR abs/2404.12358 (2024)
- [i19] Jan-Philipp Fränken, Eric Zelikman, Rafael Rafailov, Kanishk Gandhi, Tobias Gerstenberg, Noah D. Goodman: Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels. CoRR abs/2404.14313 (2024)
- [i18] Fahim Tajwar, Anikait Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar: Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data. CoRR abs/2404.14367 (2024)
- [i17] Victor Kolev, Rafael Rafailov, Kyle Hatch, Jiajun Wu, Chelsea Finn: Efficient Imitation Learning with Conservative World Models. CoRR abs/2405.13193 (2024)
- [i16] Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, Mohammad Gheshlaghi Azar, Rafael Rafailov, Bernardo Ávila Pires, Eugene Tarassov, Lucas Spangher, Will Ellsworth, Aliaksei Severyn, Jonathan Mallinson, Lior Shani, Gil Shamir, Rishabh Joshi, Tianqi Liu, Rémi Munos, Bilal Piot: Offline Regularised Reinforcement Learning for Large Language Models Alignment. CoRR abs/2405.19107 (2024)
- [i15] Ahmed M. Ahmed, Rafael Rafailov, Stepan Sharkov, Xuechen Li, Sanmi Koyejo: Scalable Ensembling For Mitigating Reward Overoptimisation. CoRR abs/2406.01013 (2024)
- [i14] Rafael Rafailov, Yaswanth Chittepu, Ryan Park, Harshit Sikchi, Joey Hejna, W. Bradley Knox, Chelsea Finn, Scott Niekum: Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms. CoRR abs/2406.02900 (2024)
- [i13] Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Paul Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn: OpenVLA: An Open-Source Vision-Language-Action Model. CoRR abs/2406.09246 (2024)
- 2023
- [c9] Rafael Rafailov, Kyle Beltran Hatch, Victor Kolev, John D. Martin, Mariano Phielipp, Chelsea Finn: MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning. CoRL 2023: 3654-3671
- [c8] Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning: Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback. EMNLP 2023: 5433-5442
- [c7] Kyle Beltran Hatch, Benjamin Eysenbach, Rafael Rafailov, Tianhe Yu, Ruslan Salakhutdinov, Sergey Levine, Chelsea Finn: Contrastive Example-Based Control. L4DC 2023: 155-169
- [c6] Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, Chelsea Finn: Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS 2023
- [i12] Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning: Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback. CoRR abs/2305.14975 (2023)
- [i11] Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn: Direct Preference Optimization: Your Language Model is Secretly a Reward Model. CoRR abs/2305.18290 (2023)
- [i10] Kyle Hatch, Benjamin Eysenbach, Rafael Rafailov, Tianhe Yu, Ruslan Salakhutdinov, Sergey Levine, Chelsea Finn: Contrastive Example-Based Control. CoRR abs/2307.13101 (2023)
- [i9] Max Sobol Mark, Archit Sharma, Fahim Tajwar, Rafael Rafailov, Sergey Levine, Chelsea Finn: Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias. CoRR abs/2310.08558 (2023)
- [i8] Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning: An Emulator for Fine-Tuning Large Language Models using Small Language Models. CoRR abs/2310.12962 (2023)
- [i7] Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh: Contrastive Preference Learning: Learning from Human Feedback without RL. CoRR abs/2310.13639 (2023)
- [i6] Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik: Diffusion Model Alignment Using Direct Preference Optimization. CoRR abs/2311.12908 (2023)
- 2022
- [c5] Kyle Hsu, Moo Jin Kim, Rafael Rafailov, Jiajun Wu, Chelsea Finn: Vision-Based Manipulators Need to Also See from Their Hands. ICLR 2022
- [i5] Kyle Hsu, Moo Jin Kim, Rafael Rafailov, Jiajun Wu, Chelsea Finn: Vision-Based Manipulators Need to Also See from Their Hands. CoRR abs/2203.12677 (2022)
- 2021
- [c4] Eric Mitchell, Rafael Rafailov, Xue Bin Peng, Sergey Levine, Chelsea Finn: Offline Meta-Reinforcement Learning with Advantage Weighting. ICML 2021: 7780-7791
- [c3] Rafael Rafailov, Tianhe Yu, Aravind Rajeswaran, Chelsea Finn: Offline Reinforcement Learning from Images with Latent Space Models. L4DC 2021: 1154-1168
- [c2] Rafael Rafailov, Tianhe Yu, Aravind Rajeswaran, Chelsea Finn: Visual Adversarial Imitation Learning using Variational Models. NeurIPS 2021: 3016-3028
- [c1] Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn: COMBO: Conservative Offline Model-Based Policy Optimization. NeurIPS 2021: 28954-28967
- [i4] Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn: COMBO: Conservative Offline Model-Based Policy Optimization. CoRR abs/2102.08363 (2021)
- [i3] Rafael Rafailov, Tianhe Yu, Aravind Rajeswaran, Chelsea Finn: Visual Adversarial Imitation Learning using Variational Models. CoRR abs/2107.08829 (2021)
- 2020
- [i2] Eric Mitchell, Rafael Rafailov, Xue Bin Peng, Sergey Levine, Chelsea Finn: Offline Meta-Reinforcement Learning with Advantage Weighting. CoRR abs/2008.06043 (2020)
- [i1] Rafael Rafailov, Tianhe Yu, Aravind Rajeswaran, Chelsea Finn: Offline Reinforcement Learning from Images with Latent Space Models. CoRR abs/2012.11547 (2020)
last updated on 2024-08-08 20:10 CEST by the dblp team
all metadata released as open data under CC0 1.0 license