BibTeX record journals/corr/abs-2401-00243
@article{DBLP:journals/corr/abs-2401-00243,
  author       = {Yuanzhao Zhai and
                  Han Zhang and
                  Yu Lei and
                  Yue Yu and
                  Kele Xu and
                  Dawei Feng and
                  Bo Ding and
                  Huaimin Wang},
  title        = {Uncertainty-Penalized Reinforcement Learning from Human Feedback with
                  Diverse Reward LoRA Ensembles},
  journal      = {CoRR},
  volume       = {abs/2401.00243},
  year         = {2024},
  url          = {https://doi.org/10.48550/arXiv.2401.00243},
  doi          = {10.48550/ARXIV.2401.00243},
  eprinttype   = {arXiv},
  eprint       = {2401.00243},
  timestamp    = {Mon, 15 Jan 2024 16:37:16 +0100},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2401-00243.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}