Benjamin Van Roy
Person information
- affiliation: Stanford University, USA
2020 – today
- 2024
- [c60] Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy: Efficient Exploration for LLMs. ICML 2024
- [c59] Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy: An Information-Theoretic Analysis of In-Context Learning. ICML 2024
- [i87] Anmol Kagrecha, Henrik Marklund, Benjamin Van Roy, Hong Jun Jeon, Richard Zeckhauser: Adaptive Crowdsourcing Via Self-Supervised Learning. CoRR abs/2401.13239 (2024)
- [i86] Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy: An Information-Theoretic Analysis of In-Context Learning. CoRR abs/2401.15530 (2024)
- [i85] Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy: Efficient Exploration for LLMs. CoRR abs/2402.00396 (2024)
- [i84] Hong Jun Jeon, Benjamin Van Roy: Information-Theoretic Foundations for Neural Scaling Laws. CoRR abs/2407.01456 (2024)
- [i83] Dilip Arumugam, Wanqiao Xu, Benjamin Van Roy: Exploration Unbound. CoRR abs/2407.12178 (2024)
- [i82] Dilip Arumugam, Saurabh Kumar, Ramki Gummadi, Benjamin Van Roy: Satisficing Exploration for Deep Reinforcement Learning. CoRR abs/2407.12185 (2024)
- [i81] Hong Jun Jeon, Benjamin Van Roy: Information-Theoretic Foundations for Machine Learning. CoRR abs/2407.12288 (2024)
- [i80] Saurabh Kumar, Hong Jun Jeon, Alex Lewandowski, Benjamin Van Roy: The Need for a Big World Simulator: A Scientific Challenge for Continual Learning. CoRR abs/2408.02930 (2024)
- 2023
- [j43] Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen: Reinforcement Learning, Bit by Bit. Found. Trends Mach. Learn. 16(6): 733-865 (2023)
- [j42] Vikranth Dwaracherla, Zheng Wen, Ian Osband, Xiuyuan Lu, Seyed Mohammad Asghari, Benjamin Van Roy: Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrapping. Trans. Mach. Learn. Res. 2023 (2023)
- [c58] Yueyang Liu, Benjamin Van Roy, Kuang Xu: Nonstationary Bandit Learning via Predictive Sampling. AISTATS 2023: 6215-6244
- [c57] Zheqing Zhu, Benjamin Van Roy: Scalable Neural Contextual Bandit for Recommender Systems. CIKM 2023: 3636-3646
- [c56] Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen: Leveraging Demonstrations to Improve Online Learning: Quality Matters. ICML 2023: 12527-12545
- [c55] David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado Philip van Hasselt, Satinder Singh: A Definition of Continual Reinforcement Learning. NeurIPS 2023
- [c54] Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy: Epistemic Neural Networks. NeurIPS 2023
- [c53] Zheqing Zhu, Benjamin Van Roy: Deep Exploration for Recommendation Systems. RecSys 2023: 963-970
- [c52] Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy: Approximate Thompson Sampling via Epistemic Neural Networks. UAI 2023: 1586-1595
- [i79] Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen: Leveraging Demonstrations to Improve Online Learning: Quality Matters. CoRR abs/2302.03319 (2023)
- [i78] Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy: Approximate Thompson Sampling via Epistemic Neural Networks. CoRR abs/2302.09205 (2023)
- [i77] Yueyang Liu, Benjamin Van Roy, Kuang Xu: A Definition of Non-Stationary Bandits. CoRR abs/2302.12202 (2023)
- [i76] Dilip Arumugam, Mark K. Ho, Noah D. Goodman, Benjamin Van Roy: Bayesian Reinforcement Learning with Limited Cognitive Load. CoRR abs/2305.03263 (2023)
- [i75] Wanqiao Xu, Shi Dong, Dilip Arumugam, Benjamin Van Roy: Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models. CoRR abs/2305.11455 (2023)
- [i74] Zheqing Zhu, Benjamin Van Roy: Scalable Neural Contextual Bandit for Recommender Systems. CoRR abs/2306.14834 (2023)
- [i73] Saurabh Kumar, Henrik Marklund, Ashish Rao, Yifan Zhu, Hong Jun Jeon, Yueyang Liu, Benjamin Van Roy: Continual Learning as Computationally Constrained Reinforcement Learning. CoRR abs/2307.04345 (2023)
- [i72] David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, Satinder Singh: On the Convergence of Bounded Agents. CoRR abs/2307.11044 (2023)
- [i71] David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh: A Definition of Continual Reinforcement Learning. CoRR abs/2307.11046 (2023)
- [i70] Saurabh Kumar, Henrik Marklund, Benjamin Van Roy: Maintaining Plasticity via Regenerative Regularization. CoRR abs/2308.11958 (2023)
- [i69] Zheqing Zhu, Yueyang Liu, Xu Kuang, Benjamin Van Roy: Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling. CoRR abs/2310.07786 (2023)
- [i68] Wanqiao Xu, Shi Dong, Xiuyuan Lu, Grace Lam, Zheng Wen, Benjamin Van Roy: RLHF and IIA: Perverse Incentives. CoRR abs/2312.01057 (2023)
- 2022
- [j41] Shi Dong, Benjamin Van Roy, Zhengyuan Zhou: Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent States. J. Mach. Learn. Res. 23: 255:1-255:54 (2022)
- [j40] Daniel Russo, Benjamin Van Roy: Satisficing in Time-Sensitive Bandit Learning. Math. Oper. Res. 47(4): 2815-2839 (2022)
- [c51] Dilip Arumugam, Benjamin Van Roy: Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning. NeurIPS 2022
- [c50] Hong Jun Jeon, Benjamin Van Roy: An Information-Theoretic Framework for Deep Learning. NeurIPS 2022
- [c49] Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Xiuyuan Lu, Morteza Ibrahimi, Dieterich Lawson, Botao Hao, Brendan O'Donoghue, Benjamin Van Roy: The Neural Testbed: Evaluating Joint Predictions. NeurIPS 2022
- [c48] Chao Qin, Zheng Wen, Xiuyuan Lu, Benjamin Van Roy: An Analysis of Ensemble Sampling. NeurIPS 2022
- [c47] Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Xiuyuan Lu, Benjamin Van Roy: Evaluating high-order predictive distributions in deep learning. UAI 2022: 1552-1560
- [i67] Yueyang Liu, Adithya M. Devraj, Benjamin Van Roy, Kuang Xu: Gaussian Imagination in Bandit Learning. CoRR abs/2201.01902 (2022)
- [i66] Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Xiuyuan Lu, Benjamin Van Roy: Evaluating High-Order Predictive Distributions in Deep Learning. CoRR abs/2202.13509 (2022)
- [i65] Hong Jun Jeon, Benjamin Van Roy: Sample Complexity versus Depth: An Information Theoretic Analysis. CoRR abs/2203.00246 (2022)
- [i64] Chao Qin, Zheng Wen, Xiuyuan Lu, Benjamin Van Roy: An Analysis of Ensemble Sampling. CoRR abs/2203.01303 (2022)
- [i63] Yueyang Liu, Benjamin Van Roy, Kuang Xu: Nonstationary Bandit Learning via Predictive Sampling. CoRR abs/2205.01970 (2022)
- [i62] Dilip Arumugam, Benjamin Van Roy: Between Rate-Distortion Theory & Value Equivalence in Model-Based Reinforcement Learning. CoRR abs/2206.02025 (2022)
- [i61] Dilip Arumugam, Benjamin Van Roy: Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning. CoRR abs/2206.02072 (2022)
- [i60] Vikranth Dwaracherla, Zheng Wen, Ian Osband, Xiuyuan Lu, Seyed Mohammad Asghari, Benjamin Van Roy: Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrapping. CoRR abs/2206.03633 (2022)
- [i59] Xiuyuan Lu, Ian Osband, Seyed Mohammad Asghari, Sven Gowal, Vikranth Dwaracherla, Zheng Wen, Benjamin Van Roy: Robustness of Epinets against Distributional Shifts. CoRR abs/2207.00137 (2022)
- [i58] Yifan Zhu, Hong Jun Jeon, Benjamin Van Roy: Is Stochastic Gradient Descent Near Optimal? CoRR abs/2209.08627 (2022)
- [i57] Dilip Arumugam, Mark K. Ho, Noah D. Goodman, Benjamin Van Roy: On Rate-Distortion Theory in Capacity-Limited Cognition & Reinforcement Learning. CoRR abs/2210.16877 (2022)
- [i56] Ian Osband, Seyed Mohammad Asghari, Benjamin Van Roy, Nat McAleese, John Aslanides, Geoffrey Irving: Fine-Tuning Language Models via Epistemic Neural Networks. CoRR abs/2211.01568 (2022)
- [i55] Wanqiao Xu, Shi Dong, Benjamin Van Roy: Posterior Sampling for Continuing Environments. CoRR abs/2211.15931 (2022)
- [i54] Hong Jun Jeon, Benjamin Van Roy: An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws. CoRR abs/2212.01365 (2022)
- [i53] Dilip Arumugam, Shi Dong, Benjamin Van Roy: Inclusive Artificial Intelligence. CoRR abs/2212.12633 (2022)
- 2021
- [c46] Dilip Arumugam, Benjamin Van Roy: Deciding What to Learn: A Rate-Distortion Approach. ICML 2021: 373-382
- [c45] Dilip Arumugam, Benjamin Van Roy: The Value of Information When Deciding What to Learn. NeurIPS 2021: 9816-9827
- [i52] Dilip Arumugam, Benjamin Van Roy: Deciding What to Learn: A Rate-Distortion Approach. CoRR abs/2101.06197 (2021)
- [i51] Shi Dong, Benjamin Van Roy, Zhengyuan Zhou: Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State. CoRR abs/2102.05261 (2021)
- [i50] Adithya M. Devraj, Benjamin Van Roy, Kuang Xu: A Bit Better? Quantifying Information for Bandit Learning. CoRR abs/2102.09488 (2021)
- [i49] Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen: Reinforcement Learning, Bit by Bit. CoRR abs/2103.04047 (2021)
- [i48] Ian Osband, Zheng Wen, Mohammad Asghari, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy: Epistemic Neural Networks. CoRR abs/2107.08924 (2021)
- [i47] Xiuyuan Lu, Ian Osband, Benjamin Van Roy, Zheng Wen: Evaluating Probabilistic Inference in Deep Learning: Beyond Marginal Predictions. CoRR abs/2107.09224 (2021)
- [i46] Zheqing Zhu, Benjamin Van Roy: Deep Exploration for Recommendation Systems. CoRR abs/2109.12509 (2021)
- [i45] Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue, Benjamin Van Roy: Evaluating Predictive Distributions: Does Bayesian Deep Learning Work? CoRR abs/2110.04629 (2021)
- [i44] Dilip Arumugam, Benjamin Van Roy: The Value of Information When Deciding What to Learn. CoRR abs/2110.13973 (2021)
- 2020
- [c44] Vikranth Dwaracherla, Xiuyuan Lu, Morteza Ibrahimi, Ian Osband, Zheng Wen, Benjamin Van Roy: Hypermodels for Exploration. ICLR 2020
- [c43] Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvári, Satinder Singh, Benjamin Van Roy, Richard S. Sutton, David Silver, Hado van Hasselt: Behaviour Suite for Reinforcement Learning. ICLR 2020
- [c42] Zheng Wen, Doina Precup, Morteza Ibrahimi, André Barreto, Benjamin Van Roy, Satinder Singh: On Efficiency in Hierarchical Reinforcement Learning. NeurIPS 2020
- [i43] Vikranth Dwaracherla, Benjamin Van Roy: Langevin DQN. CoRR abs/2002.07282 (2020)
- [i42] Vikranth Dwaracherla, Xiuyuan Lu, Morteza Ibrahimi, Ian Osband, Zheng Wen, Benjamin Van Roy: Hypermodels for Exploration. CoRR abs/2006.07464 (2020)
- [i41] Dilip Arumugam, Benjamin Van Roy: Randomized Value Functions via Posterior State-Abstraction Sampling. CoRR abs/2010.02383 (2020)
2010 – 2019
- 2019
- [j39] Ian Osband, Benjamin Van Roy, Daniel J. Russo, Zheng Wen: Deep Exploration via Randomized Value Functions. J. Mach. Learn. Res. 20: 124:1-124:62 (2019)
- [c41] Shi Dong, Tengyu Ma, Benjamin Van Roy: On the Performance of Thompson Sampling on Logistic Bandits. COLT 2019: 1158-1160
- [c40] Xiuyuan Lu, Benjamin Van Roy: Information-Theoretic Confidence Bounds for Reinforcement Learning. NeurIPS 2019: 2458-2466
- [i40] Shi Dong, Tengyu Ma, Benjamin Van Roy: On the Performance of Thompson Sampling on Logistic Bandits. CoRR abs/1905.04654 (2019)
- [i39] Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvári, Satinder Singh, Benjamin Van Roy, Richard S. Sutton, David Silver, Hado van Hasselt: Behaviour Suite for Reinforcement Learning. CoRR abs/1908.03568 (2019)
- [i38] Benjamin Van Roy, Shi Dong: Comments on the Du-Kakade-Wang-Yang Lower Bounds. CoRR abs/1911.07910 (2019)
- [i37] Xiuyuan Lu, Benjamin Van Roy: Information-Theoretic Confidence Bounds for Reinforcement Learning. CoRR abs/1911.09724 (2019)
- [i36] Shi Dong, Benjamin Van Roy, Zhengyuan Zhou: Provably Efficient Reinforcement Learning with Aggregated States. CoRR abs/1912.06366 (2019)
- 2018
- [j38] Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen: A Tutorial on Thompson Sampling. Found. Trends Mach. Learn. 11(1): 1-96 (2018)
- [j37] Daniel Russo, Benjamin Van Roy: Learning to Optimize via Information-Directed Sampling. Oper. Res. 66(1): 230-252 (2018)
- [c39] Maria Dimakopoulou, Benjamin Van Roy: Coordinated Exploration in Concurrent Reinforcement Learning. ICML 2018: 1270-1278
- [c38] Shi Dong, Benjamin Van Roy: An Information-Theoretic Analysis for Thompson Sampling with Many Actions. NeurIPS 2018: 4161-4169
- [c37] Maria Dimakopoulou, Ian Osband, Benjamin Van Roy: Scalable Coordinated Exploration in Concurrent Reinforcement Learning. NeurIPS 2018: 4223-4232
- [i35] Maria Dimakopoulou, Benjamin Van Roy: Coordinated Exploration in Concurrent Reinforcement Learning. CoRR abs/1802.01282 (2018)
- [i34] Daniel Russo, Benjamin Van Roy: Satisficing in Time-Sensitive Bandit Learning. CoRR abs/1803.02855 (2018)
- [i33] Maria Dimakopoulou, Ian Osband, Benjamin Van Roy: Scalable Coordinated Exploration in Concurrent Reinforcement Learning. CoRR abs/1805.08948 (2018)
- [i32] Shi Dong, Benjamin Van Roy: An Information-Theoretic Analysis of Thompson Sampling for Large Action Spaces. CoRR abs/1805.11845 (2018)
- 2017
- [j36] Zheng Wen, Benjamin Van Roy: Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization. Math. Oper. Res. 42(3): 762-782 (2017)
- [c36] Ian Osband, Benjamin Van Roy: Why is Posterior Sampling Better than Optimism for Reinforcement Learning? ICML 2017: 2701-2710
- [c35] Xiuyuan Lu, Benjamin Van Roy: Ensemble Sampling. NIPS 2017: 3258-3266
- [c34] Abbas Kazerouni, Mohammad Ghavamzadeh, Yasin Abbasi-Yadkori, Benjamin Van Roy: Conservative Contextual Linear Bandits. NIPS 2017: 3910-3919
- [i31] Ian Osband, Benjamin Van Roy: Gaussian-Dirichlet Posterior Dominance in Sequential Learning. CoRR abs/1702.04126 (2017)
- [i30] Ian Osband, Daniel Russo, Zheng Wen, Benjamin Van Roy: Deep Exploration via Randomized Value Functions. CoRR abs/1703.07608 (2017)
- [i29] Daniel Russo, David Tse, Benjamin Van Roy: Time-Sensitive Bandit Learning and Satisficing Thompson Sampling. CoRR abs/1704.09028 (2017)
- [i28] Xiuyuan Lu, Benjamin Van Roy: Ensemble Sampling. CoRR abs/1705.07347 (2017)
- [i27] Ian Osband, Benjamin Van Roy: On Optimistic versus Randomized Exploration in Reinforcement Learning. CoRR abs/1706.04241 (2017)
- [i26] Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband: A Tutorial on Thompson Sampling. CoRR abs/1707.02038 (2017)
- [i25] Abbas Kazerouni, Benjamin Van Roy: Learning to Price with Reference Effects. CoRR abs/1708.09020 (2017)
- 2016
- [j35] Daniel Russo, Benjamin Van Roy: An Information-Theoretic Analysis of Thompson Sampling. J. Mach. Learn. Res. 17: 68:1-68:30 (2016)
- [c33] Ian Osband, Benjamin Van Roy, Zheng Wen: Generalization and Exploration via Randomized Value Functions. ICML 2016: 2377-2386
- [c32] Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy: Deep Exploration via Bootstrapped DQN. NIPS 2016: 4026-4034
- [i24] Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy: Deep Exploration via Bootstrapped DQN. CoRR abs/1602.04621 (2016)
- [i23] Ian Osband, Benjamin Van Roy: Why is Posterior Sampling Better than Optimism for Reinforcement Learning? CoRR abs/1607.00215 (2016)
- [i22] Ian Osband, Benjamin Van Roy: Posterior Sampling for Reinforcement Learning Without Episodes. CoRR abs/1608.02731 (2016)
- [i21] Ian Osband, Benjamin Van Roy: On Lower Bounds for Regret in Reinforcement Learning. CoRR abs/1608.02732 (2016)
- [i20] Abbas Kazerouni, Mohammad Ghavamzadeh, Benjamin Van Roy: Conservative Contextual Linear Bandits. CoRR abs/1611.06426 (2016)
- 2015
- [j34] Beomsoo Park, Benjamin Van Roy: Adaptive Execution: Exploration and Learning of Price Impact. Oper. Res. 63(5): 1058-1076 (2015)
- [i19] Ian Osband, Benjamin Van Roy: Bootstrapped Thompson Sampling and Deep Exploration. CoRR abs/1507.00300 (2015)
- 2014
- [j33] Yi-Hao Kao, Benjamin Van Roy: Directed Principal Component Analysis. Oper. Res. 62(4): 957-972 (2014)
- [j32] Daniel Russo, Benjamin Van Roy: Learning to Optimize via Posterior Sampling. Math. Oper. Res. 39(4): 1221-1243 (2014)
- [c31] Ian Osband, Benjamin Van Roy: Near-optimal Reinforcement Learning in Factored MDPs. NIPS 2014: 604-612
- [c30] Ian Osband, Benjamin Van Roy: Model-based Reinforcement Learning and the Eluder Dimension. NIPS 2014: 1466-1474
- [c29] Daniel Russo, Benjamin Van Roy: Learning to Optimize via Information-Directed Sampling. NIPS 2014: 1583-1591
- [i18] Benjamin Van Roy, Zheng Wen: Generalization and Exploration via Randomized Value Functions. CoRR abs/1402.0635 (2014)
- [i17] Ian Osband, Benjamin Van Roy: Near-optimal Regret Bounds for Reinforcement Learning in Factored MDPs. CoRR abs/1403.3741 (2014)
- [i16] Daniel Russo, Benjamin Van Roy: An Information-Theoretic Analysis of Thompson Sampling. CoRR abs/1403.5341 (2014)
- [i15] Daniel Russo, Benjamin Van Roy: Learning to Optimize Via Information Directed Sampling. CoRR abs/1403.5556 (2014)
- [i14] Ian Osband, Benjamin Van Roy: Model-based Reinforcement Learning and the Eluder Dimension. CoRR abs/1406.1853 (2014)
- 2013
- [j31] Yi-Hao Kao, Benjamin Van Roy: Learning a factor model via regularized PCA. Mach. Learn. 91(3): 279-303 (2013)
- [c28] Daniel Russo, Benjamin Van Roy: Eluder Dimension and the Sample Complexity of Optimistic Exploration. NIPS 2013: 2256-2264
- [c27] Ian Osband, Daniel Russo, Benjamin Van Roy: (More) Efficient Reinforcement Learning via Posterior Sampling. NIPS 2013: 3003-3011
- [c26] Zheng Wen, Benjamin Van Roy: Efficient Exploration and Value Function Generalization in Deterministic Systems. NIPS 2013: 3021-3029
- [i13] Paat Rusmevichientong, Benjamin Van Roy: A Tractable POMDP for a Class of Sequencing Problems. CoRR abs/1301.2308 (2013)
- [i12] Daniel Russo, Benjamin Van Roy: Learning to Optimize Via Posterior Sampling. CoRR abs/1301.2609 (2013)
- [i11]