


Richard S. Sutton
Richard Sutton 0001
Person information
- affiliation: DeepMind Alberta, Edmonton, AB, Canada
- affiliation: University of Alberta, Department of Computing Science, Edmonton, AB, Canada
- affiliation (PhD 1984): University of Massachusetts Amherst, MA, USA
Other persons with the same name
- Richard Sutton 0002 — Skyhook Wireless, Boston, MA, USA
2020 – today
- 2024
- [j29] Khurram Javed, Arsalan Sharifnassab, Richard S. Sutton: SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning. RLJ 2: 840-863 (2024)
- [j28] Kris De Asis, Richard S. Sutton: An Idiosyncrasy of Time-discretization in Reinforcement Learning. RLJ 3: 1306-1316 (2024)
- [j27] Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton: Reward Centering. RLJ 4: 1995-2016 (2024)
- [j26] Shibhansh Dohare, J. Fernando Hernandez-Garcia, Qingfeng Lan, Parash Rahman, A. Rupam Mahmood, Richard S. Sutton: Loss of plasticity in deep continual learning. Nat. 632(8026): 768-774 (2024)
- [c87] Richard S. Sutton, Marlos C. Machado, G. Zacharias Holland, David Szepesvari, Finbarr Timbers, Brian Tanner, Adam White: Reward-Respecting Subtasks for Model-Based Reinforcement Learning (Abstract Reprint). AAAI 2024: 22713
- [i73] Thomas Degris, Khurram Javed, Arsalan Sharifnassab, Yuxin Liu, Richard S. Sutton: Step-size Optimization for Continual Learning. CoRR abs/2401.17401 (2024)
- [i72] Arsalan Sharifnassab, Saber Salehkaleybar, Richard S. Sutton: MetaOptimize: A Framework for Optimizing Step Sizes and Other Meta-parameters. CoRR abs/2402.02342 (2024)
- [i71] Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton: Reward Centering. CoRR abs/2405.09999 (2024)
- [i70] Kris De Asis, Richard S. Sutton: An Idiosyncrasy of Time-discretization in Reinforcement Learning. CoRR abs/2406.14951 (2024)
- [i69] Yi Wan, Huizhen Yu, Richard S. Sutton: On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes. CoRR abs/2408.16262 (2024)
- [i68] Huizhen Yu, Yi Wan, Richard S. Sutton: Asynchronous Stochastic Approximation and Average-Reward Reinforcement Learning. CoRR abs/2409.03915 (2024)
- 2023
- [j25] Banafsheh Rafiee, Zaheer Abbas, Sina Ghiassian, Raksha Kumaraswamy, Richard S. Sutton, Elliot A. Ludvig, Adam White: From eye-blinks to state construction: Diagnostic benchmarks for online representation learning. Adapt. Behav. 31(1): 3-19 (2023)
- [j24] Richard S. Sutton, Marlos C. Machado, G. Zacharias Holland, David Szepesvari, Finbarr Timbers, Brian Tanner, Adam White: Reward-respecting subtasks for model-based reinforcement learning. Artif. Intell. 324: 104001 (2023)
- [j23] Khurram Javed, Haseeb Shah, Richard S. Sutton, Martha White: Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks. J. Mach. Learn. Res. 24: 256:1-256:34 (2023)
- [j22] Kory W. Mathewson, Adam S. R. Parker, Craig Sherstan, Ann L. Edwards, Richard S. Sutton, Patrick M. Pilarski: Communicative capital: a key resource for human-machine shared agency and collaborative capacity. Neural Comput. Appl. 35(23): 16805-16819 (2023)
- [c86] Banafsheh Rafiee, Sina Ghiassian, Jun Jin, Richard S. Sutton, Jun Luo, Adam White: Auxiliary task discovery through generate-and-test. CoLLAs 2023: 703-714
- [c85] Kristopher De Asis, Eric Graves, Richard S. Sutton: Value-aware Importance Weighting for Off-policy Reinforcement Learning. CoLLAs 2023: 745-763
- [c84] Arsalan Sharifnassab, Richard S. Sutton: Toward Efficient Gradient-Based Value Estimation. ICML 2023: 30827-30849
- [i67] Arsalan Sharifnassab, Richard Sutton: Toward Efficient Gradient-Based Value Estimation. CoRR abs/2301.13757 (2023)
- [i66] Khurram Javed, Haseeb Shah, Richard S. Sutton, Martha White: Online Real-Time Recurrent Learning Using Sparse Connections and Selective Learning. CoRR abs/2302.05326 (2023)
- [i65] Shibhansh Dohare, J. Fernando Hernandez-Garcia, Parash Rahman, Richard S. Sutton, A. Rupam Mahmood: Maintaining Plasticity in Deep Continual Learning. CoRR abs/2306.13812 (2023)
- [i64] Kristopher De Asis, Eric Graves, Richard S. Sutton: Value-aware Importance Weighting for Off-policy Reinforcement Learning. CoRR abs/2306.15625 (2023)
- [i63] Kenny Young, Richard S. Sutton: Iterative Option Discovery for Planning, by Planning. CoRR abs/2310.01569 (2023)
- [i62] Huizhen Yu, Yi Wan, Richard S. Sutton: A Note on Stability in Asynchronous Stochastic Approximation without Communication Delays. CoRR abs/2312.15091 (2023)
- 2022
- [c83] Tian Tian, Kenny Young, Richard S. Sutton: Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions. NeurIPS 2022
- [i61] Richard S. Sutton, Marlos C. Machado, G. Zacharias Holland, David Szepesvari, Finbarr Timbers, Brian Tanner, Adam White: Reward-Respecting Subtasks for Model-Based Reinforcement Learning. CoRR abs/2202.03466 (2022)
- [i60] Richard S. Sutton: A History of Meta-gradient: Gradient Methods for Meta-learning. CoRR abs/2202.09701 (2022)
- [i59] Richard S. Sutton: The Quest for a Common Model of the Intelligent Decision Maker. CoRR abs/2202.13252 (2022)
- [i58] Yi Wan, Richard S. Sutton: Toward Discovering Options that Achieve Faster Planning. CoRR abs/2205.12515 (2022)
- [i57] Tian Tian, Kenny Young, Richard S. Sutton: Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions. CoRR abs/2207.01613 (2022)
- [i56] Richard S. Sutton, Michael H. Bowling, Patrick M. Pilarski: The Alberta Plan for AI Research. CoRR abs/2208.11173 (2022)
- [i55] Yi Wan, Richard S. Sutton: On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly-Communicating MDPs. CoRR abs/2209.15141 (2022)
- [i54] Banafsheh Rafiee, Sina Ghiassian, Jun Jin, Richard S. Sutton, Jun Luo, Adam White: Auxiliary task discovery through generate-and-test. CoRR abs/2210.14361 (2022)
- 2021
- [j21] David Silver, Satinder Singh, Doina Precup, Richard S. Sutton: Reward is enough. Artif. Intell. 299: 103535 (2021)
- [j20] Jae Young Lee, Richard S. Sutton: Policy iterations for reinforcement learning problems in continuous time and space - Fundamental theory and methods. Autom. 126: 109421 (2021)
- [j19] Andrew G. Barto, Richard S. Sutton, Charles W. Anderson: Looking Back on the Actor-Critic Architecture. IEEE Trans. Syst. Man Cybern. Syst. 51(1): 40-50 (2021)
- [c82] Yi Wan, Abhishek Naik, Richard S. Sutton: Learning and Planning in Average-Reward Markov Decision Processes. ICML 2021: 10653-10662
- [c81] Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson: Average-Reward Off-Policy Policy Evaluation with Function Approximation. ICML 2021: 12578-12588
- [c80] Yi Wan, Abhishek Naik, Richard S. Sutton: Average-Reward Learning and Planning with Options. NeurIPS 2021: 22758-22769
- [i53] Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson: Average-Reward Off-Policy Policy Evaluation with Function Approximation. CoRR abs/2101.02808 (2021)
- [i52] Dylan R. Ashley, Sina Ghiassian, Richard S. Sutton: Does Standard Backpropagation Forget Less Catastrophically Than Adam? CoRR abs/2102.07686 (2021)
- [i51] Khurram Javed, Martha White, Richard S. Sutton: Scalable Online Recurrent Learning Using Columnar Neural Networks. CoRR abs/2103.05787 (2021)
- [i50] Katya Kudashkina, Yi Wan, Abhishek Naik, Richard S. Sutton: Planning with Expectation Models for Control. CoRR abs/2104.08543 (2021)
- [i49] Sina Ghiassian, Richard S. Sutton: An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task. CoRR abs/2106.00922 (2021)
- [i48] Shibhansh Dohare, A. Rupam Mahmood, Richard S. Sutton: Continual Backprop: Stochastic Gradient Descent with Persistent Randomness. CoRR abs/2108.06325 (2021)
- [i47] Sina Ghiassian, Richard S. Sutton: An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment. CoRR abs/2109.05110 (2021)
- [i46] Yi Wan, Abhishek Naik, Richard S. Sutton: Average-Reward Learning and Planning with Options. CoRR abs/2110.13855 (2021)
- [i45] Amir Samani, Richard S. Sutton: Learning Agent State Online with Recurrent Generate-and-Test. CoRR abs/2112.15236 (2021)
- 2020
- [j18] Dagmar Monett, Colin W. P. Lewis, Kristinn R. Thórisson, Joscha Bach, Gianluca Baldassarre, Giovanni Granato, Istvan S. N. Berkeley, François Chollet, Matthew Crosby, Henry Shevlin, John F. Sowa, John E. Laird, Shane Legg, Peter Lindes, Tomás Mikolov, William J. Rapaport, Raúl Rojas, Marek Rosa, Peter Stone, Richard S. Sutton, Roman V. Yampolskiy, Pei Wang, Roger C. Schank, Aaron Sloman, Alan F. T. Winfield: Special Issue "On Defining Artificial Intelligence" - Commentaries and Author's Response. J. Artif. Gen. Intell. 11(2): 1-100 (2020)
- [c79] Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves: Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning. AAAI 2020: 3741-3748
- [c78] Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvári, Satinder Singh, Benjamin Van Roy, Richard S. Sutton, David Silver, Hado van Hasselt: Behaviour Suite for Reinforcement Learning. ICLR 2020
- [i44] Yi Wan, Abhishek Naik, Richard S. Sutton: Learning and Planning in Average-Reward Markov Decision Processes. CoRR abs/2006.16318 (2020)
- [i43] Alan Chan, Kristopher De Asis, Richard S. Sutton: Inverse Policy Evaluation for Value-based Sequential Decision-making. CoRR abs/2008.11329 (2020)
- [i42] Katya Kudashkina, Patrick M. Pilarski, Richard S. Sutton: Document-editing Assistants and Model-based Reinforcement Learning as a Path to Conversational AI. CoRR abs/2008.12095 (2020)
- [i41] Kenny Young, Richard S. Sutton: Understanding the Pathologies of Approximate Policy Evaluation when Combined with Greedification in Reinforcement Learning. CoRR abs/2010.15268 (2020)
2010 – 2019
- 2019
- [c77] Banafsheh Rafiee, Sina Ghiassian, Adam White, Richard S. Sutton: Prediction in Intelligence: An Empirical Comparison of Off-policy Algorithms on Robots. AAMAS 2019: 332-340
- [c76] Tian Tian, Richard S. Sutton: Extending Sliding-Step Importance Weighting from Supervised Learning to Reinforcement Learning. IJCAI 2019: 67-82
- [c75] Yi Wan, Muhammad Zaheer, Adam White, Martha White, Richard S. Sutton: Planning with Expectation Models. IJCAI 2019: 3649-3655
- [i40] J. Fernando Hernandez-Garcia, Richard S. Sutton: Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target. CoRR abs/1901.07510 (2019)
- [i39] Xiang Gu, Sina Ghiassian, Richard S. Sutton: Should All Temporal Difference Learning Use Emphasis? CoRR abs/1903.00194 (2019)
- [i38] Alexandra Kearney, Vivek Veeriah, Jaden B. Travnik, Patrick M. Pilarski, Richard S. Sutton: Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning. CoRR abs/1903.03252 (2019)
- [i37] Yi Wan, Muhammad Zaheer, Adam White, Martha White, Richard S. Sutton: Planning with Expectation Models. CoRR abs/1904.01191 (2019)
- [i36] Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvári, Satinder Singh, Benjamin Van Roy, Richard S. Sutton, David Silver, Hado van Hasselt: Behaviour Suite for Reinforcement Learning. CoRR abs/1908.03568 (2019)
- [i35] Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves: Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning. CoRR abs/1909.03906 (2019)
- [i34] Abhishek Naik, Roshan Shariff, Niko Yasui, Richard S. Sutton: Discounted Reinforcement Learning is Not an Optimization Problem. CoRR abs/1910.02140 (2019)
- [i33] J. Fernando Hernandez-Garcia, Richard S. Sutton: Learning Sparse Representations Incrementally in Deep Reinforcement Learning. CoRR abs/1912.04002 (2019)
- 2018
- [j17] Jaden B. Travnik, Kory W. Mathewson, Richard S. Sutton, Patrick M. Pilarski: Reactive Reinforcement Learning in Asynchronous Environments. Frontiers Robotics AI 5: 79 (2018)
- [j16] Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton: On Generalized Bellman Equations and Temporal-Difference Learning. J. Mach. Learn. Res. 19: 48:1-48:49 (2018)
- [c74] Kristopher De Asis, J. Fernando Hernandez-Garcia, G. Zacharias Holland, Richard S. Sutton: Multi-Step Reinforcement Learning: A Unifying Algorithm. AAAI 2018: 2902-2909
- [c73] Craig Sherstan, Dylan R. Ashley, Brendan Bennett, Kenny Young, Adam White, Martha White, Richard S. Sutton: Comparing Direct and Indirect Temporal-Difference Methods for Estimating the Variance of the Return. UAI 2018: 63-72
- [c72] Kristopher De Asis, Richard S. Sutton: Per-decision Multi-step Temporal Difference Learning with Control Variates. UAI 2018: 786-794
- [i32] Craig Sherstan, Brendan Bennett, Kenny Young, Dylan R. Ashley, Adam White, Martha White, Richard S. Sutton: Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods. CoRR abs/1801.08287 (2018)
- [i31] Jaden B. Travnik, Kory W. Mathewson, Richard S. Sutton, Patrick M. Pilarski: Reactive Reinforcement Learning in Asynchronous Environments. CoRR abs/1802.06139 (2018)
- [i30] Alexandra Kearney, Vivek Veeriah, Jaden B. Travnik, Richard S. Sutton, Patrick M. Pilarski: TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent. CoRR abs/1804.03334 (2018)
- [i29] Sina Ghiassian, Huizhen Yu, Banafsheh Rafiee, Richard S. Sutton: Two geometric input transformation methods for fast online reinforcement learning with neural nets. CoRR abs/1805.07476 (2018)
- [i28] Kenny J. Young, Richard S. Sutton, Shuo Yang: Integrating Episodic Memory into a Reinforcement Learning Agent using Reservoir Sampling. CoRR abs/1806.00540 (2018)
- [i27] Kristopher De Asis, Richard S. Sutton: Per-decision Multi-step Temporal Difference Learning with Control Variates. CoRR abs/1807.01830 (2018)
- [i26] Kristopher De Asis, Brendan Bennett, Richard S. Sutton: Predicting Periodicity with Temporal Difference Learning. CoRR abs/1809.07435 (2018)
- [i25] Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White: Online Off-policy Prediction. CoRR abs/1811.02597 (2018)
- 2017
- [c71] Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton: On Generalized Bellman Equations and Temporal-Difference Learning. Canadian AI 2017: 3-14
- [c70] Vivek Veeriah, Harm van Seijen, Richard S. Sutton: Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning. AAMAS 2017: 556-564
- [c69] Vivek Veeriah, Shangtong Zhang, Richard S. Sutton: Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks. ECML/PKDD (1) 2017: 445-459
- [i24] Ashique Rupam Mahmood, Huizhen Yu, Richard S. Sutton: Multi-step Off-policy Learning Without Importance Sampling Ratios. CoRR abs/1702.03006 (2017)
- [i23] Kristopher De Asis, J. Fernando Hernandez-Garcia, G. Zacharias Holland, Richard S. Sutton: Multi-step Reinforcement Learning: A Unifying Algorithm. CoRR abs/1703.01327 (2017)
- [i22] Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton: On Generalized Bellman Equations and Temporal-Difference Learning. CoRR abs/1704.04463 (2017)
- [i21] Jae Young Lee, Richard S. Sutton: Integral Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space. CoRR abs/1705.03520 (2017)
- [i20] Adam White, Richard S. Sutton: GQ(λ) Quick Reference and Implementation Guide. CoRR abs/1705.03967 (2017)
- [i19] Sina Ghiassian, Banafsheh Rafiee, Richard S. Sutton: A First Empirical Study of Emphatic Temporal Difference Learning. CoRR abs/1705.04185 (2017)
- [i18] Patrick M. Pilarski, Richard S. Sutton, Kory W. Mathewson, Craig Sherstan, Adam S. R. Parker, Ann L. Edwards: Communicative Capital for Prosthetic Agents. CoRR abs/1711.03676 (2017)
- [i17] Shangtong Zhang, Richard S. Sutton: A Deeper Look at Experience Replay. CoRR abs/1712.01275 (2017)
- 2016
- [j15] Richard S. Sutton, Ashique Rupam Mahmood, Martha White: An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning. J. Mach. Learn. Res. 17: 73:1-73:29 (2016)
- [j14] Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton: True Online Temporal-Difference Learning. J. Mach. Learn. Res. 17: 145:1-145:40 (2016)
- [i16] Vivek Veeriah, Patrick M. Pilarski, Richard S. Sutton: Face valuing: Training user interfaces with facial expressions and reinforcement learning. CoRR abs/1606.02807 (2016)
- [i15] Susan A. Murphy, Yanzhen Deng, Eric B. Laber, Hamid Reza Maei, Richard S. Sutton, Katie Witkiewitz: A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward. CoRR abs/1607.05047 (2016)
- [i14] Richard S. Sutton, Vivek Veeriah: Learning representations through stochastic gradient descent in cross-validation error. CoRR abs/1612.02879 (2016)
- 2015
- [c68] Harm Vanseijen, Richard S. Sutton: A Deeper Look at Planning as Learning from Replay. ICML 2015: 2314-2322
- [c67] Ashique Rupam Mahmood, Richard S. Sutton: Off-policy learning based on weighted importance sampling with linear computational complexity. UAI 2015: 552-561
- [i13] Richard S. Sutton, Ashique Rupam Mahmood, Martha White: An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning. CoRR abs/1503.04269 (2015)
- [i12] Richard S. Sutton, Brian Tanner: Temporal-Difference Networks. CoRR abs/1504.05539 (2015)
- [i11] Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton: An Empirical Evaluation of True Online TD(λ). CoRR abs/1507.00353 (2015)
- [i10] Ashique Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton: Emphatic Temporal-Difference Learning. CoRR abs/1507.01569 (2015)
- [i9] Richard S. Sutton: True Online Emphatic TD(λ): Quick Reference and Implementation Guide. CoRR abs/1507.07147 (2015)
- [i8] Hado van Hasselt, Richard S. Sutton: Learning to Predict Independent of Span. CoRR abs/1508.04582 (2015)
- [i7] Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton: True Online Temporal-Difference Learning. CoRR abs/1512.04087 (2015)
- 2014
- [j13] Joseph Modayil, Adam White, Richard S. Sutton: Multi-timescale nexting in a reinforcement learning robot. Adapt. Behav. 22(2): 146-160 (2014)
- [c66] Richard S. Sutton, Ashique Rupam Mahmood, Doina Precup, Hado van Hasselt: A new Q(λ) with interim forward view and Monte Carlo equivalence. ICML 2014: 568-576
- [c65] Harm van Seijen, Richard S. Sutton: True Online TD(λ). ICML 2014: 692-700
- [c64] Hengshuai Yao, Csaba Szepesvári, Richard S. Sutton, Joseph Modayil, Shalabh Bhatnagar: Universal Option Models. NIPS 2014: 990-998
- [c63] Ashique Rupam Mahmood, Hado van Hasselt, Richard S. Sutton: Weighted importance sampling for off-policy learning with linear function approximation. NIPS 2014: 3014-3022
- [c62] Hado van Hasselt, Ashique Rupam Mahmood, Richard S. Sutton: Off-policy TD(λ) with a true online equivalence. UAI 2014: 330-339
- 2013
- [j12] Patrick M. Pilarski, Michael Rory Dawson, Thomas Degris, Jason P. Carey, K. Ming Chan, Jacqueline S. Hebert, Richard S. Sutton: Adaptive Artificial Limbs: A Real-Time Approach to Prediction and Anticipation. IEEE Robotics Autom. Mag. 20(1): 53-64 (2013)
- [c61] Ashique Rupam Mahmood, Richard S. Sutton: Representation Search through Generate and Test. AAAI Workshop: Learning Rich Representations from Low-Level Sensors 2013
- [c60] David Silver, Richard S. Sutton, Martin Müller: Temporal-Difference Search in Computer Go. ICAPS 2013
- [c59] Harm van Seijen, Richard S. Sutton: Planning by Prioritized Sweeping with Small Backups. ICML (3) 2013: 361-369
- [c58] Patrick M. Pilarski, Travis B. Dick, Richard S. Sutton: Real-time prediction learning for the simultaneous actuation of multiple prosthetic joints. ICORR 2013: 1-8
- [c57] Ashique Rupam Mahmood, Richard S. Sutton: Position Paper: Representation Search through Generate and Test. SARA 2013
- [i6] Harm van Seijen, Richard S. Sutton: Planning by Prioritized Sweeping with Small Backups. CoRR abs/1301.2343 (2013)
- [i5] Ann L. Edwards, Alexandra Kearney, Michael Rory Dawson, Richard S. Sutton, Patrick M. Pilarski: Temporal-Difference Learning to Assist Human Decision Making during the Control of an Artificial Limb. CoRR abs/1309.4714 (2013)
- 2012
- [j11] David Silver, Richard S. Sutton, Martin Müller: Temporal-difference search in computer Go. Mach. Learn. 87(2): 183-219 (2012)
- [c56] Patrick M. Pilarski, Richard S. Sutton: Between Instruction and Reward: Human-Prompted Switching. AAAI Fall Symposium: Robots Learning Interactively from Human Teachers 2012
- [c55] Thomas Degris, Patrick M. Pilarski, Richard S. Sutton: Model-Free reinforcement learning with continuous action in practice. ACC 2012: 2177-2182
- [c54] Ashique Rupam Mahmood, Richard S. Sutton, Thomas Degris, Patrick M. Pilarski: Tuning-free step-size adaptation. ICASSP 2012: 2121-2124
- [c53]