A. Rupam Mahmood

Rupam Mahmood – Ashique Rupam Mahmood

Person information

affiliation: University of Alberta, Reinforcement Learning & Artificial Intelligence Lab, Edmonton, AB, Canada
affiliation: Alberta Machine Intelligence Institute (Amii), Edmonton, AB, Canada
affiliation: Kindred AI, Toronto, ON, Canada

2020 – today

2024
[c24]
Bram Grooten, Tristan Tomilin, Gautham Vasan, Matthew E. Taylor, A. Rupam Mahmood, Meng Fang, Mykola Pechenizkiy, Decebal Constantin Mocanu:
MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning. AAMAS 2024: 733-742
[c23]
Mohamed Elsayed, A. Rupam Mahmood:
Addressing Loss of Plasticity and Catastrophic Forgetting in Continual Learning. ICLR 2024
[c22]
Haque Ishfaq, Qingfeng Lan, Pan Xu, A. Rupam Mahmood, Doina Precup, Anima Anandkumar, Kamyar Azizzadenesheli:
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo. ICLR 2024
[i35]
Mohamed Elsayed, A. Rupam Mahmood:
Addressing Loss of Plasticity and Catastrophic Forgetting in Continual Learning. CoRR abs/2404.00781 (2024)
[i34]
Fengdi Che, Chenjun Xiao, Jincheng Mei, Bo Dai, Ramki Gummadi, Oscar A. Ramirez, Christopher K. Harris, A. Rupam Mahmood, Dale Schuurmans:
Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation. CoRR abs/2405.21043 (2024)
[i33]
Mohamed Elsayed, Homayoon Farrahi, Felix Dangel, A. Rupam Mahmood:
Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning. CoRR abs/2406.03276 (2024)
[i32]
Haque Ishfaq, Yixin Tan, Yu Yang, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, Pan Xu:
More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling. CoRR abs/2406.12241 (2024)
[i31]
Gautham Vasan, Yan Wang, Fahim Shahriar, James Bergstra, Martin Jägersand, A. Rupam Mahmood:
Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning. CoRR abs/2407.00324 (2024)
[i30]
Mohamed Elsayed, Qingfeng Lan, Clare Lyle, A. Rupam Mahmood:
Weight Clipping for Deep Continual and Reinforcement Learning. CoRR abs/2407.01704 (2024)
2023
[j6]
Qingfeng Lan, Yangchen Pan, Jun Luo, A. Rupam Mahmood:
Memory-efficient Reinforcement Learning with Value-based Knowledge Consolidation. Trans. Mach. Learn. Res. 2023 (2023)
[c21]
Fengdi Che, Gautham Vasan, A. Rupam Mahmood:
Correcting discount-factor mismatch in on-policy policy gradient methods. ICML 2023: 4218-4240
[c20]
Yan Wang, Gautham Vasan, A. Rupam Mahmood:
Real-Time Reinforcement Learning for Vision-Based Robotics Utilizing Local and Remote Computers. ICRA 2023: 9435-9441
[c19]
Homayoon Farrahi, A. Rupam Mahmood:
Reducing the Cost of Cycle-Time Tuning for Real-World Policy Optimization. IJCNN 2023: 1-8
[c18]
Amirmohammad Karimi, Jun Jin, Jun Luo, A. Rupam Mahmood, Martin Jägersand, Samuele Tosatto:
Dynamic Decision Frequency with Continuous Options. IROS 2023: 7545-7552
[c17]
Jiamin He, Fengdi Che, Yi Wan, A. Rupam Mahmood:
Loosely consistent emphatic temporal-difference learning. UAI 2023: 849-859
[i29]
Qingfeng Lan, A. Rupam Mahmood, Shuicheng Yan, Zhongwen Xu:
Learning to Optimize for Reinforcement Learning. CoRR abs/2302.01470 (2023)
[i28]
Mohamed Elsayed, A. Rupam Mahmood:
Utility-based Perturbed Gradient Descent: An Optimizer for Continual Learning. CoRR abs/2302.03281 (2023)
[i27]
Homayoon Farrahi, A. Rupam Mahmood:
Reducing the Cost of Cycle-Time Tuning for Real-World Policy Optimization. CoRR abs/2305.05760 (2023)
[i26]
Haque Ishfaq, Qingfeng Lan, Pan Xu, A. Rupam Mahmood, Doina Precup, Anima Anandkumar, Kamyar Azizzadenesheli:
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo. CoRR abs/2305.18246 (2023)
[i25]
Fengdi Che, Gautham Vasan, A. Rupam Mahmood:
Correcting discount-factor mismatch in on-policy policy gradient methods. CoRR abs/2306.13284 (2023)
[i24]
Shibhansh Dohare, J. Fernando Hernandez-Garcia, Parash Rahman, Richard S. Sutton, A. Rupam Mahmood:
Maintaining Plasticity in Deep Continual Learning. CoRR abs/2306.13812 (2023)
[i23]
Qingfeng Lan, A. Rupam Mahmood:
Elephant Neural Networks: Born to Be a Continual Learner. CoRR abs/2310.01365 (2023)
[i22]
Bram Grooten, Tristan Tomilin, Gautham Vasan, Matthew E. Taylor, A. Rupam Mahmood, Meng Fang, Mykola Pechenizkiy, Decebal Constantin Mocanu:
MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning. CoRR abs/2312.15339 (2023)
2022
[j5]
Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White:
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences. J. Mach. Learn. Res. 23: 253:1-253:79 (2022)
[c16]
Qingfeng Lan, Samuele Tosatto, Homayoon Farrahi, Rupam Mahmood:
Model-free Policy Learning with Reward Gradients. AISTATS 2022: 4217-4234
[c15]
Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, Rupam Mahmood:
An Alternate Policy Gradient Estimator for Softmax Policies. AISTATS 2022: 6630-6689
[c14]
Samuele Tosatto, Andrew Patterson, Martha White, Rupam Mahmood:
A Temporal-Difference Approach to Policy Gradient Estimation. ICML 2022: 21609-21632
[c13]
Yufeng Yuan, A. Rupam Mahmood:
Asynchronous Reinforcement Learning for Real-Time Control of Physical Robots. ICRA 2022: 5546-5552
[i21]
Samuele Tosatto, Andrew Patterson, Martha White, A. Rupam Mahmood:
A Temporal-Difference Approach to Policy Gradient Estimation. CoRR abs/2202.02396 (2022)
[i20]
Yufeng Yuan, Rupam Mahmood:
Asynchronous Reinforcement Learning for Real-Time Control of Physical Robots. CoRR abs/2203.12759 (2022)
[i19]
Qingfeng Lan, Yangchen Pan, Jun Luo, A. Rupam Mahmood:
Memory-efficient Reinforcement Learning with Knowledge Consolidation. CoRR abs/2205.10868 (2022)
[i18]
Yan Wang, Gautham Vasan, A. Rupam Mahmood:
Real-Time Reinforcement Learning for Vision-Based Robotics Utilizing Local and Remote Computers. CoRR abs/2210.02317 (2022)
[i17]
Mohamed Elsayed, A. Rupam Mahmood:
HesScale: Scalable Computation of Hessian Diagonals. CoRR abs/2210.11639 (2022)
[i16]
Amirmohammad Karimi, Jun Jin, Jun Luo, A. Rupam Mahmood, Martin Jägersand, Samuele Tosatto:
Variable-Decision Frequency Option Critic. CoRR abs/2212.04407 (2022)
2021
[c12]
Michael Przystupa, Masood Dehghan, Martin Jägersand, A. Rupam Mahmood:
Analyzing Neural Jacobian Methods in Applications of Visual Servoing and Kinematic Control. ICRA 2021: 14276-14283
[i15]
Qingfeng Lan, A. Rupam Mahmood:
Model-free Policy Learning with Reward Gradients. CoRR abs/2103.05147 (2021)
[i14]
Michael Przystupa, Masood Dehghan, Martin Jägersand, A. Rupam Mahmood:
Analyzing Neural Jacobian Methods in Applications of Visual Servoing and Kinematic Control. CoRR abs/2106.06083 (2021)
[i13]
Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White:
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences. CoRR abs/2107.08285 (2021)
[i12]
Shibhansh Dohare, A. Rupam Mahmood, Richard S. Sutton:
Continual Backprop: Stochastic Gradient Descent with Persistent Randomness. CoRR abs/2108.06325 (2021)
[i11]
Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood:
An Alternate Policy Gradient Estimator for Softmax Policies. CoRR abs/2112.11622 (2021)
2020
[j4]
Oliver Limoyo, Bryan Chan, Filip Maric, Brandon Wagstaff, A. Rupam Mahmood, Jonathan Kelly:
Heteroscedastic Uncertainty for Robust Generative Latent Dynamics. IEEE Robotics Autom. Lett. 5(4): 6654-6661 (2020)
[i10]
Oliver Limoyo, Bryan Chan, Filip Maric, Brandon Wagstaff, A. Rupam Mahmood, Jonathan Kelly:
Heteroscedastic Uncertainty for Robust Generative Latent Dynamics. CoRR abs/2008.08157 (2020)

2010 – 2019

2019
[c11]
Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra:
Autoregressive Policies for Continuous Control Deep Reinforcement Learning. IJCAI 2019: 2754-2762
[i9]
Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra:
Autoregressive Policies for Continuous Control Deep Reinforcement Learning. CoRR abs/1903.11524 (2019)
2018
[j3]
Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton:
On Generalized Bellman Equations and Temporal-Difference Learning. J. Mach. Learn. Res. 19: 48:1-48:49 (2018)
[c10]
A. Rupam Mahmood, Dmytro Korenkevych, Gautham Vasan, William Ma, James Bergstra:
Benchmarking Reinforcement Learning Algorithms on Real-World Robots. CoRL 2018: 561-591
[c9]
A. Rupam Mahmood, Dmytro Korenkevych, Brent J. Komer, James Bergstra:
Setting up a Reinforcement Learning Task with a Real-World Robot. IROS 2018: 4635-4640
[i8]
A. Rupam Mahmood, Dmytro Korenkevych, Brent J. Komer, James Bergstra:
Setting up a Reinforcement Learning Task with a Real-World Robot. CoRR abs/1803.07067 (2018)
[i7]
A. Rupam Mahmood, Dmytro Korenkevych, Gautham Vasan, William Ma, James Bergstra:
Benchmarking Reinforcement Learning Algorithms on Real-World Robots. CoRR abs/1809.07731 (2018)
2017
[c8]
Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton:
On Generalized Bellman Equations and Temporal-Difference Learning. Canadian AI 2017: 3-14
[i6]
Ashique Rupam Mahmood, Huizhen Yu, Richard S. Sutton:
Multi-step Off-policy Learning Without Importance Sampling Ratios. CoRR abs/1702.03006 (2017)
[i5]
Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton:
On Generalized Bellman Equations and Temporal-Difference Learning. CoRR abs/1704.04463 (2017)
2016
[j2]
Richard S. Sutton, Ashique Rupam Mahmood, Martha White:
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning. J. Mach. Learn. Res. 17: 73:1-73:29 (2016)
[j1]
Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton:
True Online Temporal-Difference Learning. J. Mach. Learn. Res. 17: 145:1-145:40 (2016)
2015
[c7]
Ashique Rupam Mahmood, Richard S. Sutton:
Off-policy learning based on weighted importance sampling with linear computational complexity. UAI 2015: 552-561
[i4]
Richard S. Sutton, Ashique Rupam Mahmood, Martha White:
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning. CoRR abs/1503.04269 (2015)
[i3]
Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton:
An Empirical Evaluation of True Online TD(λ). CoRR abs/1507.00353 (2015)
[i2]
Ashique Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton:
Emphatic Temporal-Difference Learning. CoRR abs/1507.01569 (2015)
[i1]
Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton:
True Online Temporal-Difference Learning. CoRR abs/1512.04087 (2015)
2014
[c6]
Richard S. Sutton, Ashique Rupam Mahmood, Doina Precup, Hado van Hasselt:
A new Q(λ) with interim forward view and Monte Carlo equivalence. ICML 2014: 568-576
[c5]
Ashique Rupam Mahmood, Hado van Hasselt, Richard S. Sutton:
Weighted importance sampling for off-policy learning with linear function approximation. NIPS 2014: 3014-3022
[c4]
Hado van Hasselt, Ashique Rupam Mahmood, Richard S. Sutton:
Off-policy TD(λ) with a true online equivalence. UAI 2014: 330-339
2013
[c3]
Ashique Rupam Mahmood, Richard S. Sutton:
Representation Search through Generate and Test. AAAI Workshop: Learning Rich Representations from Low-Level Sensors 2013
[c2]
Ashique Rupam Mahmood, Richard S. Sutton:
Position Paper: Representation Search through Generate and Test. SARA 2013
2012
[c1]
Ashique Rupam Mahmood, Richard S. Sutton, Thomas Degris, Patrick M. Pilarski:
Tuning-free step-size adaptation. ICASSP 2012: 2121-2124