


A. Rupam Mahmood
Person information

- affiliation: University of Alberta, Reinforcement Learning & Artificial Intelligence Lab, Edmonton, AB, Canada
- affiliation: Alberta Machine Intelligence Institute (Amii), Edmonton, AB, Canada
- affiliation: Kindred AI, Toronto, ON, Canada
2020 – today
- 2023
- [i23] Qingfeng Lan, A. Rupam Mahmood, Shuicheng Yan, Zhongwen Xu: Learning to Optimize for Reinforcement Learning. CoRR abs/2302.01470 (2023)
- [i22] Mohamed Elsayed, A. Rupam Mahmood: Utility-based Perturbed Gradient Descent: An Optimizer for Continual Learning. CoRR abs/2302.03281 (2023)
- 2022
- [c16] Qingfeng Lan, Samuele Tosatto, Homayoon Farrahi, Rupam Mahmood: Model-free Policy Learning with Reward Gradients. AISTATS 2022: 4217-4234
- [c15] Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, Rupam Mahmood: An Alternate Policy Gradient Estimator for Softmax Policies. AISTATS 2022: 6630-6689
- [c14] Samuele Tosatto, Andrew Patterson, Martha White, Rupam Mahmood: A Temporal-Difference Approach to Policy Gradient Estimation. ICML 2022: 21609-21632
- [c13] Yufeng Yuan, A. Rupam Mahmood: Asynchronous Reinforcement Learning for Real-Time Control of Physical Robots. ICRA 2022: 5546-5552
- [i21] Samuele Tosatto, Andrew Patterson, Martha White, A. Rupam Mahmood: A Temporal-Difference Approach to Policy Gradient Estimation. CoRR abs/2202.02396 (2022)
- [i20] Yufeng Yuan, Rupam Mahmood: Asynchronous Reinforcement Learning for Real-Time Control of Physical Robots. CoRR abs/2203.12759 (2022)
- [i19] Qingfeng Lan, Yangchen Pan, Jun Luo, A. Rupam Mahmood: Memory-efficient Reinforcement Learning with Knowledge Consolidation. CoRR abs/2205.10868 (2022)
- [i18] Yan Wang, Gautham Vasan, A. Rupam Mahmood: Real-Time Reinforcement Learning for Vision-Based Robotics Utilizing Local and Remote Computers. CoRR abs/2210.02317 (2022)
- [i17] Mohamed Elsayed, A. Rupam Mahmood: HesScale: Scalable Computation of Hessian Diagonals. CoRR abs/2210.11639 (2022)
- [i16] Amirmohammad Karimi, Jun Jin, Jun Luo, A. Rupam Mahmood, Martin Jägersand, Samuele Tosatto: Variable-Decision Frequency Option Critic. CoRR abs/2212.04407 (2022)
- 2021
- [c12] Michael Przystupa, Masood Dehghan, Martin Jägersand, A. Rupam Mahmood: Analyzing Neural Jacobian Methods in Applications of Visual Servoing and Kinematic Control. ICRA 2021: 14276-14283
- [i15] Qingfeng Lan, A. Rupam Mahmood: Model-free Policy Learning with Reward Gradients. CoRR abs/2103.05147 (2021)
- [i14] Michael Przystupa, Masood Dehghan, Martin Jägersand, A. Rupam Mahmood: Analyzing Neural Jacobian Methods in Applications of Visual Servoing and Kinematic Control. CoRR abs/2106.06083 (2021)
- [i13] Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White: Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences. CoRR abs/2107.08285 (2021)
- [i12] Shibhansh Dohare, A. Rupam Mahmood, Richard S. Sutton: Continual Backprop: Stochastic Gradient Descent with Persistent Randomness. CoRR abs/2108.06325 (2021)
- [i11] Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood: An Alternate Policy Gradient Estimator for Softmax Policies. CoRR abs/2112.11622 (2021)
- 2020
- [j4] Oliver Limoyo, Bryan Chan, Filip Maric, Brandon Wagstaff, A. Rupam Mahmood, Jonathan Kelly: Heteroscedastic Uncertainty for Robust Generative Latent Dynamics. IEEE Robotics Autom. Lett. 5(4): 6654-6661 (2020)
- [i10] Oliver Limoyo, Bryan Chan, Filip Maric, Brandon Wagstaff, A. Rupam Mahmood, Jonathan Kelly: Heteroscedastic Uncertainty for Robust Generative Latent Dynamics. CoRR abs/2008.08157 (2020)
2010 – 2019
- 2019
- [c11] Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra: Autoregressive Policies for Continuous Control Deep Reinforcement Learning. IJCAI 2019: 2754-2762
- [i9] Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra: Autoregressive Policies for Continuous Control Deep Reinforcement Learning. CoRR abs/1903.11524 (2019)
- 2018
- [j3] Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton: On Generalized Bellman Equations and Temporal-Difference Learning. J. Mach. Learn. Res. 19: 48:1-48:49 (2018)
- [c10] A. Rupam Mahmood, Dmytro Korenkevych, Gautham Vasan, William Ma, James Bergstra: Benchmarking Reinforcement Learning Algorithms on Real-World Robots. CoRL 2018: 561-591
- [c9] A. Rupam Mahmood, Dmytro Korenkevych, Brent J. Komer, James Bergstra: Setting up a Reinforcement Learning Task with a Real-World Robot. IROS 2018: 4635-4640
- [i8] A. Rupam Mahmood, Dmytro Korenkevych, Brent J. Komer, James Bergstra: Setting up a Reinforcement Learning Task with a Real-World Robot. CoRR abs/1803.07067 (2018)
- [i7] A. Rupam Mahmood, Dmytro Korenkevych, Gautham Vasan, William Ma, James Bergstra: Benchmarking Reinforcement Learning Algorithms on Real-World Robots. CoRR abs/1809.07731 (2018)
- 2017
- [c8] Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton: On Generalized Bellman Equations and Temporal-Difference Learning. Canadian Conference on AI 2017: 3-14
- [i6] Ashique Rupam Mahmood, Huizhen Yu, Richard S. Sutton: Multi-step Off-policy Learning Without Importance Sampling Ratios. CoRR abs/1702.03006 (2017)
- [i5] Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton: On Generalized Bellman Equations and Temporal-Difference Learning. CoRR abs/1704.04463 (2017)
- 2016
- [j2] Richard S. Sutton, Ashique Rupam Mahmood, Martha White: An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning. J. Mach. Learn. Res. 17: 73:1-73:29 (2016)
- [j1] Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton: True Online Temporal-Difference Learning. J. Mach. Learn. Res. 17: 145:1-145:40 (2016)
- 2015
- [c7] Ashique Rupam Mahmood, Richard S. Sutton: Off-policy learning based on weighted importance sampling with linear computational complexity. UAI 2015: 552-561
- [i4] Richard S. Sutton, Ashique Rupam Mahmood, Martha White: An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning. CoRR abs/1503.04269 (2015)
- [i3] Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton: An Empirical Evaluation of True Online TD(λ). CoRR abs/1507.00353 (2015)
- [i2] Ashique Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton: Emphatic Temporal-Difference Learning. CoRR abs/1507.01569 (2015)
- [i1] Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton: True Online Temporal-Difference Learning. CoRR abs/1512.04087 (2015)
- 2014
- [c6] Richard S. Sutton, Ashique Rupam Mahmood, Doina Precup, Hado van Hasselt: A new Q(lambda) with interim forward view and Monte Carlo equivalence. ICML 2014: 568-576
- [c5] Ashique Rupam Mahmood, Hado van Hasselt, Richard S. Sutton: Weighted importance sampling for off-policy learning with linear function approximation. NIPS 2014: 3014-3022
- [c4] Hado van Hasselt, Ashique Rupam Mahmood, Richard S. Sutton: Off-policy TD(λ) with a true online equivalence. UAI 2014: 330-339
- 2013
- [c3] Ashique Rupam Mahmood, Richard S. Sutton: Representation Search through Generate and Test. AAAI Workshop: Learning Rich Representations from Low-Level Sensors 2013
- [c2] Ashique Rupam Mahmood, Richard S. Sutton: Position Paper: Representation Search through Generate and Test. SARA 2013
- 2012
- [c1] Ashique Rupam Mahmood, Richard S. Sutton, Thomas Degris, Patrick M. Pilarski: Tuning-free step-size adaptation. ICASSP 2012: 2121-2124
last updated on 2023-02-12 01:07 CET by the dblp team
all metadata released as open data under CC0 1.0 license