


Mohammad Ghavamzadeh
2020 – today
- 2025
- [i87]Zhuotong Chen, Fang Liu, Xuan Zhu, Yanjun Qi, Mohammad Ghavamzadeh:
Preference Optimization via Contrastive Divergence: Your Reward Model is Secretly an NLL Estimator. CoRR abs/2502.04567 (2025) - 2024
- [j20]Audrey Huang, Mohammad Ghavamzadeh, Nan Jiang, Marek Petrik:
Non-adaptive Online Finetuning for Offline Reinforcement Learning. RLJ 1: 182-197 (2024) - [j19]Mohammad Javad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, Mohammad Ghavamzadeh:
Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms. RLJ 5: 2461-2491 (2024) - [j18]Christina Göpfert, Alex Haig, Chih-Wei Hsu, Yinlam Chow, Ivan Vendrov, Tyler Lu, Deepak Ramachandran, Hubert Pham, Mohammad Ghavamzadeh, Craig Boutilier:
Discovering Personalized Semantics for Soft Attributes in Recommender Systems Using Concept Activation Vectors. Trans. Recomm. Syst. 2(4): 30:1-30:37 (2024) - [c108]Kyuyoung Kim, Jongheon Jeong, Minyong An, Mohammad Ghavamzadeh, Krishnamurthy Dj Dvijotham, Jinwoo Shin, Kimin Lee:
Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models. ICLR 2024 - [c107]Amin Rakhsha, Mete Kemertas, Mohammad Ghavamzadeh, Amir-massoud Farahmand:
Maximum Entropy Model Correction in Reinforcement Learning. ICLR 2024 - [c106]Marek Petrik, Guy Tennenholtz, Mohammad Ghavamzadeh:
Bayesian Regret Minimization in Offline Bandits. ICML 2024 - [i86]Aldo Pacchiano, Mohammad Ghavamzadeh, Peter L. Bartlett:
Contextual Bandits with Stage-wise Constraints. CoRR abs/2401.08016 (2024) - [i85]Kyuyoung Kim, Jongheon Jeong, Minyong An, Mohammad Ghavamzadeh, Krishnamurthy Dvijotham, Jinwoo Shin, Kimin Lee:
Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models. CoRR abs/2404.01863 (2024) - [i84]Jia Lin Hau, Erick Delage, Esther Derman, Mohammad Ghavamzadeh, Marek Petrik:
Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis. CoRR abs/2410.24128 (2024) - [i83]Rohan Deb, Mohammad Ghavamzadeh, Arindam Banerjee:
Conservative Contextual Bandits: Beyond Linear Representations. CoRR abs/2412.06165 (2024) - 2023
- [c105]Mohammad Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya:
Meta-Learning for Simple Regret Minimization. AAAI 2023: 6709-6717 - [c104]Jia Lin Hau, Marek Petrik, Mohammad Ghavamzadeh:
Entropic Risk Optimization in Discounted MDPs. AISTATS 2023: 47-76 - [c103]Christoph Dann, Mohammad Ghavamzadeh, Teodor V. Marinov:
Multiple-policy High-confidence Policy Evaluation. AISTATS 2023: 9470-9487 - [c102]Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh:
Distributionally Robust Behavioral Cloning for Robust Imitation Learning. CDC 2023: 1342-1347 - [c101]Yinlam Chow, Aza Tulepbergenov, Ofir Nachum, Dhawal Gupta, Moonkyung Ryu, Mohammad Ghavamzadeh, Craig Boutilier:
A Mixture-of-Expert Approach to RL-based Dialogue Management. ICLR 2023 - [c100]Joey Hong, Branislav Kveton, Manzil Zaheer, Sumeet Katariya, Mohammad Ghavamzadeh:
Multi-Task Off-Policy Learning from Bandit Feedback. ICML 2023: 13157-13173 - [c99]Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, Kimin Lee:
Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models. NeurIPS 2023 - [c98]Dhawal Gupta, Yinlam Chow, Azamat Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier:
Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management. NeurIPS 2023 - [c97]Jia Lin Hau, Erick Delage, Mohammad Ghavamzadeh, Marek Petrik:
On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes. NeurIPS 2023 - [c96]Jincheng Mei, Bo Dai, Alekh Agarwal, Mohammad Ghavamzadeh, Csaba Szepesvári, Dale Schuurmans:
Ordering-based Conditions for Global Convergence of Policy Gradient Methods. NeurIPS 2023 - [i82]Dhawal Gupta, Yinlam Chow, Mohammad Ghavamzadeh, Craig Boutilier:
Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management. CoRR abs/2302.10850 (2023) - [i81]Kimin Lee, Hao Liu, Moonkyung Ryu, Olivia Watkins, Yuqing Du, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Shixiang Shane Gu:
Aligning Text-to-Image Models using Human Feedback. CoRR abs/2302.12192 (2023) - [i80]Moloud Abdar, Meenakshi Kollati, Swaraja Kuraparthi, Farhad Pourpanah, Daniel McDuff, Mohammad Ghavamzadeh, Shuicheng Yan, Abduallah Mohamed, Abbas Khosravi, Erik Cambria, Fatih Porikli:
A Review of Deep Learning for Video Captioning. CoRR abs/2304.11431 (2023) - [i79]Jia Lin Hau, Erick Delage, Mohammad Ghavamzadeh, Marek Petrik:
On Dynamic Program Decompositions of Static Risk Measures. CoRR abs/2304.12477 (2023) - [i78]Gecia Bravo Hermsdorff, Róbert Busa-Fekete, Mohammad Ghavamzadeh, Andres Muñoz Medina, Umar Syed:
Private and Communication-Efficient Algorithms for Entropy Estimation. CoRR abs/2305.07751 (2023) - [i77]Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, Kimin Lee:
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models. CoRR abs/2305.16381 (2023) - [i76]Mohammad Ghavamzadeh, Marek Petrik, Guy Tennenholtz:
A Convex Relaxation Approach to Bayesian Regret Minimization in Offline Bandits. CoRR abs/2306.01237 (2023) - [i75]Jihwan Jeong, Yinlam Chow, Guy Tennenholtz, Chih-Wei Hsu, Azamat Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier:
Factual and Personalized Recommendations using Language Models and Reinforcement Learning. CoRR abs/2310.06176 (2023) - [i74]Kishan Panaganti, Zaiyan Xu, Dileep M. Kalathil, Mohammad Ghavamzadeh:
Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage. CoRR abs/2310.18434 (2023) - [i73]Erdem Biyik, Fan Yao, Yinlam Chow, Alex Haig, Chih-Wei Hsu, Mohammad Ghavamzadeh, Craig Boutilier:
Preference Elicitation with Soft Attributes in Interactive Recommendation. CoRR abs/2311.02085 (2023) - [i72]Amin Rakhsha, Mete Kemertas, Mohammad Ghavamzadeh, Amir-massoud Farahmand:
Maximum Entropy Model Correction in Reinforcement Learning. CoRR abs/2311.17855 (2023) - 2022
- [c95]Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier:
Thompson Sampling with a Mixture Prior. AISTATS 2022: 7565-7586 - [c94]Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh:
Hierarchical Bayesian Bandits. AISTATS 2022: 7724-7741 - [c93]Ahmadreza Moradipari, Mohammad Ghavamzadeh, Mahnoosh Alizadeh:
Collaborative Multi-agent Stochastic Linear Bandits. ACC 2022: 2761-2766 - [c92]Manan Tomar, Lior Shani, Yonathan Efroni, Mohammad Ghavamzadeh:
Mirror Descent Policy Optimization. ICLR 2022 - [c91]Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh:
Deep Hierarchy in Bandits. ICML 2022: 8833-8851 - [c90]Ahmadreza Moradipari, Berkay Turan, Yasin Abbasi-Yadkori, Mahnoosh Alizadeh, Mohammad Ghavamzadeh:
Feature and Parameter Selection in Stochastic Linear Bandits. ICML 2022: 15927-15958 - [c89]Mohammad Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh:
Fixed-Budget Best-Arm Identification in Structured Bandits. IJCAI 2022: 2798-2804 - [c88]Ahmadreza Moradipari, Mohammad Ghavamzadeh, Taha Rajabzadeh, Christos Thrampoulidis, Mahnoosh Alizadeh:
Multi-Environment Meta-Learning in Stochastic Linear Bandits. ISIT 2022: 1659-1664 - [c87]Gecia Bravo Hermsdorff, Róbert Busa-Fekete, Mohammad Ghavamzadeh, Andrés Muñoz Medina, Umar Syed:
Private and Communication-Efficient Algorithms for Entropy Estimation. NeurIPS 2022 - [c86]Ido Greenberg, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor:
Efficient Risk-Averse Reinforcement Learning. NeurIPS 2022 - [c85]Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh:
Robust Reinforcement Learning using Offline Data. NeurIPS 2022 - [c84]Amin Rakhsha, Andrew Wang, Mohammad Ghavamzadeh, Amir-massoud Farahmand:
Operator Splitting Value Iteration. NeurIPS 2022 - [i71]Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh:
Deep Hierarchy in Bandits. CoRR abs/2202.01454 (2022) - [i70]Mohammad Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya:
Meta-Learning for Simple Regret Minimization. CoRR abs/2202.12888 (2022) - [i69]Mohammad Javad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, Mohammad Ghavamzadeh:
Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms. CoRR abs/2202.13001 (2022) - [i68]Ido Greenberg, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor:
Efficient Risk-Averse Reinforcement Learning. CoRR abs/2205.05138 (2022) - [i67]Ahmadreza Moradipari, Mohammad Ghavamzadeh, Taha Rajabzadeh, Christos Thrampoulidis, Mahnoosh Alizadeh:
Multi-Environment Meta-Learning in Stochastic Linear Bandits. CoRR abs/2205.06326 (2022) - [i66]Ahmadreza Moradipari, Mohammad Ghavamzadeh, Mahnoosh Alizadeh:
Collaborative Multi-agent Stochastic Linear Bandits. CoRR abs/2205.06331 (2022) - [i65]Yinlam Chow, Aza Tulepbergenov, Ofir Nachum, Moonkyung Ryu, Mohammad Ghavamzadeh, Craig Boutilier:
A Mixture-of-Expert Approach to RL-based Dialogue Management. CoRR abs/2206.00059 (2022) - [i64]Jorge A. Mendez, Alborz Geramifard, Mohammad Ghavamzadeh, Bing Liu:
Reinforcement Learning of Multi-Domain Dialog Policies Via Action Embeddings. CoRR abs/2207.00468 (2022) - [i63]Kishan Panaganti, Zaiyan Xu, Dileep M. Kalathil, Mohammad Ghavamzadeh:
Robust Reinforcement Learning using Offline Data. CoRR abs/2208.05129 (2022) - [i62]Jia Lin Hau, Marek Petrik, Mohammad Ghavamzadeh, Reazul Hasan Russel:
RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk. CoRR abs/2209.04067 (2022) - [i61]Amin Rakhsha, Andrew Wang, Mohammad Ghavamzadeh, Amir-massoud Farahmand:
Operator Splitting Value Iteration. CoRR abs/2211.13937 (2022) - [i60]Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh:
Multi-Task Off-Policy Learning from Bandit Feedback. CoRR abs/2212.04720 (2022) - 2021
- [j17]Moloud Abdar, Farhad Pourpanah, Sadiq Hussain, Dana Rezazadegan, Li Liu, Mohammad Ghavamzadeh, Paul W. Fieguth, Xiaochun Cao, Abbas Khosravi, U. Rajendra Acharya, Vladimir Makarenkov, Saeid Nahavandi:
A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Inf. Fusion 76: 243-297 (2021) - [j16]Shubhanshu Shekhar, Mohammad Ghavamzadeh, Tara Javidi:
Active Learning for Classification With Abstention. IEEE J. Sel. Areas Inf. Theory 2(2): 705-719 (2021) - [c83]Ravi Tej Akella, Kamyar Azizzadenesheli, Mohammad Ghavamzadeh, Animashree Anandkumar, Yisong Yue:
Deep Bayesian Quadrature Policy Optimization. AAAI 2021: 6600-6608 - [c82]Aldo Pacchiano, Mohammad Ghavamzadeh, Peter L. Bartlett, Heinrich Jiang:
Stochastic Bandits with Linear Constraints. AISTATS 2021: 2827-2835 - [c81]Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh:
Control-Aware Representations for Model-based Reinforcement Learning. ICLR 2021 - [c80]Amir Massoud Farahmand, Mohammad Ghavamzadeh:
PID Accelerated Value Iteration Algorithm. ICML 2021: 3143-3153 - [c79]Yinlam Chow, Brandon Cui, Moonkyung Ryu, Mohammad Ghavamzadeh:
Variational Model-based Policy Optimization. IJCAI 2021: 2292-2299 - [c78]Arash Mehrjou, Mohammad Ghavamzadeh, Bernhard Schölkopf:
Neural Lyapunov Redesign. L4DC 2021: 459-470 - [c77]Shubhanshu Shekhar, Greg Fields, Mohammad Ghavamzadeh, Tara Javidi:
Adaptive Sampling for Minimax Fair Classification. NeurIPS 2021: 24535-24544 - [i59]Shubhanshu Shekhar, Mohammad Ghavamzadeh, Tara Javidi:
Adaptive Sampling for Minimax Fair Classification. CoRR abs/2103.00755 (2021) - [i58]Mohammad Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh:
Fixed-Budget Best-Arm Identification in Contextual Bandits: A Static-Adaptive Algorithm. CoRR abs/2106.04763 (2021) - [i57]Ahmadreza Moradipari, Yasin Abbasi-Yadkori, Mahnoosh Alizadeh, Mohammad Ghavamzadeh:
Parameter and Feature Selection in Stochastic Linear Bandits. CoRR abs/2106.05378 (2021) - [i56]Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier:
Thompson Sampling with a Mixture Prior. CoRR abs/2106.05608 (2021) - [i55]Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh:
Hierarchical Bayesian Bandits. CoRR abs/2111.06929 (2021) - 2020
- [c76]Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta:
Improved Algorithms for Conservative Exploration in Bandits. AAAI 2020: 3962-3969 - [c75]Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta:
Conservative Exploration in Reinforcement Learning. AISTATS 2020: 1431-1441 - [c74]Branislav Kveton, Manzil Zaheer, Csaba Szepesvári, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier:
Randomized Exploration in Generalized Linear Bandits. AISTATS 2020: 2066-2076 - [c73]Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar A. Duéñez-Guzmán, Mohammad Ghavamzadeh:
Safe Policy Learning for Continuous Control. CoRL 2020: 801-821 - [c72]Nir Levine, Yinlam Chow, Rui Shu, Ang Li, Mohammad Ghavamzadeh, Hung Bui:
Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control. ICLR 2020 - [c71]Shubhanshu Shekhar, Tara Javidi, Mohammad Ghavamzadeh:
Adaptive Sampling for Estimating Probability Distributions. ICML 2020: 8687-8696 - [c70]Rui Shu, Tung Nguyen, Yinlam Chow, Tuan Pham, Khoat Than, Mohammad Ghavamzadeh, Stefano Ermon, Hung H. Bui:
Predictive Coding for Locally-Linear Control. ICML 2020: 8862-8871 - [c69]Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh:
Multi-step Greedy Reinforcement Learning Algorithms. ICML 2020: 9504-9513 - [c68]Shubhanshu Shekhar, Mohammad Ghavamzadeh, Tara Javidi:
Active Learning for Classification with Abstention. ISIT 2020: 2801-2806 - [c67]Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor:
Online Planning with Lookahead Policies. NeurIPS 2020 - [c66]Jean Tarbouriech, Shubhanshu Shekhar, Matteo Pirotta, Mohammad Ghavamzadeh, Alessandro Lazaric:
Active Model Estimation in Markov Decision Processes. UAI 2020: 1019-1028 - [i54]Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta:
Conservative Exploration in Reinforcement Learning. CoRR abs/2002.03218 (2020) - [i53]Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta:
Improved Algorithms for Conservative Exploration in Bandits. CoRR abs/2002.03221 (2020) - [i52]Romina Abachi, Mohammad Ghavamzadeh, Amir-massoud Farahmand:
Policy-Aware Model Learning for Policy Gradient Methods. CoRR abs/2003.00030 (2020) - [i51]Rui Shu, Tung Nguyen, Yinlam Chow, Tuan Pham, Khoat Than, Mohammad Ghavamzadeh, Stefano Ermon, Hung H. Bui:
Predictive Coding for Locally-Linear Control. CoRR abs/2003.01086 (2020) - [i50]Jean Tarbouriech, Shubhanshu Shekhar, Matteo Pirotta, Mohammad Ghavamzadeh, Alessandro Lazaric:
Active Model Estimation in Markov Decision Processes. CoRR abs/2003.03297 (2020) - [i49]Manan Tomar, Lior Shani, Yonathan Efroni, Mohammad Ghavamzadeh:
Mirror Descent Policy Optimization. CoRR abs/2005.09814 (2020) - [i48]Arash Mehrjou, Mohammad Ghavamzadeh, Bernhard Schölkopf:
Automatic Policy Synthesis to Improve the Safety of Nonlinear Dynamical Systems. CoRR abs/2006.03947 (2020) - [i47]Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik:
Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity. CoRR abs/2006.03976 (2020) - [i46]Yinlam Chow, Brandon Cui, Moonkyung Ryu, Mohammad Ghavamzadeh:
Variational Model-based Policy Optimization. CoRR abs/2006.05443 (2020) - [i45]Aldo Pacchiano, Mohammad Ghavamzadeh, Peter L. Bartlett, Heinrich Jiang:
Stochastic Bandits with Linear Constraints. CoRR abs/2006.10185 (2020) - [i44]Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh:
Control-Aware Representations for Model-based Reinforcement Learning. CoRR abs/2006.13408 (2020) - [i43]Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik:
Finite-Sample Analysis of GTD Algorithms. CoRR abs/2006.14364 (2020) - [i42]Ravi Tej Akella, Kamyar Azizzadenesheli, Mohammad Ghavamzadeh, Anima Anandkumar, Yisong Yue:
Deep Bayesian Quadrature Policy Optimization. CoRR abs/2006.15637 (2020) - [i41]Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu:
Variance-Reduced Off-Policy Memory-Efficient Policy Search. CoRR abs/2009.06548 (2020) - [i40]Moloud Abdar, Farhad Pourpanah, Sadiq Hussain, Dana Rezazadegan, Li Liu, Mohammad Ghavamzadeh, Paul W. Fieguth, Xiaochun Cao, Abbas Khosravi, U. Rajendra Acharya, Vladimir Makarenkov, Saeid Nahavandi:
A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges. CoRR abs/2011.06225 (2020) - [i39]Elita A. Lobo, Mohammad Ghavamzadeh, Marek Petrik:
Soft-Robust Algorithms for Handling Model Misspecification. CoRR abs/2011.14495 (2020) - [i38]Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Mohammad Ghavamzadeh, Craig Boutilier:
Non-Stationary Latent Bandits. CoRR abs/2012.00386 (2020)
2010 – 2019
- 2019
- [c65]Jonathan Lacotte, Mohammad Ghavamzadeh, Yinlam Chow, Marco Pavone:
Risk-Sensitive Generative Adversarial Imitation Learning. AISTATS 2019: 2154-2163 - [c64]Ershad Banijamali, Yasin Abbasi-Yadkori, Mohammad Ghavamzadeh, Nikos Vlassis:
Optimizing over a Restricted Policy Class in MDPs. AISTATS 2019: 3042-3050 - [c63]Branislav Kveton, Csaba Szepesvári, Sharan Vaswani, Zheng Wen, Tor Lattimore, Mohammad Ghavamzadeh:
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits. ICML 2019: 3601-3610 - [c62]Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier:
Perturbed-History Exploration in Stochastic Multi-Armed Bandits. IJCAI 2019: 2786-2793 - [c61]Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor:
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies. NeurIPS 2019: 12203-12213 - [c60]Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier:
Perturbed-History Exploration in Stochastic Linear Bandits. UAI 2019: 530-540 - [i37]Yinlam Chow, Ofir Nachum, Aleksandra Faust, Mohammad Ghavamzadeh, Edgar A. Duéñez-Guzmán:
Lyapunov-based Safe Policy Optimization for Continuous Control. CoRR abs/1901.10031 (2019) - [i36]Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier:
Perturbed-History Exploration in Stochastic Multi-Armed Bandits. CoRR abs/1902.10089 (2019) - [i35]Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier:
Perturbed-History Exploration in Stochastic Linear Bandits. CoRR abs/1903.09132 (2019) - [i34]Shubhanshu Shekhar, Mohammad Ghavamzadeh, Tara Javidi:
Binary Classification with Bounded Abstention Rate. CoRR abs/1905.09561 (2019) - [i33]Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor:
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies. CoRR abs/1905.11527 (2019) - [i32]Shubhanshu Shekhar, Mohammad Ghavamzadeh, Tara Javidi:
Active Learning for Binary Classification with Abstention. CoRR abs/1906.00303 (2019) - [i31]Branislav Kveton, Manzil Zaheer, Csaba Szepesvári, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier:
Randomized Exploration in Generalized Linear Bandits. CoRR abs/1906.08947 (2019) - [i30]Nir Levine, Yinlam Chow, Rui Shu, Ang Li, Mohammad Ghavamzadeh, Hung Bui:
Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control. CoRR abs/1909.01506 (2019) - [i29]Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor:
Multi-Step Greedy and Approximate Real Time Dynamic Programming. CoRR abs/1909.04236 (2019) - [i28]Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, Joelle Pineau:
Benchmarking Batch Deep Reinforcement Learning Algorithms. CoRR abs/1910.01708 (2019) - [i27]Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh:
Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning. CoRR abs/1910.02919 (2019) - [i26]Shubhanshu Shekhar, Mohammad Ghavamzadeh, Tara Javidi:
Adaptive Sampling for Estimating Multiple Probability Distributions. CoRR abs/1910.12406 (2019) - 2018
- [j15]Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik:
Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity. J. Artif. Intell. Res. 63: 461-494 (2018) - [c59]Ershad Banijamali, Rui Shu, Mohammad Ghavamzadeh, Hung Bui, Ali Ghodsi:
Robust Locally-Linear Controllable Embedding. AISTATS 2018: 1751-1759 - [c58]Yinlam Chow, Ofir Nachum, Mohammad Ghavamzadeh:
Path Consistency Learning in Tsallis Entropy Regularized MDPs. ICML 2018: 978-987 - [c57]Mehrdad Farajtabar, Yinlam Chow, Mohammad Ghavamzadeh:
More Robust Doubly Robust Off-policy Evaluation. ICML 2018: 1446-1455 - [c56]Yahel David, Balázs Szörényi, Mohammad Ghavamzadeh, Shie Mannor, Nahum Shimkin:
PAC Bandits with Risk Constraints. ISAIM 2018 - [c55]Tengyang Xie, Bo Liu, Yangyang Xu, Mohammad Ghavamzadeh, Yinlam Chow, Daoming Lyu, Daesub Yoon:
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization. NeurIPS 2018: 1073-1083 - [c54]Yinlam Chow, Ofir Nachum, Edgar A. Duéñez-Guzmán, Mohammad Ghavamzadeh:
A Lyapunov-based Approach to Safe Reinforcement Learning. NeurIPS 2018: 8103-8112 - [i25]Mehrdad Farajtabar, Yinlam Chow, Mohammad Ghavamzadeh:
More Robust Doubly Robust Off-policy Evaluation. CoRR abs/1802.03493 (2018) - [i24]Ofir Nachum, Yinlam Chow, Mohammad Ghavamzadeh:
Path Consistency Learning in Tsallis Entropy Regularized MDPs. CoRR abs/1802.03501 (2018) - [i23]Ershad Banijamali, Yasin Abbasi-Yadkori, Mohammad Ghavamzadeh, Nikos Vlassis:
Optimizing over a Restricted Policy Class in Markov Decision Processes. CoRR abs/1802.09646 (2018) - [i22]Yinlam Chow, Ofir Nachum, Edgar A. Duéñez-Guzmán, Mohammad Ghavamzadeh:
A Lyapunov-based Approach to Safe Reinforcement Learning. CoRR abs/1805.07708 (2018) - [i21]Jonathan Lacotte, Yinlam Chow, Mohammad Ghavamzadeh, Marco Pavone:
Risk-Sensitive Generative Adversarial Imitation Learning. CoRR abs/1808.04468 (2018) - [i20]Bo Liu, Tengyang Xie, Yangyang Xu, Mohammad Ghavamzadeh, Yinlam Chow, Daoming Lyu, Daesub Yoon:
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization. CoRR abs/1809.02292 (2018) - [i19]Branislav Kveton, Csaba Szepesvári, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore:
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits. CoRR abs/1811.05154 (2018) - 2017
- [j14]Yinlam Chow, Mohammad Ghavamzadeh, Lucas Janson, Marco Pavone:
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria. J. Mach. Learn. Res. 18: 167:1-167:51 (2017) - [j13]Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor:
Sequential Decision Making With Coherent Risk. IEEE Trans. Autom. Control. 62(7): 3323-3338 (2017) - [c53]Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh, Ishan Durugkar, Emma Brunskill:
Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing. AAAI 2017: 4740-4745 - [c52]Ian Gemp, Georgios Theocharous, Mohammad Ghavamzadeh:
Automated Data Cleansing through Meta-Learning. AAAI 2017: 4760-4761 - [c51]Alan Malek, Sumeet Katariya, Yinlam Chow, Mohammad Ghavamzadeh:
Sequential Multiple Hypothesis Testing with Type I Error Control. AISTATS 2017: 1468-1476 - [c50]Carlos Riquelme, Mohammad Ghavamzadeh, Alessandro Lazaric:
Active Learning for Accurate Estimation of Linear Models. ICML 2017: 2931-2939 - [c49]Rui Shu, Hung Hai Bui, Mohammad Ghavamzadeh:
Bottleneck Conditional Density Estimation. ICML 2017: 3164-3172 - [c48]Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks V. S. Lakshmanan, Mark Schmidt:
Model-Independent Online Learning for Influence Maximization. ICML 2017: 3530-3539 - [c47]Masrour Zoghi, Tomás Tunys, Mohammad Ghavamzadeh, Branislav Kveton, Csaba Szepesvári, Zheng Wen:
Online Learning to Rank in Stochastic Click Models. ICML 2017: 4199-4208 - [c46]Sougata Chaudhuri, Georgios Theocharous, Mohammad Ghavamzadeh:
Importance of Recommendation Policy Space in Addressing Click Sparsity in Personalized Advertisement Display. MLDM 2017: 32-46 - [c45]Abbas Kazerouni, Mohammad Ghavamzadeh, Yasin Abbasi, Benjamin Van Roy:
Conservative Contextual Linear Bandits. NIPS 2017: 3910-3919 - [i18]Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks V. S. Lakshmanan, Mark Schmidt:
Diffusion Independent Semi-Bandit Influence Maximization. CoRR abs/1703.00557 (2017) - [i17]Carlos Riquelme, Mohammad Ghavamzadeh, Alessandro Lazaric:
Active Learning for Accurate Estimation of Linear Models. CoRR abs/1703.00579 (2017) - [i16]Mohammad Ghavamzadeh, Branislav Kveton, Csaba Szepesvári, Tomás Tunys, Zheng Wen, Masrour Zoghi:
Online Learning to Rank in Stochastic Click Models. CoRR abs/1703.02527 (2017) - [i15]Ershad Banijamali, Rui Shu, Mohammad Ghavamzadeh, Hung Bui, Ali Ghodsi:
Robust Locally-Linear Controllable Embedding. CoRR abs/1710.05373 (2017) - [i14]Ershad Banijamali, Ahmad Khajenezhad, Ali Ghodsi, Mohammad Ghavamzadeh:
Disentangling Dynamics and Content for Control and Planning. CoRR abs/1711.09165 (2017) - 2016
- [j12]Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos:
Analysis of Classification-based Policy Iteration Algorithms. J. Mach. Learn. Res. 17: 19:1-19:30 (2016) - [j11]Mohammad Ghavamzadeh, Yaakov Engel, Michal Valko:
Bayesian Policy Gradient and Actor-Critic Algorithms. J. Mach. Learn. Res. 17: 66:1-66:53 (2016) - [j10]Amir-massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, Shie Mannor:
Regularized Policy Iteration with Nonparametric Function Spaces. J. Mach. Learn. Res. 17: 139:1-139:66 (2016) - [j9]Prashanth L. A., Mohammad Ghavamzadeh:
Variance-constrained actor-critic algorithms for discounted and average reward MDPs. Mach. Learn. 105(3): 367-417 (2016) - [c44]Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Ronald Ortner, Peter L. Bartlett:
Improved Learning Complexity in Combinatorial Pure Exploration Bandits. AISTATS 2016: 1004-1012 - [c43]Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik:
Proximal Gradient Temporal Difference Learning Algorithms. IJCAI 2016: 4195-4199 - [c42]Mohammad Ghavamzadeh, Marek Petrik, Yinlam Chow:
Safe Policy Improvement by Minimizing Robust Baseline Regret. NIPS 2016: 2298-2306 - [c41]Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, Siqi Sun:
Graphical Model Sketch. ECML/PKDD (1) 2016: 81-97 - [i13]Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, Siqi Sun:
Graphical Model Sketch. CoRR abs/1602.03105 (2016) - [i12]Sougata Chaudhuri, Georgios Theocharous, Mohammad Ghavamzadeh:
Personalized Advertisement Recommendation: A Ranking Approach to Address the Ubiquitous Click Sparsity Problem. CoRR abs/1603.01870 (2016) - [i11]Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar:
Bayesian Reinforcement Learning: A Survey. CoRR abs/1609.04436 (2016) - [i10]Abbas Kazerouni, Mohammad Ghavamzadeh, Benjamin Van Roy:
Conservative Contextual Linear Bandits. CoRR abs/1611.06426 (2016) - [i9]Rui Shu, Hung Hai Bui, Mohammad Ghavamzadeh:
Bottleneck Conditional Density Estimation. CoRR abs/1611.08568 (2016) - 2015
- [j8]Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar:
Bayesian Reinforcement Learning: A Survey. Found. Trends Mach. Learn. 8(5-6): 359-483 (2015) - [j7]Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon, Boris Lesner, Matthieu Geist:
Approximate modified policy iteration and its application to the game of Tetris. J. Mach. Learn. Res. 16: 1629-1676 (2015) - [j6]Amir-massoud Farahmand, Doina Precup, André da Motta Salles Barreto, Mohammad Ghavamzadeh:
Classification-Based Approximate Policy Iteration. IEEE Trans. Autom. Control. 60(11): 2989-2993 (2015) - [c40]Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh:
High-Confidence Off-Policy Evaluation. AAAI 2015: 3000-3006 - [c39]Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh:
High Confidence Policy Improvement. ICML 2015: 2380-2388 - [c38]Georgios Theocharous, Philip S. Thomas, Mohammad Ghavamzadeh:
Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees. IJCAI 2015: 1806-1812 - [c37]Julien Audiffren, Michal Valko, Alessandro Lazaric, Mohammad Ghavamzadeh:
Maximum Entropy Semi-Supervised Inverse Reinforcement Learning. IJCAI 2015: 3315-3321 - [c36]Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor:
Policy Gradient for Coherent Risk Measures. NIPS 2015: 1468-1476 - [c35]Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik:
Finite-Sample Analysis of Proximal Gradient TD Algorithms. UAI 2015: 504-513 - [c34]Georgios Theocharous, Philip S. Thomas, Mohammad Ghavamzadeh:
Ad Recommendation Systems for Life-Time Value Optimization. WWW (Companion Volume) 2015: 1305-1310 - [i8]Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor:
Policy Gradient for Coherent Risk Measures. CoRR abs/1502.03919 (2015) - [i7]Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Peter Auer, András Antos:
Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits. CoRR abs/1507.04523 (2015) - [i6]Yinlam Chow, Mohammad Ghavamzadeh, Lucas Janson, Marco Pavone:
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria. CoRR abs/1512.01629 (2015) - 2014
- [c33]Yinlam Chow, Mohammad Ghavamzadeh:
Algorithms for CVaR Optimization in MDPs. NIPS 2014: 3509-3517 - [i5]Prashanth L. A., Mohammad Ghavamzadeh:
Actor-Critic Algorithms for Risk-Sensitive Reinforcement Learning. CoRR abs/1403.6530 (2014) - [i4]Yinlam Chow, Mohammad Ghavamzadeh:
Algorithms for CVaR Optimization in MDPs. CoRR abs/1406.3339 (2014) - [i3]Amir-massoud Farahmand, Doina Precup, André da Motta Salles Barreto, Mohammad Ghavamzadeh:
Classification-based Approximate Policy Iteration: Experiments and Extended Discussions. CoRR abs/1407.0449 (2014) - 2013
- [c32]Hachem Kadri, Mohammad Ghavamzadeh, Philippe Preux:
A Generalized Kernel Approach to Structured Output Learning. ICML (1) 2013: 471-479 - [c31]Bernardo Ávila Pires, Csaba Szepesvári, Mohammad Ghavamzadeh:
Cost-sensitive Multiclass Classification Risk Bounds. ICML (3) 2013: 1391-1399 - [c30]Prashanth L. A., Mohammad Ghavamzadeh:
Actor-Critic Algorithms for Risk-Sensitive MDPs. NIPS 2013: 252-260 - [c29]Victor Gabillon, Mohammad Ghavamzadeh, Bruno Scherrer:
Approximate Dynamic Programming Finally Performs Well in the Game of Tetris. NIPS 2013: 1754-1762 - 2012
- [j5]Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos:
Finite-sample analysis of least-squares policy iteration. J. Mach. Learn. Res. 13: 3041-3074 (2012) - [c28]Mohammad Ghavamzadeh, Alessandro Lazaric:
Conservative and Greedy Approaches to Classification-Based Policy Iteration. AAAI 2012: 914-920 - [c27]Michal Valko, Mohammad Ghavamzadeh, Alessandro Lazaric:
Semi-Supervised Apprenticeship Learning. EWRL 2012: 131-142 - [c26]Matthieu Geist, Bruno Scherrer, Alessandro Lazaric, Mohammad Ghavamzadeh:
A Dantzig Selector Approach to Temporal Difference Learning. ICML 2012 - [c25]Bruno Scherrer, Victor Gabillon, Mohammad Ghavamzadeh, Matthieu Geist:
Approximate Modified Policy Iteration. ICML 2012 - [c24]Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric:
Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence. NIPS 2012: 3221-3229 - [p2]Lucian Busoniu, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Robert Babuska, Bart De Schutter:
Least-Squares Methods for Policy Iteration. Reinforcement Learning 2012: 75-109 - [p1]Nikos Vlassis, Mohammad Ghavamzadeh, Shie Mannor, Pascal Poupart:
Bayesian Reinforcement Learning. Reinforcement Learning 2012: 359-386 - [i2]Hachem Kadri, Mohammad Ghavamzadeh, Philippe Preux:
A Generalized Kernel Approach to Structured Output Learning. CoRR abs/1205.2171 (2012) - [i1]Bruno Scherrer, Victor Gabillon, Mohammad Ghavamzadeh, Matthieu Geist:
Approximate Modified Policy Iteration. CoRR abs/1205.3054 (2012) - 2011
- [c23]Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Peter Auer:
Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits. ALT 2011: 189-203 - [c22]Matthew W. Hoffman, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos:
Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization. EWRL 2011: 102-114 - [c21]Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Bruno Scherrer:
Classification-based Policy Iteration with a Critic. ICML 2011: 1049-1056 - [c20]Mohammad Ghavamzadeh, Alessandro Lazaric, Rémi Munos, Matthew W. Hoffman:
Finite-Sample Analysis of Lasso-TD. ICML 2011: 1177-1184 - [c19]Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, Sébastien Bubeck:
Multi-Bandit Best Arm Identification. NIPS 2011: 2222-2230 - [c18]Mohammad Gheshlaghi Azar, Rémi Munos, Mohammad Ghavamzadeh, Hilbert J. Kappen:
Speedy Q-Learning. NIPS 2011: 2411-2419 - 2010
- [c17]Alessandro Lazaric, Mohammad Ghavamzadeh:
Bayesian Multi-Task Reinforcement Learning. ICML 2010: 599-606 - [c16]Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos:
Analysis of a Classification-based Policy Iteration Algorithm. ICML 2010: 607-614 - [c15]Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos:
Finite-Sample Analysis of LSTD. ICML 2010: 615-622 - [c14]Mohammad Ghavamzadeh, Alessandro Lazaric, Odalric-Ambrym Maillard, Rémi Munos:
LSTD with Random Projections. NIPS 2010: 721-729 - [c13]Odalric-Ambrym Maillard, Rémi Munos, Alessandro Lazaric, Mohammad Ghavamzadeh:
Finite-sample Analysis of Bellman Residual Minimization. ACML 2010: 299-314
2000 – 2009
- 2009
- [j4]Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee:
Natural actor-critic algorithms. Autom. 45(11): 2471-2482 (2009) - [c12]Amir Massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, Shie Mannor:
Regularized Fitted Q-Iteration for planning in continuous-space Markovian decision problems. ACC 2009: 725-730 - 2008
- [c11]Amir Massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, Shie Mannor:
Regularized Fitted Q-Iteration: Application to Planning. EWRL 2008: 55-68 - [c10]Amir Massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, Shie Mannor:
Regularized Policy Iteration. NIPS 2008: 441-448 - 2007
- [j3]Mohammad Ghavamzadeh, Sridhar Mahadevan:
Hierarchical Average Reward Reinforcement Learning. J. Mach. Learn. Res. 8: 2629-2669 (2007) - [c9]Mohammad Ghavamzadeh, Yaakov Engel:
Bayesian actor-critic algorithms. ICML 2007: 297-304 - [c8]Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee:
Incremental Natural Actor-Critic Algorithms. NIPS 2007: 105-112 - 2006
- [j2]Mohammad Ghavamzadeh, Sridhar Mahadevan, Rajbala Makar:
Hierarchical multi-agent reinforcement learning. Auton. Agents Multi Agent Syst. 13(2): 197-229 (2006) - [c7]Mohammad Ghavamzadeh, Yaakov Engel:
Bayesian Policy Gradient Algorithms. NIPS 2006: 457-464 - 2005
- [j1]Ion Muslea, Virginia Dignum, Daniel D. Corkill, Catholijn M. Jonker, Frank Dignum, Silvia Coradeschi, Alessandro Saffiotti, Dan Fu, Jeff Orkin, William Cheetham, Kai Goebel, Piero P. Bonissone, Leen-Kiat Soh, Randolph M. Jones, Robert E. Wray III, Matthias Scheutz, Daniela Pucci de Farias, Shie Mannor, Georgios Theocharous, Doina Precup, Bamshad Mobasher, Sarabjot S. Anand, Bettina Berendt, Andreas Hotho, Hans W. Guesgen, Michael T. Rosenstein, Mohammad Ghavamzadeh:
The Workshop Program at the Nineteenth National Conference on Artificial Intelligence. AI Mag. 26(1): 103-108 (2005) - 2004
- [c6]Mohammad Ghavamzadeh, Sridhar Mahadevan:
Learning to Communicate and Act Using Hierarchical Reinforcement Learning. AAMAS 2004: 1114-1121 - 2003
- [c5]Mohammad Ghavamzadeh, Sridhar Mahadevan:
Hierarchical Policy Gradient Algorithms. ICML 2003: 226-233 - 2002
- [c4]Mohammad Ghavamzadeh, Sridhar Mahadevan:
A multiagent reinforcement learning algorithm by dynamically merging markov decision processes. AAMAS 2002: 845-846 - [c3]Mohammad Ghavamzadeh, Sridhar Mahadevan:
Hierarchically Optimal Average Reward Reinforcement Learning. ICML 2002: 195-202 - 2001
- [c2]Rajbala Makar, Sridhar Mahadevan, Mohammad Ghavamzadeh:
Hierarchical multi-agent reinforcement learning. Agents 2001: 246-253 - [c1]Mohammad Ghavamzadeh, Sridhar Mahadevan:
Continuous-Time Hierarchical Reinforcement Learning. ICML 2001: 186-193
last updated on 2025-03-13 21:28 CET by the dblp team
all metadata released as open data under CC0 1.0 license