


Dale Schuurmans
Person information

- affiliation: University of Alberta
2020 – today
- 2023
- [i82] Dale Schuurmans: Memory Augmented Large Language Models are Computationally Universal. CoRR abs/2301.04589 (2023)
- [i81] Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvári, Dale Schuurmans: The Role of Baselines in Policy Gradient Optimization. CoRR abs/2301.06276 (2023)
- [i80] Yilun Du, Mengjiao Yang, Bo Dai, Hanjun Dai, Ofir Nachum, Joshua B. Tenenbaum, Dale Schuurmans, Pieter Abbeel: Learning Universal Policies via Text-Guided Video Generation. CoRR abs/2302.00111 (2023)
- [i79] Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans: Foundation Models for Decision Making: Problems, Methods, and Opportunities. CoRR abs/2303.04129 (2023)
- [i78] Azade Nova, Hanjun Dai, Dale Schuurmans: Gradient-Free Structured Pruning with Unlabeled Data. CoRR abs/2303.04185 (2023)
- 2022
- [c182] Mengjiao Yang, Bo Dai, Ofir Nachum, George Tucker, Dale Schuurmans: Offline Policy Selection under Uncertainty. AISTATS 2022: 4376-4396
- [c181] Chenjun Xiao, Ilbin Lee, Bo Dai, Dale Schuurmans, Csaba Szepesvári: The Curse of Passive Data Collection in Batch Reinforcement Learning. AISTATS 2022: 8413-8438
- [c180] Hanjun Dai, Yuan Xue, Zia Syed, Dale Schuurmans, Bo Dai: Neural Stochastic Dual Dynamic Programming. ICLR 2022
- [c179] Chenjun Xiao, Bo Dai, Jincheng Mei, Oscar A Ramirez, Ramki Gummadi, Chris Harris, Dale Schuurmans: Understanding and Leveraging Overparameterization in Recursive Value Estimation. ICLR 2022
- [c178] Hanjun Dai, Mengjiao Yang, Yuan Xue, Dale Schuurmans, Bo Dai: Marginal Distribution Adaptation for Discrete Sets via Module-Oriented Divergence Minimization. ICML 2022: 4605-4617
- [c177] Ramki Gummadi, Saurabh Kumar, Junfeng Wen, Dale Schuurmans: A Parametric Class of Approximate Gradient Updates for Policy Optimization. ICML 2022: 7998-8015
- [c176] Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph Gonzalez, Dale Schuurmans, Bo Dai: Making Linear MDPs Practical via Contrastive Representation Learning. ICML 2022: 26447-26466
- [c175] Hongyu Ren, Hanjun Dai, Bo Dai, Xinyun Chen, Denny Zhou, Jure Leskovec, Dale Schuurmans: SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs. KDD 2022: 1472-1482
- [c174] Zichen Zhang, Jun Jin, Martin Jägersand, Jun Luo, Dale Schuurmans: A Simple Decentralized Cross-Entropy Method. NeurIPS 2022
- [c173] Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvári, Dale Schuurmans: The Role of Baselines in Policy Gradient Optimization. NeurIPS 2022
- [c172] Haoran Sun, Hanjun Dai, Dale Schuurmans: Optimal Scaling for Locally Balanced Proposals in Discrete Spaces. NeurIPS 2022
- [c171] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, Denny Zhou: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS 2022
- [c170] Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum: Chain of Thought Imitation with Procedure Cloning. NeurIPS 2022
- [c169] Runyu Zhang, Jincheng Mei, Bo Dai, Dale Schuurmans, Na Li: On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games. NeurIPS 2022
- [i77] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed H. Chi, Quoc Le, Denny Zhou: Chain of Thought Prompting Elicits Reasoning in Large Language Models. CoRR abs/2201.11903 (2022)
- [i76] Runyu Zhang, Jincheng Mei, Bo Dai, Dale Schuurmans, Na Li: On the Effect of Log-Barrier Regularization in Decentralized Softmax Gradient Play in Multiagent Systems. CoRR abs/2202.00872 (2022)
- [i75] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Denny Zhou: Self-Consistency Improves Chain of Thought Reasoning in Language Models. CoRR abs/2203.11171 (2022)
- [i74] Alex Lewandowski, Calarina Muslimani, Matthew E. Taylor, Jun Luo, Dale Schuurmans: Reinforcement Teaching. CoRR abs/2204.11897 (2022)
- [i73] Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Olivier Bousquet, Quoc Le, Ed H. Chi: Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. CoRR abs/2205.10625 (2022)
- [i72] Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum: Chain of Thought Imitation with Procedure Cloning. CoRR abs/2205.10816 (2022)
- [i71] Ramki Gummadi, Saurabh Kumar, Junfeng Wen, Dale Schuurmans: A Parametric Class of Approximate Gradient Updates for Policy Optimization. CoRR abs/2206.08499 (2022)
- [i70] Haoran Sun, Hanjun Dai, Bo Dai, Haomin Zhou, Dale Schuurmans: Discrete Langevin Sampler via Wasserstein Gradient Flow. CoRR abs/2206.14897 (2022)
- [i69] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Denny Zhou: Rationale-Augmented Ensembles in Language Models. CoRR abs/2207.00747 (2022)
- [i68] Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai: Making Linear MDPs Practical via Contrastive Representation Learning. CoRR abs/2207.07150 (2022)
- [i67] Tongzheng Ren, Tianjun Zhang, Lisa Lee, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai: Spectral Decomposition Representation for Reinforcement Learning. CoRR abs/2208.09515 (2022)
- [i66] Haoran Sun, Hanjun Dai, Dale Schuurmans: Optimal Scaling for Locally Balanced Proposals in Discrete Spaces. CoRR abs/2209.08183 (2022)
- [i65] Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum: Dichotomy of Control: Separating What You Can Control from What You Cannot. CoRR abs/2210.13435 (2022)
- [i64] Hanjun Dai, Yuan Xue, Niao He, Bethany Wang, Na Li, Dale Schuurmans, Bo Dai: Learning to Optimize with Stochastic Dominance Constraints. CoRR abs/2211.07767 (2022)
- [i63] Tianjun Zhang, Xuezhi Wang, Denny Zhou, Dale Schuurmans, Joseph E. Gonzalez: TEMPERA: Test-Time Prompting via Reinforcement Learning. CoRR abs/2211.11890 (2022)
- [i62] Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou: What learning algorithm is in-context learning? Investigations with linear models. CoRR abs/2211.15661 (2022)
- [i61] Haoran Sun, Lijun Yu, Bo Dai, Dale Schuurmans, Hanjun Dai: Score-based Continuous-time Discrete Diffusion Models. CoRR abs/2211.16750 (2022)
- [i60] Zichen Zhang, Jun Jin, Martin Jägersand, Jun Luo, Dale Schuurmans: A Simple Decentralized Cross-Entropy Method. CoRR abs/2212.08235 (2022)
- [i59] Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai: Latent Variable Representation for Reinforcement Learning. CoRR abs/2212.08765 (2022)
- [i58] Zichen Zhang, Johannes Kirschner, Junxi Zhang, Francesco Zanini, Alex Ayoub, Masood Dehghan, Dale Schuurmans: Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off. CoRR abs/2212.08949 (2022)
- 2021
- [c168] Mahdi Karami, Dale Schuurmans: Deep Probabilistic Canonical Correlation Analysis. AAAI 2021: 8055-8063
- [c167] Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Shane Gu: EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL. ICML 2021: 3682-3691
- [c166] Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvári, Dale Schuurmans: Leveraging Non-uniformity in First-order Non-convex Optimization. ICML 2021: 7555-7564
- [c165] Hongyu Ren, Hanjun Dai, Bo Dai, Xinyun Chen, Michihiro Yasunaga, Haitian Sun, Dale Schuurmans, Jure Leskovec, Denny Zhou: LEGO: Latent Execution-Guided Reasoning for Multi-Hop Question Answering on Knowledge Graphs. ICML 2021: 8959-8970
- [c164] Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans: Characterizing the Gap Between Actor-Critic and Policy Gradient. ICML 2021: 11101-11111
- [c163] Chenjun Xiao, Yifan Wu, Jincheng Mei, Bo Dai, Tor Lattimore, Lihong Li, Csaba Szepesvári, Dale Schuurmans: On the Optimality of Batch Policy Optimization Algorithms. ICML 2021: 11362-11371
- [c162] Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvári, Dale Schuurmans: Understanding the Effect of Stochasticity in Policy Optimization. NeurIPS 2021: 19339-19351
- [c161] Hongyu Ren, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai: Combiner: Full Attention Transformer with Sparse Computation Cost. NeurIPS 2021: 22470-22482
- [i57] Nevena Lazic, Botao Hao, Yasin Abbasi-Yadkori, Dale Schuurmans, Csaba Szepesvári: Optimization Issues in KL-Constrained Approximate Policy Iteration. CoRR abs/2102.06234 (2021)
- [i56] Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvári, Dale Schuurmans: On the Optimality of Batch Policy Optimization Algorithms. CoRR abs/2104.02293 (2021)
- [i55] Dennis Lee, Natasha Jaques, J. Chase Kew, Douglas Eck, Dale Schuurmans, Aleksandra Faust: Joint Attention for Multi-Agent Coordination and Social Learning. CoRR abs/2104.07750 (2021)
- [i54] Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvári, Dale Schuurmans: Leveraging Non-uniformity in First-order Non-convex Optimization. CoRR abs/2105.06072 (2021)
- [i53] Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans: Characterizing the Gap Between Actor-Critic and Policy Gradient. CoRR abs/2106.06932 (2021)
- [i52] Chenjun Xiao, Ilbin Lee, Bo Dai, Dale Schuurmans, Csaba Szepesvári: On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data. CoRR abs/2106.09973 (2021)
- [i51] Hongyu Ren, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai: Combiner: Full Attention Transformer with Sparse Computation Cost. CoRR abs/2107.05768 (2021)
- [i50] Hongyu Ren, Hanjun Dai, Bo Dai, Xinyun Chen, Denny Zhou, Jure Leskovec, Dale Schuurmans: SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs. CoRR abs/2110.14890 (2021)
- [i49] Jincheng Mei, Bo Dai, Chenjun Xiao, Csaba Szepesvári, Dale Schuurmans: Understanding the Effect of Stochasticity in Policy Optimization. CoRR abs/2110.15572 (2021)
- [i48] Hanjun Dai, Yuan Xue, Zia Syed, Dale Schuurmans, Bo Dai: Neural Stochastic Dual Dynamic Programming. CoRR abs/2112.00874 (2021)
- 2020
- [c160] Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans: GenDICE: Generalized Offline Estimation of Stationary Values. ICLR 2020
- [c159] Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi: An Optimistic Perspective on Offline Reinforcement Learning. ICML 2020: 104-114
- [c158] Hanjun Dai, Azade Nazi, Yujia Li, Bo Dai, Dale Schuurmans: Scalable Deep Generative Modeling for Sparse Graphs. ICML 2020: 2302-2312
- [c157] Jincheng Mei, Chenjun Xiao, Csaba Szepesvári, Dale Schuurmans: On the Global Convergence Rates of Softmax Policy Gradient Methods. ICML 2020: 6820-6829
- [c156] Dijia Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier: ConQUR: Mitigating Delusional Bias in Deep Q-Learning. ICML 2020: 9187-9195
- [c155] Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans: Batch Stationary Distribution Estimation. ICML 2020: 10203-10213
- [c154] Junfeng Wen, Russell Greiner, Dale Schuurmans: Domain Aggregation Networks for Multi-Source Domain Adaptation. ICML 2020: 10214-10224
- [c153] Mengjiao Yang, Bo Dai, Hanjun Dai, Dale Schuurmans: Energy-Based Processes for Exchangeable Data. ICML 2020: 10681-10692
- [c152] Denny Zhou, Mao Ye, Chen Chen, Tianjian Meng, Mingxing Tan, Xiaodan Song, Quoc V. Le, Qiang Liu, Dale Schuurmans: Go Wide, Then Narrow: Efficient Training of Deep Thin Networks. ICML 2020: 11546-11555
- [c151] Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans: CoinDICE: Off-Policy Confidence Interval Estimation. NeurIPS 2020
- [c150] Hanjun Dai, Rishabh Singh, Bo Dai, Charles Sutton, Dale Schuurmans: Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration. NeurIPS 2020
- [c149] Nevena Lazic, Dong Yin, Mehrdad Farajtabar, Nir Levine, Dilan Görür, Chris Harris, Dale Schuurmans: A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs. NeurIPS 2020
- [c148] Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvári, Dale Schuurmans: Escaping the Gravitational Pull of Softmax. NeurIPS 2020
- [c147] Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans: Off-Policy Evaluation via the Regularized Lagrangian. NeurIPS 2020
- [i47] Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans: GenDICE: Generalized Offline Estimation of Stationary Values. CoRR abs/2002.09072 (2020)
- [i46] Andy Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier: ConQUR: Mitigating Delusional Bias in Deep Q-learning. CoRR abs/2002.12399 (2020)
- [i45] Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans: Batch Stationary Distribution Estimation. CoRR abs/2003.00722 (2020)
- [i44] Mahdi Karami, Dale Schuurmans: Variational Inference for Deep Probabilistic Canonical Correlation Analysis. CoRR abs/2003.04292 (2020)
- [i43] Mengjiao Yang, Bo Dai, Hanjun Dai, Dale Schuurmans: Energy-Based Processes for Exchangeable Data. CoRR abs/2003.07521 (2020)
- [i42] Jincheng Mei, Chenjun Xiao, Csaba Szepesvári, Dale Schuurmans: On the Global Convergence Rates of Softmax Policy Gradient Methods. CoRR abs/2005.06392 (2020)
- [i41] Nevena Lazic, Dong Yin, Mehrdad Farajtabar, Nir Levine, Dilan Görür, Chris Harris, Dale Schuurmans: A maximum-entropy approach to off-policy evaluation in average-reward MDPs. CoRR abs/2006.12620 (2020)
- [i40] Hanjun Dai, Azade Nazi, Yujia Li, Bo Dai, Dale Schuurmans: Scalable Deep Generative Modeling for Sparse Graphs. CoRR abs/2006.15502 (2020)
- [i39] Denny Zhou, Mao Ye, Chen Chen, Tianjian Meng, Mingxing Tan, Xiaodan Song, Quoc V. Le, Qiang Liu, Dale Schuurmans: Go Wide, Then Narrow: Efficient Training of Deep Thin Networks. CoRR abs/2007.00811 (2020)
- [i38] Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans: Off-Policy Evaluation via the Regularized Lagrangian. CoRR abs/2007.03438 (2020)
- [i37] Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Shane Gu: EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL. CoRR abs/2007.11091 (2020)
- [i36] Nan Ding, Xinjie Fan, Zhenzhong Lan, Dale Schuurmans, Radu Soricut: Attention that does not Explain Away. CoRR abs/2009.14308 (2020)
- [i35] Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans: CoinDICE: Off-Policy Confidence Interval Estimation. CoRR abs/2010.11652 (2020)
- [i34] Hanjun Dai, Rishabh Singh, Bo Dai, Charles Sutton, Dale Schuurmans: Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration. CoRR abs/2011.05363 (2020)
- [i33] Mengjiao Yang, Bo Dai, Ofir Nachum, George Tucker, Dale Schuurmans: Offline Policy Selection under Uncertainty. CoRR abs/2012.06919 (2020)
2010 – 2019
- 2019
- [c146] Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He: Kernel Exponential Family Estimation via Doubly Dual Embedding. AISTATS 2019: 2321-2330
- [c145] Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi: Learning to Generalize from Sparse and Underspecified Rewards. ICML 2019: 130-140
- [c144] Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans: Understanding the Impact of Entropy on Policy Optimization. ICML 2019: 151-160
- [c143] Robert Dadashi, Marc G. Bellemare, Adrien Ali Taïga, Nicolas Le Roux, Dale Schuurmans: The Value Function Polytope in Reinforcement Learning. ICML 2019: 1486-1495
- [c142] Jincheng Mei, Chenjun Xiao, Ruitong Huang, Dale Schuurmans, Martin Müller: On Principled Entropy Exploration in Policy Optimization. IJCAI 2019: 3130-3136
- [c141] Martin Mladenov, Ofer Meshi, Jayden Ooi, Dale Schuurmans, Craig Boutilier: Advantage Amplification in Slowly Evolving Latent-State Environments. IJCAI 2019: 3165-3172
- [c140] Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taïga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle: A Geometric Perspective on Optimal Representations for Reinforcement Learning. NeurIPS 2019: 4360-4371
- [c139] Mahdi Karami, Dale Schuurmans, Jascha Sohl-Dickstein, Laurent Dinh, Daniel Duckworth: Invertible Convolutional Flow. NeurIPS 2019: 5636-5646
- [c138] Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans: Surrogate Objectives for Batch Policy Optimization in One-step Decision Making. NeurIPS 2019: 8825-8835
- [c137] Chenjun Xiao, Ruitong Huang, Jincheng Mei, Dale Schuurmans, Martin Müller: Maximum Entropy Monte-Carlo Planning. NeurIPS 2019: 9516-9524
- [c136] Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans: Exponential Family Estimation via Adversarial Dynamics Embedding. NeurIPS 2019: 10977-10988
- [i32] Robert Dadashi, Adrien Ali Taïga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare: The Value Function Polytope in Reinforcement Learning. CoRR abs/1901.11524 (2019)
- [i31] Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taïga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle: A Geometric Perspective on Optimal Representations for Reinforcement Learning. CoRR abs/1901.11530 (2019)
- [i30] Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi: Learning to Generalize from Sparse and Underspecified Rewards. CoRR abs/1902.07198 (2019)
- [i29] Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans: Exponential Family Estimation via Adversarial Dynamics Embedding. CoRR abs/1904.12083 (2019)
- [i28] Martin Mladenov, Ofer Meshi, Jayden Ooi, Dale Schuurmans, Craig Boutilier: Advantage Amplification in Slowly Evolving Latent-State Environments. CoRR abs/1905.13559 (2019)
- [i27] Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi: Striving for Simplicity in Off-policy Deep Reinforcement Learning. CoRR abs/1907.04543 (2019)
- [i26] Junfeng Wen, Russell Greiner, Dale Schuurmans: Domain Aggregation Networks for Multi-Source Domain Adaptation. CoRR abs/1909.05352 (2019)
- [i25] Ofir Nachum, Bo Dai, Ilya Kostrikov, Yinlam Chow, Lihong Li, Dale Schuurmans: AlgaeDICE: Policy Gradient from Arbitrary Experience. CoRR abs/1912.02074 (2019)
- [i24] Chenjun Xiao, Yifan Wu, Chen Ma, Dale Schuurmans, Martin Müller: Learning to Combat Compounding-Error in Model-Based Reinforcement Learning. CoRR abs/1912.11206 (2019)
- 2018
- [c135] Aditya Grover, Ramki Gummadi, Miguel Lázaro-Gredilla, Dale Schuurmans, Stefano Ermon: Variational Rejection Sampling. AISTATS 2018: 823-832
- [c134] Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans: Trust-PCL: An Off-Policy Trust Region Method for Continuous Control. ICLR (Poster) 2018
- [c133] Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans: Smoothed Action Value Functions for Learning Gaussian Policies. ICML 2018: 3689-3697
- [c132] Craig Boutilier, Alon Cohen, Avinatan Hassidim, Yishay Mansour, Ofer Meshi, Martin Mladenov, Dale Schuurmans: Planning and Learning with Stochastic Action Sets. IJCAI 2018: 4674-4682
- [c131] Tyler Lu, Dale Schuurmans, Craig Boutilier: Non-delusional Q-learning and value-iteration. NeurIPS 2018: 9971-9981
- [i23] Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans: Smoothed Action Value Functions for Learning Gaussian Policies. CoRR abs/1803.02348 (2018)
- [i22] Aditya Grover, Ramki Gummadi, Miguel Lázaro-Gredilla, Dale Schuurmans, Stefano Ermon: Variational Rejection Sampling. CoRR abs/1804.01712 (2018)
- [i21] Craig Boutilier, Alon Cohen, Amit Daniely, Avinatan Hassidim, Yishay Mansour, Ofer Meshi, Martin Mladenov, Dale Schuurmans: Planning and Learning with Stochastic Action Sets. CoRR abs/1805.02363 (2018)
- [i20] Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He: Kernel Exponential Family Estimation via Doubly Dual Embedding. CoRR abs/1811.02228 (2018)
- [i19] Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans: Understanding the impact of entropy on policy optimization. CoRR abs/1811.11214 (2018)
- 2017
- [j22] Yaoliang Yu, Xinhua Zhang, Dale Schuurmans: Generalized Conditional Gradient for Sparse Estimation. J. Mach. Learn. Res. 18: 144:1-144:46 (2017)
- [c130] Martin A. Zinkevich, Dale Schuurmans: Formalizing Anthropomorphism Through Games: A Study in Deep Neural Networks. AAAI Workshops 2017
- [c129] Ofir Nachum, Mohammad Norouzi, Dale Schuurmans: Improving Policy Gradient by Exploring Under-appreciated Rewards. ICLR (Poster) 2017
- [c128] Martin Mladenov, Craig Boutilier, Dale Schuurmans, Ofer Meshi, Gal Elidan, Tyler Lu: Logistic Markov Decision Processes. IJCAI 2017: 2486-2493
- [c127] Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans: Bridging the Gap Between Value and Policy Based Reinforcement Learning. NIPS 2017: 2775-2785
- [c126]