


Lihong Li 0001
Person information
- affiliation: Amazon, Seattle, WA, USA
- affiliation (former): Google, Kirkland, WA, USA
- affiliation (former): Microsoft Research, Redmond, WA, USA
- affiliation (former): Yahoo! Research, Santa Clara, CA, USA
- affiliation (former): Rutgers University, Piscataway, NJ, USA
- affiliation (former): University of Alberta, Edmonton, AB, Canada
Other persons with the same name
- Lihong Li — disambiguation page
- Lihong Li 0002 — City University of New York, NY, USA
- Lihong Li 0003 — Hebei Polytechnic University, Tangshan, Hebei, China
- Lihong Li 0004 — Chongqing University of Technology, Chongqing, China
- Lihong Li 0005 — Hebei University of Engineering, Handan, China
2020 – today
- 2022
- [c89] Xiaoyu Chen, Jiachen Hu, Chi Jin, Lihong Li, Liwei Wang: Understanding Domain Randomization for Sim-to-real Transfer. ICLR 2022
- [c88] Ziyang Tang, Yiheng Duan, Steven Zhu, Stephanie Zhang, Lihong Li: Estimating Long-term Effects from Experimental Data. RecSys 2022: 516-518
- [i66] Ziyang Tang, Yiheng Duan, Stephanie Zhang, Lihong Li: A Reinforcement Learning Approach to Estimating Long-term Treatment Effects. CoRR abs/2210.07536 (2022)
- 2021
- [j15] Yuxi Li, Alborz Geramifard, Lihong Li, Csaba Szepesvári, Tao Wang: Guest editorial: special issue on reinforcement learning for real life. Mach. Learn. 110(9): 2291-2293 (2021)
- [c87] Andrew Bennett, Nathan Kallus, Lihong Li, Ali Mousavi: Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders. AISTATS 2021: 1999-2007
- [c86] Xiaoyu Chen, Jiachen Hu, Lihong Li, Liwei Wang: Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL. ICLR 2021
- [c85] Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu: Neural Thompson Sampling. ICLR 2021
- [c84] Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, Liwei Wang: Near-Optimal Representation Learning for Linear Bandits and Linear RL. ICML 2021: 4349-4358
- [c83] Chenjun Xiao, Yifan Wu, Jincheng Mei, Bo Dai, Tor Lattimore, Lihong Li, Csaba Szepesvári, Dale Schuurmans: On the Optimality of Batch Policy Optimization Algorithms. ICML 2021: 11362-11371
- [i65] Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, Liwei Wang: Near-optimal Representation Learning for Linear Bandits and Linear RL. CoRR abs/2102.04132 (2021)
- [i64] Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvári, Dale Schuurmans: On the Optimality of Batch Policy Optimization Algorithms. CoRR abs/2104.02293 (2021)
- [i63] Yi Liu, Lihong Li: A Map of Bandits for E-commerce. CoRR abs/2107.00680 (2021)
- [i62] Xiaoyu Chen, Jiachen Hu, Chi Jin, Lihong Li, Liwei Wang: Understanding Domain Randomization for Sim-to-real Transfer. CoRR abs/2110.03239 (2021)
- 2020
- [c82] Branislav Kveton, Manzil Zaheer, Csaba Szepesvári, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier: Randomized Exploration in Generalized Linear Bandits. AISTATS 2020: 2066-2076
- [c81] Ali Mousavi, Lihong Li, Qiang Liu, Denny Zhou: Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning. ICLR 2020
- [c80] Ziyang Tang, Yihao Feng, Lihong Li, Dengyong Zhou, Qiang Liu: Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation. ICLR 2020
- [c79] Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans: GenDICE: Generalized Offline Estimation of Stationary Values. ICLR 2020
- [c78] Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans: Batch Stationary Distribution Estimation. ICML 2020: 10203-10213
- [c77] Dongruo Zhou, Lihong Li, Quanquan Gu: Neural Contextual Bandits with UCB-based Exploration. ICML 2020: 11492-11502
- [c76] Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans: CoinDICE: Off-Policy Confidence Interval Estimation. NeurIPS 2020
- [c75] Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvári, Dale Schuurmans: Escaping the Gravitational Pull of Softmax. NeurIPS 2020
- [c74] Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans: Off-Policy Evaluation via the Regularized Lagrangian. NeurIPS 2020
- [i61] Ge Liu, Rui Wu, Heng-Tze Cheng, Jing Wang, Jayden Ooi, Lihong Li, Ang Li, Wai Lok Sibon Li, Craig Boutilier, Ed H. Chi: Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing. CoRR abs/2002.05229 (2020)
- [i60] Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans: GenDICE: Generalized Offline Estimation of Stationary Values. CoRR abs/2002.09072 (2020)
- [i59] Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans: Batch Stationary Distribution Estimation. CoRR abs/2003.00722 (2020)
- [i58] Ali Mousavi, Lihong Li, Qiang Liu, Denny Zhou: Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning. CoRR abs/2003.11126 (2020)
- [i57] Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans: Off-Policy Evaluation via the Regularized Lagrangian. CoRR abs/2007.03438 (2020)
- [i56] Andrew Bennett, Nathan Kallus, Lihong Li, Ali Mousavi: Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders. CoRR abs/2007.13893 (2020)
- [i55] Xiaoyu Chen, Jiachen Hu, Lihong Li, Liwei Wang: Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL. CoRR abs/2008.13319 (2020)
- [i54] Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu: Neural Thompson Sampling. CoRR abs/2010.00827 (2020)
- [i53] Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans: CoinDICE: Off-Policy Confidence Interval Estimation. CoRR abs/2010.11652 (2020)
2010 – 2019
- 2019
- [j14] Lihong Li: A perspective on off-policy evaluation in reinforcement learning. Frontiers Comput. Sci. 13(5): 911-912 (2019)
- [j13] Jianfeng Gao, Michel Galley, Lihong Li: Neural Approaches to Conversational AI. Found. Trends Inf. Retr. 13(2-3): 127-298 (2019)
- [c73] Honghua Dong, Jiayuan Mao, Tian Lin, Chong Wang, Lihong Li, Denny Zhou: Neural Logic Machines. ICLR (Poster) 2019
- [c72] Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill: Policy Certificates: Towards Accountable Reinforcement Learning. ICML 2019: 1507-1516
- [c71] Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li: DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections. NeurIPS 2019: 2315-2325
- [c70] Yihao Feng, Lihong Li, Qiang Liu: A Kernel Loss for Solving the Bellman Equation. NeurIPS 2019: 15430-15441
- [i52] Honghua Dong, Jiayuan Mao, Tian Lin, Chong Wang, Lihong Li, Denny Zhou: Neural Logic Machines. CoRR abs/1904.11694 (2019)
- [i51] Yihao Feng, Lihong Li, Qiang Liu: A Kernel Loss for Solving the Bellman Equation. CoRR abs/1905.10506 (2019)
- [i50] Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li: DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections. CoRR abs/1906.04733 (2019)
- [i49] Branislav Kveton, Manzil Zaheer, Csaba Szepesvári, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier: Randomized Exploration in Generalized Linear Bandits. CoRR abs/1906.08947 (2019)
- [i48] Ziyang Tang, Yihao Feng, Lihong Li, Dengyong Zhou, Qiang Liu: Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation. CoRR abs/1910.07186 (2019)
- [i47] Dongruo Zhou, Lihong Li, Quanquan Gu: Neural Contextual Bandits with Upper Confidence Bound-Based Exploration. CoRR abs/1911.04462 (2019)
- [i46] Ofir Nachum, Bo Dai, Ilya Kostrikov, Yinlam Chow, Lihong Li, Dale Schuurmans: AlgaeDICE: Policy Gradient from Arbitrary Experience. CoRR abs/1912.02074 (2019)
- 2018
- [c69] Zachary C. Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng: BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems. AAAI 2018: 5237-5244
- [c68] Jianfeng Gao, Michel Galley, Lihong Li: Neural Approaches to Conversational AI. ACL (5) 2018: 2-7
- [c67] Da Tang, Xiujun Li, Jianfeng Gao, Chong Wang, Lihong Li, Tony Jebara: Subgoal Discovery for Hierarchical Dialogue Policy Learning. EMNLP 2018: 2298-2309
- [c66] Yuzhe Ma, Kwang-Sung Jun, Lihong Li, Xiaojin Zhu: Data Poisoning Attacks in Contextual Bandits. GameSec 2018: 186-204
- [c65] Bo Dai, Albert E. Shaw, Niao He, Lihong Li, Le Song: Boosting the Actor with Dual Critic. ICLR (Poster) 2018
- [c64] Yichen Chen, Lihong Li, Mengdi Wang: Scalable Bilinear Learning Using State and Action Features. ICML 2018: 833-842
- [c63] Bo Dai, Albert E. Shaw, Lihong Li, Lin Xiao, Niao He, Zhen Liu, Jianshu Chen, Le Song: SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation. ICML 2018: 1133-1142
- [c62] Kwang-Sung Jun, Lihong Li, Yuzhe Ma, Xiaojin (Jerry) Zhu: Adversarial Attacks on Stochastic Bandits. NeurIPS 2018: 3644-3653
- [c61] Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou: Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation. NeurIPS 2018: 5361-5371
- [c60] Jianfeng Gao, Michel Galley, Lihong Li: Neural Approaches to Conversational AI. SIGIR 2018: 1371-1374
- [i45] Da Tang, Xiujun Li, Jianfeng Gao, Chong Wang, Lihong Li, Tony Jebara: Subgoal Discovery for Hierarchical Dialogue Policy Learning. CoRR abs/1804.07855 (2018)
- [i44] Yichen Chen, Lihong Li, Mengdi Wang: Scalable Bilinear π Learning Using State and Action Features. CoRR abs/1804.10328 (2018)
- [i43] Yuzhe Ma, Kwang-Sung Jun, Lihong Li, Xiaojin Zhu: Data Poisoning Attacks in Contextual Bandits. CoRR abs/1808.05760 (2018)
- [i42] Jianfeng Gao, Michel Galley, Lihong Li: Neural Approaches to Conversational AI. CoRR abs/1809.08267 (2018)
- [i41] Kwang-Sung Jun, Lihong Li, Yuzhe Ma, Xiaojin Zhu: Adversarial Attacks on Stochastic Bandits. CoRR abs/1810.12188 (2018)
- [i40] Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou: Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation. CoRR abs/1810.12429 (2018)
- [i39] Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill: Policy Certificates: Towards Accountable Reinforcement Learning. CoRR abs/1811.03056 (2018)
- 2017
- [c59] Bhuwan Dhingra, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, Li Deng: Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access. ACL (1) 2017: 484-495
- [c58] Baolin Peng, Xiujun Li, Lihong Li, Jianfeng Gao, Asli Celikyilmaz, Sungjin Lee, Kam-Fai Wong: Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning. EMNLP 2017: 2231-2240
- [c57] Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli: Neuro-Symbolic Program Synthesis. ICLR (Poster) 2017
- [c56] Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou: Stochastic Variance Reduction Methods for Policy Evaluation. ICML 2017: 1049-1058
- [c55] Lihong Li, Yu Lu, Dengyong Zhou: Provably Optimal Algorithms for Generalized Linear Contextual Bandits. ICML 2017: 2071-2080
- [c54] Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, Asli Celikyilmaz: End-to-End Task-Completion Neural Dialogue Systems. IJCNLP (1) 2017: 733-743
- [c53] Jianshu Chen, Chong Wang, Lin Xiao, Ji He, Lihong Li, Li Deng: Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes. NIPS 2017: 4977-4986
- [i38] Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou: Stochastic Variance Reduction Methods for Policy Evaluation. CoRR abs/1702.07944 (2017)
- [i37] Asli Celikyilmaz, Li Deng, Lihong Li, Chong Wang: Scaffolding Networks for Teaching and Learning to Comprehend. CoRR abs/1702.08653 (2017)
- [i36] Lihong Li, Yu Lu, Dengyong Zhou: Provable Optimal Algorithms for Generalized Linear Contextual Bandits. CoRR abs/1703.00048 (2017)
- [i35] Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao: End-to-End Task-Completion Neural Dialogue Systems. CoRR abs/1703.01008 (2017)
- [i34] Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, Asli Celikyilmaz: Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems. CoRR abs/1703.07055 (2017)
- [i33] Baolin Peng, Xiujun Li, Lihong Li, Jianfeng Gao, Asli Celikyilmaz, Sungjin Lee, Kam-Fai Wong: Composite Task-Completion Dialogue System via Hierarchical Deep Reinforcement Learning. CoRR abs/1704.03084 (2017)
- [i32] Bo Dai, Albert E. Shaw, Niao He, Lihong Li, Le Song: Boosting the Actor with Dual Critic. CoRR abs/1712.10282 (2017)
- [i31] Bo Dai, Albert E. Shaw, Lihong Li, Lin Xiao, Niao He, Jianshu Chen, Le Song: Smoothed Dual Embedding Control. CoRR abs/1712.10285 (2017)
- 2016
- [j12] Katja Hofmann, Lihong Li, Filip Radlinski: Online Evaluation for Information Retrieval. Found. Trends Inf. Retr. 10(1): 1-117 (2016)
- [c52] Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, Mari Ostendorf: Deep Reinforcement Learning with a Natural Language Action Space. ACL (1) 2016
- [c51] Che-Yu Liu, Lihong Li: On the Prior Sensitivity of Thompson Sampling. ALT 2016: 321-336
- [c50] Shipra Agrawal, Nikhil R. Devanur, Lihong Li: An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives. COLT 2016: 4-18
- [c49] Ji He, Mari Ostendorf, Xiaodong He, Jianshu Chen, Jianfeng Gao, Lihong Li, Li Deng: Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads. EMNLP 2016: 1838-1848
- [c48] Nan Jiang, Lihong Li: Doubly Robust Off-policy Value Evaluation for Reinforcement Learning. ICML 2016: 652-661
- [c47] Tzu-Kuo Huang, Lihong Li, Ara Vartanian, Saleema Amershi, Xiaojin Zhu: Active Learning with Oracle Epiphany. NIPS 2016: 2820-2828
- [c46] Masrour Zoghi, Tomás Tunys, Lihong Li, Damien Jose, Junyan Chen, Chun Ming Chin, Maarten de Rijke: Click-based Hot Fixes for Underperforming Torso Queries. SIGIR 2016: 195-204
- [i30] Ji He, Mari Ostendorf, Xiaodong He, Jianshu Chen, Jianfeng Gao, Lihong Li, Li Deng: Deep Reinforcement Learning with a Combinatorial Action Space for Predicting and Tracking Popular Discussion Threads. CoRR abs/1606.03667 (2016)
- [i29] Zachary C. Lipton, Jianfeng Gao, Lihong Li, Xiujun Li, Faisal Ahmed, Li Deng: Efficient Exploration for Dialog Policy Learning with Deep BBQ Networks & Replay Buffer Spiking. CoRR abs/1608.05081 (2016)
- [i28] Bhuwan Dhingra, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, Li Deng: End-to-End Reinforcement Learning of Dialogue Agents for Information Access. CoRR abs/1609.00777 (2016)
- [i27] Zachary C. Lipton, Jianfeng Gao, Lihong Li, Jianshu Chen, Li Deng: Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear. CoRR abs/1611.01211 (2016)
- [i26] Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli: Neuro-Symbolic Program Synthesis. CoRR abs/1611.01855 (2016)
- [i25] Xiujun Li, Zachary C. Lipton, Bhuwan Dhingra, Lihong Li, Jianfeng Gao, Yun-Nung Chen: A User Simulator for Task-Completion Dialogues. CoRR abs/1612.05688 (2016)
- 2015
- [c45] Lihong Li, Rémi Munos, Csaba Szepesvári: Toward Minimax Off-policy Value Estimation. AISTATS 2015
- [c44] Lihong Li, Jin Young Kim, Imed Zitouni: Toward Predicting the Outcome of an A/B Experiment for Search Relevance. WSDM 2015: 37-46
- [c43] Lihong Li: Offline Evaluation and Optimization for Interactive Systems. WSDM 2015: 413-414
- [c42] Lihong Li, Shunbao Chen, Jim Kleban, Ankur Gupta: Counterfactual Estimation and Optimization of Click Metrics in Search Engines: A Case Study. WWW (Companion Volume) 2015: 929-934
- [i24] Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li: Doubly Robust Policy Evaluation and Optimization. CoRR abs/1503.02834 (2015)
- [i23] Dragomir Yankov, Pavel Berkhin, Lihong Li: Evaluation of Explore-Exploit Policies in Multi-result Ranking Systems. CoRR abs/1504.07662 (2015)
- [i22] Shipra Agrawal, Nikhil R. Devanur, Lihong Li: Contextual Bandits with Global Constraints and Objective. CoRR abs/1506.03374 (2015)
- [i21] Che-Yu Liu, Lihong Li: On the Prior Sensitivity of Thompson Sampling. CoRR abs/1506.03378 (2015)
- [i20] Emma Brunskill, Lihong Li: The Online Discovery Problem and Its Application to Lifelong Reinforcement Learning. CoRR abs/1506.03379 (2015)
- [i19] Xiujun Li, Lihong Li, Jianfeng Gao, Xiaodong He, Jianshu Chen, Li Deng, Ji He: Recurrent Reinforcement Learning: A Hybrid Approach. CoRR abs/1509.03044 (2015)
- [i18] Nan Jiang, Lihong Li: Doubly Robust Off-policy Evaluation for Reinforcement Learning. CoRR abs/1511.03722 (2015)
- [i17] Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, Mari Ostendorf: Deep Reinforcement Learning with an Unbounded Action Space. CoRR abs/1511.04636 (2015)
- 2014
- [j11] Jiang Bian, Bo Long, Lihong Li, Taesup Moon, Anlei Dong, Yi Chang: Exploiting User Preference for Online Learning in Web Content Optimization Systems. ACM Trans. Intell. Syst. Technol. 5(2): 33:1-33:23 (2014)
- [c41] Emma Brunskill, Lihong Li: PAC-inspired Option Discovery in Lifelong Reinforcement Learning. ICML 2014: 316-324
- [c40] Alekh Agarwal, Daniel J. Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire: Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. ICML 2014: 1638-1646
- [c39] Lihong Li, He He, Jason D. Williams: Temporal supervised learning for inferring a dialog policy from example conversations. SLT 2014: 312-317
- [i16] Alekh Agarwal, Daniel J. Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire: Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits. CoRR abs/1402.0555 (2014)
- [i15] Lihong Li, Shunbao Chen, Jim Kleban, Ankur Gupta: Counterfactual Estimation and Optimization of Click Metrics for Search Engines. CoRR abs/1403.1891 (2014)
- [i14] Lihong Li, Rémi Munos, Csaba Szepesvári: On Minimax Optimal Offline Policy Evaluation. CoRR abs/1409.3653 (2014)
- 2013
- [c38] Emma Brunskill, Lihong Li: Sample Complexity of Multi-task Reinforcement Learning. UAI 2013
- [i13] Emma Brunskill, Lihong Li: Sample Complexity of Multi-task Reinforcement Learning. CoRR abs/1309.6821 (2013)
- [i12] Lihong Li: Generalized Thompson Sampling for Contextual Bandits. CoRR abs/1310.7163 (2013)
- [i11] Zhen Qin, Vaclav Petricek, Nikos Karampatziakis, Lihong Li, John Langford: Efficient Online Bootstrapping for Large Scale Learning. CoRR abs/1312.5021 (2013)
- 2012
- [j10] John Langford, Lihong Li, R. Preston McAfee, Kishore Papineni: Cloud control: voluntary admission control for intranet traffic management. Inf. Syst. E Bus. Manag. 10(3): 295-308 (2012)
- [j9] Taesup Moon, Wei Chu, Lihong Li, Zhaohui Zheng, Yi Chang: An Online Learning Framework for Refining Recency Search Results with User Click Feedback. ACM Trans. Inf. Syst. 30(4): 20:1-20:28 (2012)
- [c37] Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li: Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits. UAI 2012: 247-254
- [c36] Vidhya Navalpakkam, Ravi Kumar, Lihong Li, D. Sivakumar: Attention and Selection in Online Choice Tasks. UMAP 2012: 200-211
- [c35] Hongning Wang, Anlei Dong, Lihong Li, Yi Chang, Evgeniy Gabrilovich: Joint relevance and freshness learning from clickthroughs for news search. WWW 2012: 579-588
- [c34] Lihong Li, Wei Chu, John Langford, Taesup Moon, Xuanhui Wang: Bandits with Generalized Linear Models. ICML On-line Trading of Exploration and Exploitation 2012: 19-36
- [c33] Lihong Li, Olivier Chapelle: Open Problem: Regret Bounds for Thompson Sampling. COLT 2012: 43.1-43.3
- [p1] Lihong Li: Sample Complexity Bounds of Exploration. Reinforcement Learning 2012: 175-204
- [i10] John Asmuth, Lihong Li, Michael L. Littman, Ali Nouri, David Wingate: A Bayesian Sampling Approach to Exploration in Reinforcement Learning. CoRR abs/1205.2664 (2012)
- [i9] Emma Brunskill, Bethany R. Leffler, Lihong Li, Michael L. Littman, Nicholas Roy: CORL: A Continuous-state Offset-dynamics Reinforcement Learner. CoRR abs/1206.3231 (2012)
- [i8] Alexander L. Strehl, Lihong Li, Michael L. Littman: Incremental Model-based Learners With Formal Learning-Time Guarantees. CoRR abs/1206.6870 (2012)
- [i7] Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li: Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits. CoRR abs/1210.4862 (2012)
- 2011
- [j8] Lihong Li, Michael L. Littman, Thomas J. Walsh, Alexander L. Strehl: Knows what it knows: a framework for self-aware learning. Mach. Learn. 82(3): 399-443 (2011)
- [c32] Miroslav Dudík, John Langford, Lihong Li: Doubly Robust Policy Evaluation and Learning. ICML 2011: 1097-1104
- [c31] Wei Chu, Martin Zinkevich, Lihong Li, Achint Thomas, Belle L. Tseng: Unbiased online active learning in data streams. KDD 2011: 195-203
- [c30] Olivier Chapelle, Lihong Li: An Empirical Evaluation of Thompson Sampling. NIPS 2011: 2249-2257
- [c29] Lihong Li, Wei Chu, John Langford, Xuanhui Wang: Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. WSDM 2011: 297-306
- [c28] Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, Robert E. Schapire: Contextual Bandit Algorithms with Supervised Learning Guarantees. AISTATS 2011: 19-26
- [c27] Deepak Agarwal, Lihong Li, Alexander J. Smola: Linear-Time Estimators for Propensity Scores. AISTATS 2011: 93-100
- [c26] Wei Chu, Lihong Li, Lev Reyzin, Robert E. Schapire: Contextual Bandits with Linear Payoff Functions. AISTATS 2011: 208-214
- [i6] Taesup Moon, Wei Chu, Lihong Li, Zhaohui Zheng, Yi Chang: Refining Recency Search Results with User Click Feedback. CoRR abs/1103.3735 (2011)
- [i5] Miroslav Dudík, John Langford, Lihong Li: Doubly Robust Policy Evaluation and Learning. CoRR abs/1103.4601 (2011)
- 2010
- [j7] John Langford, Lihong Li, Yevgeniy Vorobeychik, Jennifer Wortman: Maintaining Equilibria During Exploration in Sponsored Search Auctions. Algorithmica 58(4): 990-1021 (2010)
- [j6] Lihong Li, Michael L. Littman: Reducing reinforcement learning to KWIK online regression. Ann. Math. Artif. Intell. 58(3-4): 217-237 (2010)
- [c25] Taesup Moon, Lihong Li, Wei Chu, Ciya Liao, Zhaohui Zheng, Yi Chang: Online learning for recency search ranking using real-time user feedback. CIKM 2010: 1501-1504
- [c24] Alexander L. Strehl, John Langford, Lihong Li, Sham M. Kakade: Learning from Logged Implicit Exploration Data. NIPS 2010: 2217-2225
- [c23] Martin Zinkevich, Markus Weimer, Alexander J. Smola, Lihong Li: Parallelized Stochastic Gradient Descent. NIPS 2010: 2595-2603
- [c22] Lihong Li, Wei Chu, John Langford, Robert E. Schapire: A contextual-bandit approach to personalized news article recommendation. WWW 2010: 661-670
- [i4] Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, Robert E. Schapire: An Optimal High Probability Algorithm for the Contextual Bandit Problem. CoRR abs/1002.4058 (2010)
- [i3] Lihong Li, Wei Chu, John Langford, Robert E. Schapire: A Contextual-Bandit Approach to Personalized News Article Recommendation. CoRR abs/1003.0146 (2010)
- [i2] Lihong Li, Wei Chu, John Langford: An Unbiased, Data-Driven, Offline Evaluation Method of Contextual Bandit Algorithms. CoRR abs/1003.5956 (2010)
2000 – 2009
- 2009
- [j5] Thomas J. Walsh, Ali Nouri, Lihong Li, Michael L. Littman: Learning and planning in environments with delayed feedback. Auton. Agents Multi Agent Syst. 18(1): 83-105 (2009)
- [j4] John Langford, Lihong Li, Tong Zhang: Sparse Online Learning via Truncated Gradient. J. Mach. Learn. Res. 10: 777-801 (2009)
- [j3] Emma Brunskill, Bethany R. Leffler, Lihong Li, Michael L. Littman, Nicholas Roy: Provably Efficient Learning with Typed Parametric Models. J. Mach. Learn. Res. 10: 1955-1988 (2009)
- [j2] Alexander L. Strehl, Lihong Li, Michael L. Littman: Reinforcement Learning in Finite MDPs: PAC Analysis. J. Mach. Learn. Res. 10: 2413-2444 (2009)
- [c21] Lihong Li, Michael L. Littman, Christopher R. Mansley: Online exploration in least-squares policy iteration. AAMAS (2) 2009: 733-739
- [c20] David Wingate, Carlos Diuk, Lihong Li, Matthew Taylor, Jordan Frank: Workshop summary: Results of the 2009 reinforcement learning competition. ICML 2009: 6
- [c19] Carlos Diuk, Lihong Li, Bethany R. Leffler: The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning. ICML 2009: 249-256
- [c18] Lihong Li, Jason D. Williams, Suhrid Balakrishnan: Reinforcement learning for dialog management using least-squares policy iteration and fast feature selection. INTERSPEECH 2009: 2475-2478
- [c17] John Asmuth, Lihong Li, Michael L. Littman, Ali Nouri, David Wingate: A Bayesian Sampling Approach to Exploration in Reinforcement Learning. UAI 2009: 19-26
- 2008
- [c16] Lihong Li: A worst-case comparison between temporal difference and residual gradient with linear function approximation. ICML 2008: 560-567
- [c15] Lihong Li, Michael L. Littman, Thomas J. Walsh: Knows what it knows: a framework for self-aware learning. ICML 2008: 568-575
- [c14] Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield, Michael L. Littman: An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. ICML 2008: 752-759
- [c13] Lihong Li, Michael L. Littman: Efficient Value-Function Approximation via Online Linear Regression. ISAIM 2008
- [c12] John Langford, Lihong Li, Tong Zhang: Sparse Online Learning via Truncated Gradient. NIPS 2008: 905-912
- [c11] Emma Brunskill, Bethany R. Leffler, Lihong Li, Michael L. Littman, Nicholas Roy: CORL: A Continuous-state Offset-dynamics Reinforcement Learner. UAI 2008: 53-61
- [i1] John Langford, Lihong Li, Tong Zhang: Sparse Online Learning via Truncated Gradient. CoRR abs/0806.4686 (2008)
- 2007
- [j1] Lihong Li, Vadim Bulitko, Russell Greiner: Focus of Attention in Reinforcement Learning. J. Univers. Comput. Sci. 13(9): 1246-1269 (2007)
- [c10] Thomas J. Walsh, Ali Nouri, Lihong Li, Michael L. Littman: Planning and Learning in Environments with Delayed Feedback. ECML 2007: 442-453
- [c9] Ronald Parr, Christopher Painter-Wakefield, Lihong Li, Michael L. Littman: Analyzing feature generation for value-function approximation. ICML 2007: 737-744
- [c8] Jennifer Wortman, Yevgeniy Vorobeychik, Lihong Li, John Langford: Maintaining Equilibria During Exploration in Sponsored Search Auctions. WINE 2007: 119-130
- 2006
- [c7] Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, Michael L. Littman: PAC model-free reinforcement learning. ICML 2006: 881-888
- [c6] Lihong Li, Thomas J. Walsh, Michael L. Littman: Towards a Unified Theory of State Abstraction for MDPs. AI&M 2006
- [c5] Alexander L. Strehl, Lihong Li, Michael L. Littman: Incremental Model-based Learners With Formal Learning-Time Guarantees. UAI 2006
- 2005
- [c4] Lihong Li, Michael L. Littman: Lazy Approximation for Solving Continuous Finite-Horizon MDPs. AAAI 2005: 1175-1180
- 2004
- [c3] Lihong Li, Vadim Bulitko, Russell Greiner: Batch Reinforcement Learning with State Importance. ECML 2004: 566-568
- 2003
- [c2] Ilya Levner, Vadim Bulitko, Lihong Li, Greg Lee, Russell Greiner: Towards Automated Creation of Image Interpretation Systems. Australian Conference on Artificial Intelligence 2003: 653-665
- [c1] Vadim Bulitko, Lihong Li, Russell Greiner, Ilya Levner: Lookahead Pathologies for Single Agent Search. IJCAI 2003: 1531-1533
last updated on 2025-03-11 21:05 CET by the dblp team
all metadata released as open data under CC0 1.0 license