Adaptive Discretization in Online Reinforcement Learning
References
- (2019) Reinforcement learning: Theory and algorithms. Technical report, University of Washington, Seattle.Google Scholar
- (2013) pFabric: Minimal near-optimal datacenter transport. Comput. Comm. Rev. 43(4):435–446.Crossref, Google Scholar
- (2010) DCTCP: Efficient packet transport for the commoditized data center (ACM SIGCOMM). https://www.microsoft.com/en-us/research/publication/dctcp-efficient-packet-transport-for-the-commoditized-data-center/.Google Scholar
- (2020a) Control with adaptive Q-learning. Preprint, submitted November 3, https://arxiv.org/abs/2011.02141.Google Scholar
- (2020b) Single-partition adaptive Q-learning. Preprint, submitted July 14, https://arxiv.org/abs/2007.06741.Google Scholar
- (2013) Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model. Machine Learn. 91(3):325–349.Crossref, Google Scholar
- (2017) Minimax regret bounds for reinforcement learning. Proc. 34th Internat. Conf. Machine Learn., vol. 70, 263–272.Google Scholar
- (2021) On the linear convergence of policy gradient methods for finite MDPs. Banerjee A, Fukumizu K, eds. Proc. 24th Internat. Conf. Artificial Intelligence Statist., vol. 130 (PMLR), 2386–2394.Google Scholar
- (2012) Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems, Foundations and Trends in Machine Learning, vol. 5 (now Publishers Inc., Norwell, MA).Google Scholar
- (2020) Provably adaptive reinforcement learning in metric spaces. Adv. Neural Inform. Processing Systems 33:9736–9744.Google Scholar
- (2021) Estimating and improving dynamic treatment regimes with a time-varying instrumental variable. Preprint, submitted April 15, https://arxiv.org/abs/2104.07822.Google Scholar
- (2018) Cellular network traffic scheduling with deep reinforcement learning. 32nd AAAI Conf. Artificial Intelligence.Google Scholar
- (2020) Regret bounds for kernel-based reinforcement learning. Preprint, submitted April 12, https://arxiv.org/abs/2004.05599.Google Scholar
- (2020) Is a good representation sufficient for sample efficient reinforcement learning? Internat. Conf. Learn. Representations.Google Scholar
- (2019) Provably efficient Q-Learning with function approximation via distribution shift error checking oracle. Adv. Neural Inform. Processing Systems 33:8058–8068.Google Scholar
- (2019) Tight regret bounds for model-based reinforcement learning with greedy policies. Adv. Neural Inform. Processing Systems 33:12203–12213.Google Scholar
- (2019) Explicit explore-exploit algorithms in continuous state spaces. Adv. Neural Inform. Processing Systems 32:9372–9382.Google Scholar
- (2020) OR-gym: A reinforcement learning library for operations research problems. Preprint, submitted August 14, https://arxiv.org/abs/2008.06319.Google Scholar
- (2008) Self-optimizing memory controllers: A reinforcement learning approach. SIGARCH Comput. Architecture. News 36(3):39–50.Crossref, Google Scholar
- (2010) Near-optimal regret bounds for reinforcement learning. J. Machine Learn. Res. 11:1563–1600.Google Scholar
- (2017) Contextual decision processes with low Bellman rank are PAC-learnable. Internat. Conf. Machine Learn. (PMLR), 1704–1713.Google Scholar
- (2018) Is Q-learning provably efficient? Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, ed. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY), 4863–4873.Google Scholar
- (2020) Provably efficient reinforcement learning with linear function approximation. Conf. Learn. Theory (PMLR), 2137–2143.Google Scholar
- (2003) Exploration in metric state spaces. Proc. 20th Internat. Conf. Machine Learn., 306–312.Google Scholar
- (2006) Automatic basis function construction for approximate dynamic programming and reinforcement learning. Proc. 23rd Internat. Conf. Machine Learn., 449–456.Google Scholar
- (2019) Bandits and experts in metric spaces. J. ACM 66(4):1–77.Google Scholar
- (2015) Improved regret bounds for undiscounted continuous reinforcement learning. Internat. Conf. Machine Learn., 524–532.Google Scholar
- (2021) Metrics and continuity in reinforcement learning. Proc. Conf. AAAI Artificial Intelligence, vol. 35, 8261–8269.Google Scholar
- (2017) Adaptive state space partitioning of Markov decision processes for elastic resource management. IEEE 33rd Internat. Conf. Data Engrg., 191–194.Google Scholar
- (2018) Competitive caching with machine learned advice. Preprint, submitted February 15, https://arxiv.org/abs/1802.05399.Google Scholar
- (2005) Basis function adaptation in temporal difference reinforcement learning. Ann. Oper. Res. 134(1):215–238.Crossref, Google Scholar
- (2013) Playing Atari with deep reinforcement learning. Preprint, submitted December 19, https://arxiv.org/abs/1312.5602.Google Scholar
- (2016) Asynchronous methods for deep reinforcement learning. Internat. Conf. Machine Learn., 1928–1937.Google Scholar
- (2017) Google’s Alphago defeats Chinese Go master in win for A.I. The New York Times Online (May 23), https://www.nytimes.com/2017/05/23/business/google-deepmind-alphago-go-champion-defeat.html.Google Scholar
- (2014) From bandits to Monte-Carlo tree search: The optimistic principle applied to optimization and planning. Foundations and Trends in Machine Learning, vol. 7 (now Publishers Inc., Norwell, MA), 1–129.Google Scholar
- (2013) Scaling memcache at Facebook. Proc. 10th Sympos. Networked Systems Design Implementation, vol. 13, 385–398.Google Scholar
- (2014) Model-based reinforcement learning and the eluder dimension. Adv. Neural Inform. Processing Systems 27:1466–1474.Google Scholar
- (2022) Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions (Wiley-Blackwell, Hoboken, NJ).Google Scholar
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st ed. (John Wiley & Sons, Inc., New York).Crossref, Google Scholar
- (2001) Decision tree function approximation in reinforcement learning. Proc. Third Internat. Sympos. Adaptive Systems: Evolutionary Comput. Probabilistic Graphical Models, vol. 2, 70–77.Google Scholar
- (1988) Real Analysis, vol. 32 (Macmillan, New York).Google Scholar
- (2013) Eluder dimension and the sample complexity of optimistic exploration. Adv. Neural Inform. Processing Systems 26:2256–2264.Google Scholar
- (2018) Q-Learning with nearest neighbors. Adv. Neural Inform. Processing Systems 31:3111–3121.Google Scholar
- (2020) Sample efficient reinforcement learning via low-rank matrix estimation. Adv. Neural Inform. Processing Systems 33:12092–12103.Google Scholar
- (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.Google Scholar
- (2017) Mastering chess and shogi by self-play with a general reinforcement learning algorithm. Preprint, submitted December 5, https://arxiv.org/abs/1712.01815.Google Scholar
- (2019) Non-asymptotic gap-dependent regret bounds for tabular MDPs. Adv. Neural Inform. Processing Systems 33:1151–1160.Google Scholar
- (2019) Adaptive discretization for episodic reinforcement learning in metric spaces. Proc. ACM Measurement Anal. Comput. Systems, vol. 3, 1–44.Google Scholar
- (2020) Adaptive discretization for model-based reinforcement learning. Adv. Neural Inform. Processing Systems 34:33.Google Scholar
- (1995) Reinforcement learning with soft state aggregation. Adv. Neural Inform. Processing Systems 7:361–368.Google Scholar
- (2014) Contextual bandits with similarity information. J. Machine Learn. Res. 15(1):2533–2568.Google Scholar
- (2019) Introduction to multi-armed bandits. Foundations and Trends in Machine Learning, vol. 12 (now Publishers Inc., Norwell, MA).Google Scholar
- (2019) Efficient model-free reinforcement learning in metric spaces. Preprint, submitted May 1, https://arxiv.org/abs/1905.00475.Google Scholar
- (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).Google Scholar
- (2022) Reinforcement learning for datacenter congestion control. Performance Evaluation Rev. 49(2):43–46.Crossref, Google Scholar
- (1998) Tree based discretization for continuous state space reinforcement learning. AAAI/IAAI, 769–774.Google Scholar
- (2020) Provably efficient reinforcement learning with general value function approximation. Preprint, submitted May 21, https://arxiv.org/abs/2005.10804.Google Scholar
- (2019) Optimism in reinforcement learning with generalized linear function approximation. Preprint, submitted December 9, https://arxiv.org/abs/1912.04136.Google Scholar
- (2019) Nonparametric contextual bandits in metric spaces with unknown metric. Adv. Neural Inform. Processing Systems 32:14657–14667.Google Scholar
- (1989) Learning from delayed rewards. PhD thesis, Cambridge University, Cambridge, UK.Google Scholar
- (2006) Evolutionary function approximation for reinforcement learning. J. Machine Learn. Res. 7:877–917.Google Scholar
- (2019) Learning to control in metric space with optimal regret. Preprint, submitted May 5, https://arxiv.org/abs/1905.01576.Google Scholar
- (2019) Tighter problem-dependent regret bounds in reinforcement learning without domain knowledge using value function bounds. Preprint, submitted January 1, https://arxiv.org/abs/1901.00210.Google Scholar
- (2019) Limiting extrapolation in linear approximate value iteration. Adv. Neural Inform. Processing Systems 32:5616–5625.Google Scholar

