Nonparametric Approximate Dynamic Programming via the Kernel Method
Published Online:10 Mar 2023https://doi.org/10.1287/stsy.2023.0107
References
- (2020) Discount factor as a regularizer in reinforcement learning. Proc. 37th Internat. Conf. Machine Learning (PMLR), 269–278.Google Scholar
- (2011) Reinforcement Learning Using Kernel-Based Stochastic Factorization, Advances in Neural Information Processing Systems, vol. 24 (MIT Press, Cambridge, MA), 720–728.Google Scholar
- (2002) Rademacher and Gaussian complexities: Risk bounds and structural results. J. Machine Learning Res. 3:463–482.Google Scholar
- (2023) Kernel-based reinforcement learning using Bellman residual elimination. J. Machine Learning Res. Forthcoming.Google Scholar
- (1998) Value iteration and optimization of multiclass queueing networks., 1998. Proc. 37th IEEE Conf. Decision and Control, vol. 1 (IEEE, Piscataway, NJ), 50–55.Google Scholar
- (2005) Maximum pressure policies in stochastic processing networks. Oper. Res. 53(2):197–218.Link, Google Scholar
- (2000) On the existence of fixed points for approximate value iteration and temporal-difference learning. J. Optim. Theory Appl. 105(3):589–608.Google Scholar
- (2003) The linear programming approach to approximate dynamic programming. Oper. Res. 51(6):850–865.Link, Google Scholar
- (2004) On constraint sampling in the linear programming approach to approximate dynamic programming. Math. Oper. Res. 29:462–478.Link, Google Scholar
- (2012) Approximate dynamic programming via a smoothed linear program. Oper. Res. 60(3):655–674.Link, Google Scholar
- (2021) Examining average and discounted reward optimality criteria in reinforcement learning. Preprint, submitted July 3, https://arxiv.org/abs/2107.01348.Google Scholar
- (2002) Batch Value Function Approximation via Support Vectors, Advances in Neural Information Processing Systems, vol. 14 (MIT Press, Cambridge, MA), 1491–1498.Google Scholar
- (2003) Bayes meets Bellman: The Gaussian process approach to temporal difference learning. Proc. 20th Internat. Conf. Machine Learning (AAAI Press, Washington, DC), 154–161.Google Scholar
- (2005) Tree-based batch mode reinforcement learning. J. Machine Learning Res. 6:503–556.Google Scholar
- (2009a) Regularized policy iteration. Adv. Neural Inform. Processing Systems 21:441–448.Google Scholar
- (2009b) Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems. 2009 Amer. Control Conf. ACC’09 (IEEE, Piscataway, NJ), 725–730.Google Scholar
- (1989) Scheduling network of queues: Heavy traffic analysis of a simple open network. Queueing Systems 5:265–280.Google Scholar
- (2016) Empirical dynamic programming. Math. Oper. Res. 41(2):402–429.Link, Google Scholar
- (2015) The dependence of effective planning horizon on model accuracy. Proc. 2015 Internat. Conf. Autonomous Agents Multiagent Systems (IFAAMAS, Richland, SC), 1181–1189.Google Scholar
- (1999) Advances in kernel methods: Support vector learning. Schölkopf B, Burges CJC, Smola AJ, eds. Making Large-Scale Support Vector Machine Learning Practical (MIT Press, Cambridge, MA), 169–184.Google Scholar
- (2009) Regularization and feature selection in least-squares temporal difference learning. ICML’09 Proc. 26th Annu. Internat. Conf. Machine Learning (Association for Computing Machinery, New York), 521–528.Google Scholar
- (1990) Dynamic instabilities and stabilization methods in distributed real-time scheduling of manufacturing systems. IEEE Trans. Automatic Control 35(3):289–298.Google Scholar
- (2004) A numerical method for solving singular stochastic control problems. Oper. Res. 52(4):563–582.Link, Google Scholar
- (1996) Heavy traffic analysis of a controlled multiclass queueing network via weak convergence methods. SIAM J. Control Optim. 34(5):1781–1797.Google Scholar
- (1997) Optimization by Vector Space Methods (John Wiley & Sons, Inc., New York).Google Scholar
- (1960) Linear programming and sequential decisions. Management Sci. 6(3):259–267.Link, Google Scholar
- (1996) Heavy traffic convergence of a controlled multiclass queueing network. SIAM J. Control Optim. 34(6):2133–2171.Google Scholar
- (2008) Approximate and data-driven dynamic programming for queueing networks. Working paper, Columbia University, New York.Google Scholar
- (2002) Kernel-based reinforcement learning in average cost problems. IEEE Trans. Automatic Control 47(10):1624–1636.Google Scholar
- (2002) Tree-based batch mode reinforcement learning. Machine Learning 49(2):161–178.Google Scholar
- (1997) An improved training algorithm for support vector machines. Neural Networks Signal Processing, Proc. 1997 IEEE Workshop (IEEE, Piscataway, NJ), 276–285.Google Scholar
- (2011) Non-parametric approximate linear programming for MDPs. Proc. Twenty-Fifth AAAI Conf. Artificial Intelligence (AAAI Press, Washington, DC), 459–464.Google Scholar
- (2010) Feature selection using regularization in approximate linear programs for Markov decision processes. ICML’10 Proc. 27th Annu. Internat. Conf. Machine Learning (Association for Computing Machinery, New York), 871–879.Google Scholar
- (2001) Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (The MIT Press, Cambridge, MA).Google Scholar
- (1985) Generalized polynomial approximations in Markovian decision processes. J. Math. Anal. Appl. 110:568–582.Google Scholar
- (2004) Maxweight scheduling in a generalized switch: State space collapse and workload minimization in heavy traffic. Ann. Appl. Probab. 14:1–53.Google Scholar
- (1992) Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks. IEEE Trans. Automatic Control 37(12):1936–1948.Google Scholar
- (2006) Performance loss bounds for approximate value iteration with state aggregation. Math. Oper. Res. 31(2):234–244.Link, Google Scholar
- (2007) Kernel-based least squares policy iteration for reinforcement learning. IEEE Trans. Neural Networks 18(4):973–992.Google Scholar

