Optimal Oracle Inequalities for Projected Fixed-Point Equations, with Applications to Policy Evaluation
Published Online:28 Dec 2022https://doi.org/10.1287/moor.2022.1341
References
- [1] (2019) Solving inverse problems using data-driven models. Acta Numer. 28(May):1–174.Crossref, Google Scholar
- [2] (2005) Local Rademacher complexities. Ann. Statist. 33(4):1497–1537.Crossref, Google Scholar
- [3] (2018) Sharp oracle inequalities for least squares estimators in shape restricted regression. Ann. Statist. 46(2):745–780.Crossref, Google Scholar
- [4] (2012) Adaptive Algorithms and Stochastic Approximations, translated by Wilson SS (Springer Science & Business Media, New York).Google Scholar
- [5] (2011) Temporal difference methods for general projected equations. IEEE Trans. Automatic Control 56(9):2128–2139.Crossref, Google Scholar
- [6] Bertsekas DP (2018) Proximal algorithms and temporal difference methods for solving fixed point problems. Comput. Optim. Appl. 70(3):709–736.Google Scholar
- [7] (2019) Reinforcement Learning and Optimal Control (Athena Scientific, Belmont, MA).Google Scholar
- [8] (2021) A finite time analysis of temporal difference learning with linear function approximation. Oper. Res. 69(3):950–973.Google Scholar
- [9] (2009) Stochastic Approximation: A Dynamical Systems Viewpoint (Hindustan Book Agency, New Delhi, India).Google Scholar
- [10] (2019) The optimal approximation factor in density estimation. Beygelzimer A, Hsu D, eds. Proc. Machine Learn. Res., vol. 99, Conference on Learning Theory, Phoenix, AZ (PMLR), 318–341.Google Scholar
- [11] (2002) Technical update: Least-squares temporal difference learning. Machine Learn. 49(2–3):233–246.Crossref, Google Scholar
- [12] (1996) Linear least-squares algorithms for temporal difference learning. Machine Learn. 22(1–3):33–57.Crossref, Google Scholar
- [13] (2007) The Mathematical Theory of Finite Element Methods, 3rd ed. (Springer Science & Business Media, New York).Google Scholar
- [14] (2007) Aggregation for Gaussian regression. Ann. Statist. 35(4):1674–1697.Crossref, Google Scholar
- [15] (2007) Sparsity oracle inequalities for the Lasso. Electronic J. Statist. 1:169–194.Crossref, Google Scholar
- [16] (1964) Approximation variationnelle des problèmes aux limites. (In French.) Ann. Inst. Fourier 14(2):345–444.Crossref, Google Scholar
- [17] (2014) Near-optimal density estimation in near-linear time using variable-width histograms. Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, eds. Advances in Neural Information Processing Systems, Vol. 27 (Curran Associates, Red Hook, NY), 1844–1852.Google Scholar
- [18] (2012) Sharp oracle inequalities for aggregation of affine estimators. Ann. Statist. 40(4):2327–2355.Crossref, Google Scholar
- [19] (2018) Optimal Kullback–Leibler aggregation in mixture density estimation by maximum likelihood. Math. Statist. Learn. 1(1):1–35.Crossref, Google Scholar
- [20] (2011) Nonparametric instrumental regression. Econometrica 79(5):1541–1565.Crossref, Google Scholar
- [21] Duchi JC, Ruan F (2021) Asymptotic optimality in stochastic optimization. Ann. Stat. 49(1):21–48.Google Scholar
- [22] (1984) Computational Galerkin methods. Computational Galerkin Methods (Springer, New York), 72–85.Crossref, Google Scholar
- [23] (1915) Series solution of some problems of elastic equilibrium of rods and plates. (In Russian.) Vestnik Inzhenerov i Tekhnikov 19(7):897–908.Google Scholar
- [24] (2020) Consistency of Bayesian inference with Gaussian process priors in an elliptic inverse problem. Inverse Problems 36(8):085001.Crossref, Google Scholar
- [25] (2011) Adaptive discretizations for the choice of a Tikhonov regularization parameter in nonlinear inverse problems. Inverse Problems 27(12):125008.Crossref, Google Scholar
- [26] (2021) Is temporal difference learning optimal? An instance-dependent analysis. SIAM J. Math. Data Sci. 3(4):1013–1040.Google Scholar
- [27] (2017) Oracle inequalities for network models and sparse graphon estimation. Ann. Statist. 45(1):316–354.Crossref, Google Scholar
- [28] (2006) Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist. 34(6):2593–2656.Google Scholar
- [29] (2011) Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: Ecole d’Eté de Probabilités de Saint-Flour XXXVIII-2008 (Springer, Berlin).Crossref, Google Scholar
- [30] (2000) Actor-critic algorithms. Solla S, Leen T, Müller K, eds. Advances in Neural Information Processing Systems, Vol. 12 (MIT Press, Cambridge, MA), 1008–1014.Google Scholar
- [31] (1972) Approximate Solution of Operator Equations, translated by Louvish D (Wolters-Noordhoff Publishing, Groningen, Netherlands).Crossref, Google Scholar
- [32] (2003) Stochastic approximation. Ann. Statist. 31(2):391–406.Crossref, Google Scholar
- [33] (2018) Linear stochastic approximation: How far does constant step-size and iterate averaging go? Storkey A, Perez-Cruz F, eds. Internat. Conf. Artificial Intelligence Statist., Playa Blanca, Lanzarote, Canary Islands (PMLR), 1347–1355.Google Scholar
- [34] (2008) Partial Differential Equations with Numerical Methods (Springer Science & Business Media, New York).Google Scholar
- [35] (2021) Transfer learning for high-dimensional linear regression: Prediction, estimation, and minimax optimality. J. Roy. Statist. Soc. Series B, Statist. Methodol. 84(1):149–173.Google Scholar
- [36] (2022) ROOT-SGD: Sharp nonasymptotics and asymptotic efficiency in a single algorithm. Proc. 35th Conf. Learn. Theory, Proceedings of Machine Learning Research Series (PMLR, London), 909–981.Google Scholar
- [37] (2020) A sketched finite element method for elliptic models. Comput. Methods Appl. Mech. Engrg. 364(June):112933.Crossref, Google Scholar
- [38] (2007) Concentration Inequalities and Model Selection (Springer, Berlin).Google Scholar
- [39] (2006) Risk bounds for statistical learning. Ann. Statist. 34(5):2326–2366.Crossref, Google Scholar
- [40] (2020) On linear stochastic approximation: Fine-grained Polyak-Ruppert and non-asymptotic concentration. Abernethy J, Agarwal S, eds. Proc. 33rd Conf. Learn. Theory (PMLR), 2947–2997.Google Scholar
- [41] (2011) Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger KQ, eds. Advances in Neural Information Processing Systems, Vol. 24 (Curran Associates, Red Hook, NY), 451–459.Google Scholar
- [42] (2008) Finite-time bounds for fitted value iteration. J. Machine Learn. Res. 9(May):815–857.Google Scholar
- [43] (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Crossref, Google Scholar
- [44] (2017) On Bayesian inference for some statistical inverse problems with partial differential equations. Bernoulli News 24(2):5–9.Google Scholar
- [45] (2021) Instance-dependent ℓ∞-bounds for policy evaluation in tabular reinforcement learning. IEEE Trans. Inform. Theory 67(1):566–585.Crossref, Google Scholar
- [46] (1990) A new method of stochastic approximation type. (In Russian.) Automatika i Telemekh 51(7):98–107.Google Scholar
- [47] (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838–855.Crossref, Google Scholar
- [48] (2009) Approximate solution of large-scale linear inverse problems with Monte Carlo simulation. LIDS Report 2822, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA.Google Scholar
- [49] (2012) A quasi Monte Carlo method for large-scale inverse problems. Monte Carlo and Quasi-Monte Carlo Methods 2010 (Springer,), 623–637.Crossref, Google Scholar
- [50] (2017) Empirical entropy, minimax regret and minimax risk. Bernoulli 23(2):789–824.Crossref, Google Scholar
- [51] (2015) 18.S997: High dimensional statistics. Lecture notes (spring 2015), Massachusetts Institute of Technology, Cambridge, MA.Google Scholar
- [52] (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.Crossref, Google Scholar
- [53] (1994) On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University Department of Engineering, University of Cambridge, Cambridge, UK.Google Scholar
- [54] (1988) Efficient estimations from a slowly convergent Robbins-Monro process. Technical Report 781, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY.Google Scholar
- [55] (2010) Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view. Fürnkranz J, Joachims T, eds. Proc. 27th Internat. Conf. Internat. Conf. Machine Learn. (Omnipress, Madison, WI), 959–966.Google Scholar
- [56] (2019) Finite-time error bounds for linear stochastic approximation and TD learning. 32nd Annual Conf. Learn. Theory, Phoenix, AZ (PMLR), 2803–2830.Google Scholar
- [57] (1988) Learning to predict by the methods of temporal differences. Machine Learn. 3(1):9–44.Crossref, Google Scholar
- [58] (2009) Fast gradient-descent methods for temporal-difference learning with linear function approximation. Bottou L, Littman M, eds. Proc. 26th Annual Internat. Conf. Machine Learn. (ACM, New York), 993–1000.Google Scholar
- [59] (2010) Algorithms for Reinforcement Learning (Morgan & Claypool Publishers, San Rafael, CA).Crossref, Google Scholar
- [60] (1997) Analysis of temporal-diffference learning with function approximation. Jordan MI, Petsche T, eds. Proc. 9th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 1075–1081.Google Scholar
- [61] (1999) Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Trans. Automatic Control 44(10):1840–1851.Crossref, Google Scholar
- [62] (2004) Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32(1):135–166.Crossref, Google Scholar
- [63] (2000) Asymptotic Statistics (Cambridge University Press, Cambridge, UK).Google Scholar
- [64] (2006) Performance loss bounds for approximate value iteration with state aggregation. Math. Oper. Res. 31(2):234–244.Link, Google Scholar
- [65] (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [66] (2019) Stochastic approximation with cone-contractive operators: Sharper ℓ∞-bounds for Q-learning. Preprint, submitted May 15, https://doi.org/10.48550/arXiv.1905.06265.Google Scholar
- [67] (2019) Variance-reduced Q-learning is minimax optimal. Preprint, submitted June 11, https://doi.org/10.48550/arXiv.1906.04697.Google Scholar
- [68] (1992) Q-Learning. Machine Learn. 8(3–4):279–292.Crossref, Google Scholar
- [69] (2016) Introductory Econometrics: A Modern Approach. 7th ed. (Nelson Education, Scarborough, ON, Canada).Google Scholar
- [70] (1985) Rates of convergence of minimum distance estimators and Kolmogorov’s entropy. Ann. Statist. 13(2):768–774.Crossref, Google Scholar
- [71] (2010) Error bounds for approximations from projected linear equations. Math. Oper. Res. 35(2):306–329.Link, Google Scholar
- [72] (2020) Deconstructing generative adversarial networks. IEEE Trans. Inform. Theory 66(11):7155–7179.Crossref, Google Scholar

