[1] Arridge S, Maass P, Öktem O, Schönlieb C-B (2019) Solving inverse problems using data-driven models. Acta Numer. 28(May):1–174.Crossref, Google Scholar
[2] Bartlett PL, Bousquet O, Mendelson S (2005) Local Rademacher complexities. Ann. Statist. 33(4):1497–1537.Crossref, Google Scholar
[3] Bellec PC (2018) Sharp oracle inequalities for least squares estimators in shape restricted regression. Ann. Statist. 46(2):745–780.Crossref, Google Scholar
[4] Benveniste A, Métivier M, Priouret P (2012) Adaptive Algorithms and Stochastic Approximations, translated by Wilson SS (Springer Science & Business Media, New York).Google Scholar
[5] Bertsekas DP (2011) Temporal difference methods for general projected equations. IEEE Trans. Automatic Control 56(9):2128–2139.Crossref, Google Scholar
[6] Bertsekas DP (2018) Proximal algorithms and temporal difference methods for solving fixed point problems. Comput. Optim. Appl. 70(3):709–736.Google Scholar
[7] Bertsekas DP (2019) Reinforcement Learning and Optimal Control (Athena Scientific, Belmont, MA).Google Scholar
[8] Bhandari J, Russo D, Singal R (2021) A finite time analysis of temporal difference learning with linear function approximation. Oper. Res. 69(3):950–973.Google Scholar
[9] Borkar VS (2009) Stochastic Approximation: A Dynamical Systems Viewpoint (Hindustan Book Agency, New Delhi, India).Google Scholar
[10] Bousquet O, Kane D, Moran S (2019) The optimal approximation factor in density estimation. Beygelzimer A, Hsu D, eds. Proc. Machine Learn. Res., vol. 99, Conference on Learning Theory, Phoenix, AZ (PMLR), 318–341.Google Scholar
[11] Boyan JA (2002) Technical update: Least-squares temporal difference learning. Machine Learn. 49(2–3):233–246.Crossref, Google Scholar
[12] Bradtke SJ, Barto AG (1996) Linear least-squares algorithms for temporal difference learning. Machine Learn. 22(1–3):33–57.Crossref, Google Scholar
[13] Brenner SC, Scott LR (2007) The Mathematical Theory of Finite Element Methods, 3rd ed. (Springer Science & Business Media, New York).Google Scholar
[14] Bunea F, Tsybakov AB, Wegkamp MH (2007) Aggregation for Gaussian regression. Ann. Statist. 35(4):1674–1697.Crossref, Google Scholar
[15] Bunea F, Tsybakov A, Wegkamp M (2007) Sparsity oracle inequalities for the Lasso. Electronic J. Statist. 1:169–194.Crossref, Google Scholar
[16] Céa J (1964) Approximation variationnelle des problèmes aux limites. (In French.) Ann. Inst. Fourier 14(2):345–444.Crossref, Google Scholar
[17] Chan SO, Diakonikolas I, Servedio RA, Sun X (2014) Near-optimal density estimation in near-linear time using variable-width histograms. Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, eds. Advances in Neural Information Processing Systems, Vol. 27 (Curran Associates, Red Hook, NY), 1844–1852.Google Scholar
[18] Dalalyan AS, Salmon J (2012) Sharp oracle inequalities for aggregation of affine estimators. Ann. Statist. 40(4):2327–2355.Crossref, Google Scholar
[19] Dalalyan AS, Sebbar M (2018) Optimal Kullback–Leibler aggregation in mixture density estimation by maximum likelihood. Math. Statist. Learn. 1(1):1–35.Crossref, Google Scholar
[20] Darolles S, Fan Y, Florens J-P, Renault E (2011) Nonparametric instrumental regression. Econometrica 79(5):1541–1565.Crossref, Google Scholar
[21] Duchi JC, Ruan F (2021) Asymptotic optimality in stochastic optimization. Ann. Stat. 49(1):21–48.Google Scholar
[22] Fletcher CAJ (1984) Computational Galerkin methods. Computational Galerkin Methods (Springer, New York), 72–85.Crossref, Google Scholar
[23] Galerkin BG (1915) Series solution of some problems of elastic equilibrium of rods and plates. (In Russian.) Vestnik Inzhenerov i Tekhnikov 19(7):897–908.Google Scholar
[24] Giordano M, Nickl R (2020) Consistency of Bayesian inference with Gaussian process priors in an elliptic inverse problem. Inverse Problems 36(8):085001.Crossref, Google Scholar
[25] Kaltenbacher B, Kirchner A, Vexler B (2011) Adaptive discretizations for the choice of a Tikhonov regularization parameter in nonlinear inverse problems. Inverse Problems 27(12):125008.Crossref, Google Scholar
[26] Khamaru K, Pananjady A, Ruan F, Wainwright MJ, Jordan MI (2021) Is temporal difference learning optimal? An instance-dependent analysis. SIAM J. Math. Data Sci. 3(4):1013–1040.Google Scholar
[27] Klopp O, Tsybakov AB, Verzelen N (2017) Oracle inequalities for network models and sparse graphon estimation. Ann. Statist. 45(1):316–354.Crossref, Google Scholar
[28] Koltchinskii V (2006) Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist. 34(6):2593–2656.Google Scholar
[29] Koltchinskii V (2011) Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: Ecole d’Eté de Probabilités de Saint-Flour XXXVIII-2008 (Springer, Berlin).Crossref, Google Scholar
[30] Konda VR, Tsitsiklis JN (2000) Actor-critic algorithms. Solla S, Leen T, Müller K, eds. Advances in Neural Information Processing Systems, Vol. 12 (MIT Press, Cambridge, MA), 1008–1014.Google Scholar
[31] Krasnosel’skii MA, Vaĭnikko GM, Zabreĭko RP, Ruticki YB, Stet’senko VY (1972) Approximate Solution of Operator Equations, translated by Louvish D (Wolters-Noordhoff Publishing, Groningen, Netherlands).Crossref, Google Scholar
[32] Lai TL (2003) Stochastic approximation. Ann. Statist. 31(2):391–406.Crossref, Google Scholar
[33] Lakshminarayanan C, Szepesvári C (2018) Linear stochastic approximation: How far does constant step-size and iterate averaging go? Storkey A, Perez-Cruz F, eds. Internat. Conf. Artificial Intelligence Statist., Playa Blanca, Lanzarote, Canary Islands (PMLR), 1347–1355.Google Scholar
[34] Larsson S, Thomée V (2008) Partial Differential Equations with Numerical Methods (Springer Science & Business Media, New York).Google Scholar
[35] Li S, Cai T, Li H (2021) Transfer learning for high-dimensional linear regression: Prediction, estimation, and minimax optimality. J. Roy. Statist. Soc. Series B, Statist. Methodol. 84(1):149–173.Google Scholar
[36] Li CJ, Mou W, Wainwright MJ, Jordan MI (2022) ROOT-SGD: Sharp nonasymptotics and asymptotic efficiency in a single algorithm. Proc. 35th Conf. Learn. Theory, Proceedings of Machine Learning Research Series (PMLR, London), 909–981.Google Scholar
[37] Lung R, Wu Y, Kamilis D, Polydorides N (2020) A sketched finite element method for elliptic models. Comput. Methods Appl. Mech. Engrg. 364(June):112933.Crossref, Google Scholar
[38] Massart P (2007) Concentration Inequalities and Model Selection (Springer, Berlin).Google Scholar
[39] Massart P, Nédélec É (2006) Risk bounds for statistical learning. Ann. Statist. 34(5):2326–2366.Crossref, Google Scholar
[40] Mou W, Li CJ, Wainwright MJ, Bartlett PL, Jordan MI (2020) On linear stochastic approximation: Fine-grained Polyak-Ruppert and non-asymptotic concentration. Abernethy J, Agarwal S, eds. Proc. 33rd Conf. Learn. Theory (PMLR), 2947–2997.Google Scholar
[41] Moulines É, Bach FR (2011) Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger KQ, eds. Advances in Neural Information Processing Systems, Vol. 24 (Curran Associates, Red Hook, NY), 451–459.Google Scholar
[42] Munos R, Szepesvári C (2008) Finite-time bounds for fitted value iteration. J. Machine Learn. Res. 9(May):815–857.Google Scholar
[43] Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Crossref, Google Scholar
[44] Nickl R (2017) On Bayesian inference for some statistical inverse problems with partial differential equations. Bernoulli News 24(2):5–9.Google Scholar
[45] Pananjady A, Wainwright MJ (2021) Instance-dependent ℓ∞-bounds for policy evaluation in tabular reinforcement learning. IEEE Trans. Inform. Theory 67(1):566–585.Crossref, Google Scholar
[46] Polyak BT (1990) A new method of stochastic approximation type. (In Russian.) Automatika i Telemekh 51(7):98–107.Google Scholar
[47] Polyak BT, Juditsky AB (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838–855.Crossref, Google Scholar
[48] Polydorides N, Wang M, Bertsekas DP (2009) Approximate solution of large-scale linear inverse problems with Monte Carlo simulation. LIDS Report 2822, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA.Google Scholar
[49] Polydorides N, Wang M, Bertsekas DP (2012) A quasi Monte Carlo method for large-scale inverse problems. Monte Carlo and Quasi-Monte Carlo Methods 2010 (Springer,), 623–637.Crossref, Google Scholar
[50] Rakhlin A, Sridharan K, Tsybakov AB (2017) Empirical entropy, minimax regret and minimax risk. Bernoulli 23(2):789–824.Crossref, Google Scholar
[51] Rigollet P, Hütter J-C (2015) 18.S997: High dimensional statistics. Lecture notes (spring 2015), Massachusetts Institute of Technology, Cambridge, MA.Google Scholar
[52] Robbins H, Sutton M (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.Crossref, Google Scholar
[53] Rummery GA, Niranjan M (1994) On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University Department of Engineering, University of Cambridge, Cambridge, UK.Google Scholar
[54] Ruppert D (1988) Efficient estimations from a slowly convergent Robbins-Monro process. Technical Report 781, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY.Google Scholar
[55] Scherrer B (2010) Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view. Fürnkranz J, Joachims T, eds. Proc. 27th Internat. Conf. Internat. Conf. Machine Learn. (Omnipress, Madison, WI), 959–966.Google Scholar
[56] Srikant R, Ying L (2019) Finite-time error bounds for linear stochastic approximation and TD learning. 32nd Annual Conf. Learn. Theory, Phoenix, AZ (PMLR), 2803–2830.Google Scholar
[57] Sutton RS (1988) Learning to predict by the methods of temporal differences. Machine Learn. 3(1):9–44.Crossref, Google Scholar
[58] Sutton RS, Maei HR, Precup D, Bhatnagar S, Silver D, Szepesvári C, Wiewiora E (2009) Fast gradient-descent methods for temporal-difference learning with linear function approximation. Bottou L, Littman M, eds. Proc. 26th Annual Internat. Conf. Machine Learn. (ACM, New York), 993–1000.Google Scholar
[59] Szepesvári C (2010) Algorithms for Reinforcement Learning (Morgan & Claypool Publishers, San Rafael, CA).Crossref, Google Scholar
[60] Tsitsiklis JN, Van Roy B (1997) Analysis of temporal-diffference learning with function approximation. Jordan MI, Petsche T, eds. Proc. 9th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 1075–1081.Google Scholar
[61] Tsitsiklis JN, Van Roy B (1999) Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Trans. Automatic Control 44(10):1840–1851.Crossref, Google Scholar
[62] Tsybakov AB (2004) Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32(1):135–166.Crossref, Google Scholar
[63] van der Vaart AW (2000) Asymptotic Statistics (Cambridge University Press, Cambridge, UK).Google Scholar
[64] Van Roy B (2006) Performance loss bounds for approximate value iteration with state aggregation. Math. Oper. Res. 31(2):234–244.Link, Google Scholar
[65] Wainwright MJ (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[66] Wainwright MJ (2019) Stochastic approximation with cone-contractive operators: Sharper ℓ∞-bounds for Q-learning. Preprint, submitted May 15, https://doi.org/10.48550/arXiv.1905.06265.Google Scholar
[67] Wainwright MJ (2019) Variance-reduced Q-learning is minimax optimal. Preprint, submitted June 11, https://doi.org/10.48550/arXiv.1906.04697.Google Scholar
[68] Watkins CJCH, Dayan P (1992) Q-Learning. Machine Learn. 8(3–4):279–292.Crossref, Google Scholar
[69] Wooldridge JM (2016) Introductory Econometrics: A Modern Approach. 7th ed. (Nelson Education, Scarborough, ON, Canada).Google Scholar
[70] Yatracos YG (1985) Rates of convergence of minimum distance estimators and Kolmogorov’s entropy. Ann. Statist. 13(2):768–774.Crossref, Google Scholar
[71] Yu H, Bertsekas DP (2010) Error bounds for approximations from projected linear equations. Math. Oper. Res. 35(2):306–329.Link, Google Scholar
[72] Zhu B, Jiao J, Tse D (2020) Deconstructing generative adversarial networks. IEEE Trans. Inform. Theory 66(11):7155–7179.Crossref, Google Scholar

cover image Mathematics of Operations Research

Volume 48, Issue 4

November 2023

Pages 1811-2382, C2

Article Information

Supplemental Material

Metrics

Information

Received:February 20, 2021
Accepted:September 28, 2022
Published Online:December 28, 2022

Cite as

Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright (2022) Optimal Oracle Inequalities for Projected Fixed-Point Equations, with Applications to Policy Evaluation. Mathematics of Operations Research 48(4):2308-2336.

https://doi.org/10.1287/moor.2022.1341

Keywords

Acknowledgements

The authors thank Peter Bartlett for helpful discussions.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Optimal Oracle Inequalities for Projected Fixed-Point Equations, with Applications to Policy Evaluation

References

Volume 48, Issue 4

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News