Optimal Oracle Inequalities for Projected Fixed-Point Equations, with Applications to Policy Evaluation

Published Online:https://doi.org/10.1287/moor.2022.1341

References

  • [1] Arridge S, Maass P, Öktem O, Schönlieb C-B (2019) Solving inverse problems using data-driven models. Acta Numer. 28(May):1–174.CrossrefGoogle Scholar
  • [2] Bartlett PL, Bousquet O, Mendelson S (2005) Local Rademacher complexities. Ann. Statist. 33(4):1497–1537.CrossrefGoogle Scholar
  • [3] Bellec PC (2018) Sharp oracle inequalities for least squares estimators in shape restricted regression. Ann. Statist. 46(2):745–780.CrossrefGoogle Scholar
  • [4] Benveniste A, Métivier M, Priouret P (2012) Adaptive Algorithms and Stochastic Approximations, translated by Wilson SS (Springer Science & Business Media, New York).Google Scholar
  • [5] Bertsekas DP (2011) Temporal difference methods for general projected equations. IEEE Trans. Automatic Control 56(9):2128–2139.CrossrefGoogle Scholar
  • [6] Bertsekas DP (2018) Proximal algorithms and temporal difference methods for solving fixed point problems. Comput. Optim. Appl. 70(3):709–736.Google Scholar
  • [7] Bertsekas DP (2019) Reinforcement Learning and Optimal Control (Athena Scientific, Belmont, MA).Google Scholar
  • [8] Bhandari J, Russo D, Singal R (2021) A finite time analysis of temporal difference learning with linear function approximation. Oper. Res. 69(3):950–973.Google Scholar
  • [9] Borkar VS (2009) Stochastic Approximation: A Dynamical Systems Viewpoint (Hindustan Book Agency, New Delhi, India).Google Scholar
  • [10] Bousquet O, Kane D, Moran S (2019) The optimal approximation factor in density estimation. Beygelzimer A, Hsu D, eds. Proc. Machine Learn. Res., vol. 99, Conference on Learning Theory, Phoenix, AZ (PMLR), 318–341.Google Scholar
  • [11] Boyan JA (2002) Technical update: Least-squares temporal difference learning. Machine Learn. 49(2–3):233–246.CrossrefGoogle Scholar
  • [12] Bradtke SJ, Barto AG (1996) Linear least-squares algorithms for temporal difference learning. Machine Learn. 22(1–3):33–57.CrossrefGoogle Scholar
  • [13] Brenner SC, Scott LR (2007) The Mathematical Theory of Finite Element Methods, 3rd ed. (Springer Science & Business Media, New York).Google Scholar
  • [14] Bunea F, Tsybakov AB, Wegkamp MH (2007) Aggregation for Gaussian regression. Ann. Statist. 35(4):1674–1697.CrossrefGoogle Scholar
  • [15] Bunea F, Tsybakov A, Wegkamp M (2007) Sparsity oracle inequalities for the Lasso. Electronic J. Statist. 1:169–194.CrossrefGoogle Scholar
  • [16] Céa J (1964) Approximation variationnelle des problèmes aux limites. (In French.) Ann. Inst. Fourier 14(2):345–444.CrossrefGoogle Scholar
  • [17] Chan SO, Diakonikolas I, Servedio RA, Sun X (2014) Near-optimal density estimation in near-linear time using variable-width histograms. Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, eds. Advances in Neural Information Processing Systems, Vol. 27 (Curran Associates, Red Hook, NY), 1844–1852.Google Scholar
  • [18] Dalalyan AS, Salmon J (2012) Sharp oracle inequalities for aggregation of affine estimators. Ann. Statist. 40(4):2327–2355.CrossrefGoogle Scholar
  • [19] Dalalyan AS, Sebbar M (2018) Optimal Kullback–Leibler aggregation in mixture density estimation by maximum likelihood. Math. Statist. Learn. 1(1):1–35.CrossrefGoogle Scholar
  • [20] Darolles S, Fan Y, Florens J-P, Renault E (2011) Nonparametric instrumental regression. Econometrica 79(5):1541–1565.CrossrefGoogle Scholar
  • [21] Duchi JC, Ruan F (2021) Asymptotic optimality in stochastic optimization. Ann. Stat. 49(1):21–48.Google Scholar
  • [22] Fletcher CAJ (1984) Computational Galerkin methods. Computational Galerkin Methods (Springer, New York), 72–85.CrossrefGoogle Scholar
  • [23] Galerkin BG (1915) Series solution of some problems of elastic equilibrium of rods and plates. (In Russian.) Vestnik Inzhenerov i Tekhnikov 19(7):897–908.Google Scholar
  • [24] Giordano M, Nickl R (2020) Consistency of Bayesian inference with Gaussian process priors in an elliptic inverse problem. Inverse Problems 36(8):085001.CrossrefGoogle Scholar
  • [25] Kaltenbacher B, Kirchner A, Vexler B (2011) Adaptive discretizations for the choice of a Tikhonov regularization parameter in nonlinear inverse problems. Inverse Problems 27(12):125008.CrossrefGoogle Scholar
  • [26] Khamaru K, Pananjady A, Ruan F, Wainwright MJ, Jordan MI (2021) Is temporal difference learning optimal? An instance-dependent analysis. SIAM J. Math. Data Sci. 3(4):1013–1040.Google Scholar
  • [27] Klopp O, Tsybakov AB, Verzelen N (2017) Oracle inequalities for network models and sparse graphon estimation. Ann. Statist. 45(1):316–354.CrossrefGoogle Scholar
  • [28] Koltchinskii V (2006) Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist. 34(6):2593–2656.Google Scholar
  • [29] Koltchinskii V (2011) Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: Ecole d’Eté de Probabilités de Saint-Flour XXXVIII-2008 (Springer, Berlin).CrossrefGoogle Scholar
  • [30] Konda VR, Tsitsiklis JN (2000) Actor-critic algorithms. Solla S, Leen T, Müller K, eds. Advances in Neural Information Processing Systems, Vol. 12 (MIT Press, Cambridge, MA), 1008–1014.Google Scholar
  • [31] Krasnosel’skii MA, Vaĭnikko GM, Zabreĭko RP, Ruticki YB, Stet’senko VY (1972) Approximate Solution of Operator Equations, translated by Louvish D (Wolters-Noordhoff Publishing, Groningen, Netherlands).CrossrefGoogle Scholar
  • [32] Lai TL (2003) Stochastic approximation. Ann. Statist. 31(2):391–406.CrossrefGoogle Scholar
  • [33] Lakshminarayanan C, Szepesvári C (2018) Linear stochastic approximation: How far does constant step-size and iterate averaging go? Storkey A, Perez-Cruz F, eds. Internat. Conf. Artificial Intelligence Statist., Playa Blanca, Lanzarote, Canary Islands (PMLR), 1347–1355.Google Scholar
  • [34] Larsson S, Thomée V (2008) Partial Differential Equations with Numerical Methods (Springer Science & Business Media, New York).Google Scholar
  • [35] Li S, Cai T, Li H (2021) Transfer learning for high-dimensional linear regression: Prediction, estimation, and minimax optimality. J. Roy. Statist. Soc. Series B, Statist. Methodol. 84(1):149–173.Google Scholar
  • [36] Li CJ, Mou W, Wainwright MJ, Jordan MI (2022) ROOT-SGD: Sharp nonasymptotics and asymptotic efficiency in a single algorithm. Proc. 35th Conf. Learn. Theory, Proceedings of Machine Learning Research Series (PMLR, London), 909–981.Google Scholar
  • [37] Lung R, Wu Y, Kamilis D, Polydorides N (2020) A sketched finite element method for elliptic models. Comput. Methods Appl. Mech. Engrg. 364(June):112933.CrossrefGoogle Scholar
  • [38] Massart P (2007) Concentration Inequalities and Model Selection (Springer, Berlin).Google Scholar
  • [39] Massart P, Nédélec É (2006) Risk bounds for statistical learning. Ann. Statist. 34(5):2326–2366.CrossrefGoogle Scholar
  • [40] Mou W, Li CJ, Wainwright MJ, Bartlett PL, Jordan MI (2020) On linear stochastic approximation: Fine-grained Polyak-Ruppert and non-asymptotic concentration. Abernethy J, Agarwal S, eds. Proc. 33rd Conf. Learn. Theory (PMLR), 2947–2997.Google Scholar
  • [41] Moulines É, Bach FR (2011) Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger KQ, eds. Advances in Neural Information Processing Systems, Vol. 24 (Curran Associates, Red Hook, NY), 451–459.Google Scholar
  • [42] Munos R, Szepesvári C (2008) Finite-time bounds for fitted value iteration. J. Machine Learn. Res. 9(May):815–857.Google Scholar
  • [43] Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.CrossrefGoogle Scholar
  • [44] Nickl R (2017) On Bayesian inference for some statistical inverse problems with partial differential equations. Bernoulli News 24(2):5–9.Google Scholar
  • [45] Pananjady A, Wainwright MJ (2021) Instance-dependent ℓ∞-bounds for policy evaluation in tabular reinforcement learning. IEEE Trans. Inform. Theory 67(1):566–585.CrossrefGoogle Scholar
  • [46] Polyak BT (1990) A new method of stochastic approximation type. (In Russian.) Automatika i Telemekh 51(7):98–107.Google Scholar
  • [47] Polyak BT, Juditsky AB (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838–855.CrossrefGoogle Scholar
  • [48] Polydorides N, Wang M, Bertsekas DP (2009) Approximate solution of large-scale linear inverse problems with Monte Carlo simulation. LIDS Report 2822, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA.Google Scholar
  • [49] Polydorides N, Wang M, Bertsekas DP (2012) A quasi Monte Carlo method for large-scale inverse problems. Monte Carlo and Quasi-Monte Carlo Methods 2010 (Springer,), 623–637.CrossrefGoogle Scholar
  • [50] Rakhlin A, Sridharan K, Tsybakov AB (2017) Empirical entropy, minimax regret and minimax risk. Bernoulli 23(2):789–824.CrossrefGoogle Scholar
  • [51] Rigollet P, Hütter J-C (2015) 18.S997: High dimensional statistics. Lecture notes (spring 2015), Massachusetts Institute of Technology, Cambridge, MA.Google Scholar
  • [52] Robbins H, Sutton M (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.CrossrefGoogle Scholar
  • [53] Rummery GA, Niranjan M (1994) On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University Department of Engineering, University of Cambridge, Cambridge, UK.Google Scholar
  • [54] Ruppert D (1988) Efficient estimations from a slowly convergent Robbins-Monro process. Technical Report 781, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY.Google Scholar
  • [55] Scherrer B (2010) Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view. Fürnkranz J, Joachims T, eds. Proc. 27th Internat. Conf. Internat. Conf. Machine Learn. (Omnipress, Madison, WI), 959–966.Google Scholar
  • [56] Srikant R, Ying L (2019) Finite-time error bounds for linear stochastic approximation and TD learning. 32nd Annual Conf. Learn. Theory, Phoenix, AZ (PMLR), 2803–2830.Google Scholar
  • [57] Sutton RS (1988) Learning to predict by the methods of temporal differences. Machine Learn. 3(1):9–44.CrossrefGoogle Scholar
  • [58] Sutton RS, Maei HR, Precup D, Bhatnagar S, Silver D, Szepesvári C, Wiewiora E (2009) Fast gradient-descent methods for temporal-difference learning with linear function approximation. Bottou L, Littman M, eds. Proc. 26th Annual Internat. Conf. Machine Learn. (ACM, New York), 993–1000.Google Scholar
  • [59] Szepesvári C (2010) Algorithms for Reinforcement Learning (Morgan & Claypool Publishers, San Rafael, CA).CrossrefGoogle Scholar
  • [60] Tsitsiklis JN, Van Roy B (1997) Analysis of temporal-diffference learning with function approximation. Jordan MI, Petsche T, eds. Proc. 9th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 1075–1081.Google Scholar
  • [61] Tsitsiklis JN, Van Roy B (1999) Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Trans. Automatic Control 44(10):1840–1851.CrossrefGoogle Scholar
  • [62] Tsybakov AB (2004) Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32(1):135–166.CrossrefGoogle Scholar
  • [63] van der Vaart AW (2000) Asymptotic Statistics (Cambridge University Press, Cambridge, UK).Google Scholar
  • [64] Van Roy B (2006) Performance loss bounds for approximate value iteration with state aggregation. Math. Oper. Res. 31(2):234–244.LinkGoogle Scholar
  • [65] Wainwright MJ (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [66] Wainwright MJ (2019) Stochastic approximation with cone-contractive operators: Sharper ℓ∞-bounds for Q-learning. Preprint, submitted May 15, https://doi.org/10.48550/arXiv.1905.06265.Google Scholar
  • [67] Wainwright MJ (2019) Variance-reduced Q-learning is minimax optimal. Preprint, submitted June 11, https://doi.org/10.48550/arXiv.1906.04697.Google Scholar
  • [68] Watkins CJCH, Dayan P (1992) Q-Learning. Machine Learn. 8(3–4):279–292.CrossrefGoogle Scholar
  • [69] Wooldridge JM (2016) Introductory Econometrics: A Modern Approach. 7th ed. (Nelson Education, Scarborough, ON, Canada).Google Scholar
  • [70] Yatracos YG (1985) Rates of convergence of minimum distance estimators and Kolmogorov’s entropy. Ann. Statist. 13(2):768–774.CrossrefGoogle Scholar
  • [71] Yu H, Bertsekas DP (2010) Error bounds for approximations from projected linear equations. Math. Oper. Res. 35(2):306–329.LinkGoogle Scholar
  • [72] Zhu B, Jiao J, Tse D (2020) Deconstructing generative adversarial networks. IEEE Trans. Inform. Theory 66(11):7155–7179.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.