Technical Note—Consistency Analysis of Sequential Learning Under Approximate Bayesian Inference

Published Online:https://doi.org/10.1287/opre.2019.1850

References

  • Abounadi J, Bertsekas DP, Borkar VS (2002) Stochastic approximation for nonexpansive maps: Application to Q-learning algorithms. SIAM J. Control Optim. 41(1):1–22.CrossrefGoogle Scholar
  • Anderson CK, Xie X (2012) A choice-based dynamic programming approach for setting opaque prices. Production Oper. Management 21(3):590–605.CrossrefGoogle Scholar
  • Andradóttir S (1995) A stochastic approximation algorithm with varying bounds. Oper. Res. 43(6):1037–1048.LinkGoogle Scholar
  • Asmussen S, Glynn PW (2011) A new proof of convergence of MCMC via the ergodic theorem. Statist. Probab. Lett. 81(10):1482–1485.CrossrefGoogle Scholar
  • Bertsimas D, Kallus N (2015) From predictive to prescriptive analytics. Working paper, Massachusetts Institute of Technology, Cambridge.Google Scholar
  • Borkar VS (2008) Stochastic Approximation: A Dynamical Systems Viewpoint (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Borkar VS, Meyn SP (2000) The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim. 38(2):447–469.CrossrefGoogle Scholar
  • Bottou L (1998) Online learning and stochastic approximations. Saad D, ed. On-Line Learning in Neural Networks (Cambridge University Press, Cambridge, UK), 9–42.Google Scholar
  • Broadie M, Cicek D, Zeevi A (2011) General bounds and finite-time improvement for the Kiefer-Wolfowitz stochastic approximation algorithm. Oper. Res. 59(5):1211–1224.LinkGoogle Scholar
  • Chakraborty M, Das S, Magdon-Ismail M (2011) Near-optimal target learning with stochastic binary signals. Cozman F, Pfeffer A, eds. Proc. 27th Conf. Uncertainty Artificial Intelligence (AUAI Press, Corvallis, OR), 69–76.Google Scholar
  • Chakraborty M, Das S, Lavoie A, Magdon-Ismail M, Naamad Y (2013) Instructor rating markets. Proc. 27th AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 159–165.Google Scholar
  • Chau M, Fu MC, Qu H, Ryzhov IO (2014) Simulation optimization: A tutorial overview and recent developments in gradient-based methods. Tolk A, Diallo SY, Ryzhov IO, Yilmaz L, Buckley S, Miller JA, eds. Proc. 2014 Winter Simulation Conf. (IEEE, Piscataway, NJ), 21–35.CrossrefGoogle Scholar
  • Chen C-H, Lee LH (2010) Stochastic Simulation Optimization: An Optimal Computing Budget Allocation (World Scientific, Singapore).CrossrefGoogle Scholar
  • Chen Y, Ryzhov IO (2016) Approximate Bayesian inference as a form of stochastic approximation: A new consistency theory with applications. Roeder TMK, Frazier PI, Szechtman R, Zhou E, Huschka T, Chick SE, eds. Proc. 2016 Winter Simulation Conf. (IEEE, Piscataway, NJ), 534–544.CrossrefGoogle Scholar
  • Chen C-H, Chick SE, Lee LH, Pujowidianto NA (2015) Ranking and selection: Efficient simulation budget allocation. Fu MC, ed. Handbook of Simulation Optimization (Springer, New York), 45–80.CrossrefGoogle Scholar
  • Chhabra M, Das S (2011) Learning the demand curve in posted-price digital goods auctions. Proc. 10th Internat. Conf. Autonomous Agents Multi-Agent Systems (International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC), 63–70.Google Scholar
  • Chick SE (2006) Subjective probability and bayesian methodology. Henderson S, Nelson B, eds. Simulation, Handbooks of Operations Research and Management Science, vol. 13 (North-Holland Publishing, Amsterdam), 225–258.Google Scholar
  • Dangauthier P, Herbrich R, Minka T, Graepel T (2007) TrueSkill through time: Revisiting the history of chess. Platt JC, Koller D, Singer Y, Roweis S, eds. Advances in Neural Information Processing Systems, vol. 20 (Curran Associates, Red Hook, NY), 337–344.Google Scholar
  • Das S, Magdon-Ismail M (2009) Adapting to a market shock: Optimal sequential market-making. Koller D, Schuurmans D, Bengio Y, Bottou L, eds. Advances in Neural Information Processing Systems, vol. 21 (Curran Associates, Red Hook, NY), 361–368.Google Scholar
  • DeGroot MH (1970) Optimal Statistical Decisions (John Wiley & Sons, Hoboken, NJ).Google Scholar
  • García-Fernández ÁF, Svensson L (2015) Gaussian MAP filtering using Kalman optimization. IEEE Trans. Automatic Control 60(5):1336–1349.CrossrefGoogle Scholar
  • Gelman AB, Carlin JB, Stern HS, Rubin DB (2004) Bayesian Data Analysis, 2nd ed. (CRC Press, Boca Raton, FL).Google Scholar
  • Gupta AK, Nagar DK (2000) Matrix Variate Distributions (CRC Press, Boca Raton, FL).Google Scholar
  • Gutin E, Farias V (2016) Optimistic Gittins indices. Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 29 (Curran Associates, Red Hook, NY), 3153–3161.Google Scholar
  • Haario H, Saksman E, Tamminen J (2001) An adaptive Metropolis algorithm. Bernoulli 7(2):223–242.CrossrefGoogle Scholar
  • Herbrich R, Minka T, Graepel T (2006) TrueSkillTM: A Bayesian skill rating system. Schölkopf B, Platt JC, Hoffman T, eds. Advances in Neural Information Processing Systems, vol. 19 (MIT Press, Cambridge, MA), 569–576.Google Scholar
  • Hoffman MD, Blei DM, Wang C, Paisley J (2013) Stochastic variational inference. J. Machine Learn. Res. 14(1):1303–1347.Google Scholar
  • Hong LJ, Nelson BL (2009) A brief introduction to optimization via simulation. Rosetti M, Hill R, Johansson B, Dunkin A, Ingalls R, eds. Proc. 2009 Winter Simulation Conf. (IEEE, Piscataway, NJ), 75–85.CrossrefGoogle Scholar
  • Jaakkola TS, Jordan MI (2000) Bayesian parameter estimation via variational methods. Statist. Comput. 10(1):25–37.CrossrefGoogle Scholar
  • Jaakkola TS, Jordan MI, Singh SP (1994) On the convergence of stochastic iterative dynamic programming algorithms. Neural Comput. 6(6):1185–1201.CrossrefGoogle Scholar
  • Jiang H, Shanbhag UV (2016) On the solution of stochastic optimization and variational problems in imperfect information regimes. SIAM J. Optim. 26(4):2394–2429.CrossrefGoogle Scholar
  • Jiang H, Xu H (2008) Stochastic approximation approaches to the stochastic variational inequality problem. IEEE Trans. Automatic Control 53(6):1462–1475.CrossrefGoogle Scholar
  • Keizers JM, Bertrand JWM, Wessels J (2003) Diagnosing order planning performance at a navy maintenance and repair organization, using logistic regression. Production Oper. Management 12(4):445–463.CrossrefGoogle Scholar
  • Koshal J, Nedić A, Shanbhag UV (2013) Regularized iterative stochastic approximation methods for stochastic variational inequality problems. IEEE Trans. Automatic Control 58(3):594–609.CrossrefGoogle Scholar
  • Kushner HJ, Yin G (2003) Stochastic Approximation and Recursive Algorithms and Applications, 2nd ed. (Springer, New York).Google Scholar
  • Lai TL (2003) Stochastic approximation. Ann. Statist. 31(2):391–406.CrossrefGoogle Scholar
  • Marin JM, Pudlo P, Robert CP, Ryder RJ (2012) Approximate Bayesian computational methods. Statist. Comput. 22(6):1167–1180.CrossrefGoogle Scholar
  • Minka TP (2001) Expectation propagation for approximate Bayesian inference. Breese J, Koller D, eds. Proc. 17th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann, San Francisco), 362–369.Google Scholar
  • Opper M (1998) A Bayesian approach to on-line learning. Saad D, ed. On-Line Learning in Neural Networks (Cambridge University Press, Cambridge, UK), 363–378.Google Scholar
  • Pasupathy R, Kim S (2011) The stochastic root-finding problem: Overview, solutions, and open questions. ACM Trans. Model. Comput. Simulation 21(3):19:1–19:23.CrossrefGoogle Scholar
  • Petruzzi NC, Dada M (1999) Pricing and the newsvendor problem: A review with extensions. Oper. Res. 47(2):183–194.LinkGoogle Scholar
  • Plagnol V, Tavaré S (2004) Approximate Bayesian computation and MCMC. Niederreiter H, ed. Monte Carlo and Quasi-Monte Carlo Methods (Springer, New York), 99–113.Google Scholar
  • Powell WB (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. (John Wiley & Sons, Hoboken, NJ).CrossrefGoogle Scholar
  • Powell WB, Ryzhov IO (2012) Optimal Learning (John Wiley & Sons, Hoboken, NJ).CrossrefGoogle Scholar
  • Powell WB, George A, Simão H, Scott WR, Lamont A, Stewart J (2012) SMART: A stochastic multiscale model for the analysis of energy resources, technology, and policy. INFORMS J. Comput. 24(4):665–682.LinkGoogle Scholar
  • Puterman ML (1994) Markov Decision Processes (John Wiley & Sons, New York).CrossrefGoogle Scholar
  • Qu H, Ryzhov IO, Fu MC, Ding Z (2015) Sequential selection with unknown correlation structures. Oper. Res. 63(4):931–948.LinkGoogle Scholar
  • Ribeiro C, Szepesvári C (1996) Q-learning combined with spreading: Convergence and results. Proc. 1996 ISRF-IEE Internat. Conf. Intelligent Cognitive Systems, Tehran, Iran, 32–36.Google Scholar
  • Robbins H, Monro S (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.CrossrefGoogle Scholar
  • Russo D, Van Roy B (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.LinkGoogle Scholar
  • Ryzhov IO (2015) Approximate Bayesian inference for simulation and optimization. Defourny B, Terlaky T, eds. Modeling and Optimization: Theory and Applications (Springer International, Cham, Switzerland), 1–28.CrossrefGoogle Scholar
  • Ryzhov IO (2016) On the convergence rates of expected improvement methods. Oper. Res. 64(6):1515–1528.LinkGoogle Scholar
  • Ryzhov IO, Powell WB (2010) Approximate dynamic programming with correlated Bayesian beliefs. Viswanath P, Meyn S, eds. Proc. 48th Annual Allerton Conf. Comm. Control Comput. (IEEE, Piscataway, NJ), 1360–1367.CrossrefGoogle Scholar
  • Simão HP, George A, Powell WB, Gifford T, Nienow J, Day J (2010) Approximate dynamic programming captures fleet operations for Schneider National. Interfaces 40(5):342–352.LinkGoogle Scholar
  • Sohn SY, Kim HS (2007) Random effects logistic regression model for default prediction of technology credit guarantee fund. Eur. J. Oper. Res. 183(1):472–478.CrossrefGoogle Scholar
  • Spiegelhalter DJ, Lauritzen SL (1990) Sequential updating of conditional probabilities on directed graphical structures. Networks 20(5):579–605.CrossrefGoogle Scholar
  • Sunnåker M, Busetto AG, Numminen E, Corander J, Foll M, Dessimoz C (2013) Approximate Bayesian computation. PLoS Comput. Biol. 9(1):e1002803.CrossrefGoogle Scholar
  • Szepesvári C (1997) The asymptotic convergence rate of Q-learning. Jordan MI, Kearns MJ, Solla SA, eds. Advances in Neural Information Processing Systems, vol. 10 (MIT Press, Cambridge, MA), 1064–1070.Google Scholar
  • Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Machine Learn. 16(3):185–202.CrossrefGoogle Scholar
  • Wang C, Blei DM (2013) Variational inference in nonconjugate models. J. Machine Learn. Res. 14(1):1005–1031.Google Scholar
  • Wang Y, Blei DM (2019) Frequentist consistency of variational Bayes. J. Amer. Statist. Assoc. Forthcoming.Google Scholar
  • Watkins CJCH, Dayan P (1992) Q-Learning. Machine Learn. 8(3–4):279–292.CrossrefGoogle Scholar
  • Wen Z, Van Roy B (2013) Efficient exploration and value function generalization in deterministic systems. Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 26 (Curran Associates, Red Hook, NY), 3021–3029.Google Scholar
  • Yousefian F, Nedić A, Shanbhag UV (2012) On stochastic gradient and subgradient methods with adaptive steplength sequences. Automatica 48(1):56–67.CrossrefGoogle Scholar
  • Zhang Q, Song Y (2015) Simulation selection for empirical model comparison. Yilmaz L, Chan WKV, Moon I, Roeder TMK, Macal C, Rossetti MD, eds. Proc. 2015 Winter Simulation Conf. (IEEE, Piscataway, NJ), 3777–3788.CrossrefGoogle Scholar
  • Zhang Q, Song Y (2017) Moment-matching-based conjugacy approximation for Bayesian ranking and selection. ACM Trans. Model. Comput. Simulation 27(4):26:1–26:23.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.