Robust Multiarmed Bandit Problems

Published Online:https://doi.org/10.1287/mnsc.2015.2153

References

  • Agrawal R (1995) The continuum-armed bandit problem. SIAM J. Control Optim. 33(6):1926–1951.CrossrefGoogle Scholar
  • Auer P, Cesa-Bianchi N, Fischer P (2002a) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.CrossrefGoogle Scholar
  • Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (1995) Gambling in a rigged casino: The adversarial multiarmed bandit problem. Proc. 6th Annual IEEE Sympos. Foundations Comput. Sci. (IEEE Computer Society Press, Washington, DC), 322–331.CrossrefGoogle Scholar
  • Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002b) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.CrossrefGoogle Scholar
  • Babaioff M, Sharma Y, Slivkins A (2014) Characterizing truthful multi-armed bandit mechanisms. SIAM J. Comput. 43(1):194–230.CrossrefGoogle Scholar
  • Ben-Tal A, Nemirovski A (1998) Robust convex optimization. Math. Oper. Res. 23(4):769–805.LinkGoogle Scholar
  • Ben-Tal A, Nemirovski A (1999) Robust solutions to uncertain programs. Oper. Res. Lett. 25(1):1–13.CrossrefGoogle Scholar
  • Ben-Tal A, Nemirovski A (2000) Robust solutions of linear programming problems contaminated with uncertain data. Math. Programming 88(3):411–424.CrossrefGoogle Scholar
  • Bertsekas DP (1995) Dynamic Programming and Optimal Control Volume II (Athena Scientific, Belmont, MA).Google Scholar
  • Bertsimas D, Nino-Mora J (1996) Conservation laws, extended polymatroids and multiarmed bandit problems: A polyhedral approach to indexable systems. Math. Oper. Res. 21(2):257–306.LinkGoogle Scholar
  • Bertsimas D, Sim M (2004) The price of robustness. Oper. Res. 52(1):35–53.LinkGoogle Scholar
  • Brown DB, Smith JE (2013) Optimal sequential exploration: Bandits, clairvoyants, and wildcats. Oper. Res. 61(3):644–665.LinkGoogle Scholar
  • Brown DB, Smith JE, Sun P (2010) Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4):785–801.LinkGoogle Scholar
  • Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.LinkGoogle Scholar
  • Caro F, Gupta AD (2015) Robust control of the multi-armed bandit problem. Ann. Oper. Res., ePub ahead of print August 21, http://link.springer.com/article/10.1007%2Fs10479-015-1965-7.CrossrefGoogle Scholar
  • Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Chan CW, Farias VF (2009) Stochastic depletion problems: Effective myopic policies for a class of dynamic optimization problems. Math. Oper. Res. 34(2):333–350.LinkGoogle Scholar
  • Dai Pra P, Meneghini L, Runggaldier WJ (1996) Connections between stochastic control and dynamic games. Math. Control Signals Systems 9(4):303–326.CrossrefGoogle Scholar
  • Dupuis P, Ellis RS (1997) A Weak Convergence Approach to the Theory of Large Deviations (Wiley, New York).CrossrefGoogle Scholar
  • Efron B (1979) Bootstrap methods: Another look at the jackknife. Ann. Statist. 7(1):1–235.CrossrefGoogle Scholar
  • El Ghaoui L, Lebret H (1997) Robust solutions to least-square problems to uncertain data matrices. SIAM J. Matrix Anal. Appl. 18(4):1035–1064.CrossrefGoogle Scholar
  • Epstein LG, Schneider M (2003) Recursive multiple-priors. J. Econom. Theory 113(1):1–31.CrossrefGoogle Scholar
  • Epstein LG, Schneider M (2007) Learning under ambiguity. Rev. Econom. Stud. 74(4):1275–1303.CrossrefGoogle Scholar
  • Gittins JC (1979) Bandit processes and dynamic allocation indices. J. Royal Statist. Soc. Ser. B 41(2):148–164.Google Scholar
  • González-Trejo JI, Hernández-Lerma O, Hoyos-Reyes LF (2003) Minimax control of discrete-time stochastic systems. SIAM J. Control Optim. 41(5):1626–1659.CrossrefGoogle Scholar
  • Hansen LP, Sargent TJ (2005) Robust estimation and control under commitment. J. Econom. Theory 124(2):258–301.CrossrefGoogle Scholar
  • Hansen LP, Sargent TJ (2007) Robust estimation and control without commitment. J. Econom. Theory 136(1):1–27.CrossrefGoogle Scholar
  • Hansen LP, Sargent TJ (2008) Robustness (Princeton University Press, Princeton, NJ).CrossrefGoogle Scholar
  • Haugh MB, Lim AEB (2012) Linear-quadratic control and information relaxations. Oper. Res. Lett. 40(6):521–528.CrossrefGoogle Scholar
  • Iyengar GN (2005) Robust dynamic programming. Math. Oper. Res. 30(2):257–280.LinkGoogle Scholar
  • Jacobson DH (1973) Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games. IEEE Trans. Automatic Control 18(2):124–131.CrossrefGoogle Scholar
  • Kleinberg R (2004) Nearly tight bounds for the continuum-armed bandit problem. Jordan MI, LeCun Y, Solla SA, eds. Advances in Neural Information Processing Systems 17 (MIT Press, Cambridge, MA), 697–704.Google Scholar
  • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.CrossrefGoogle Scholar
  • Li JY, Kwon RH (2013) Portfolio selection under model uncertainty: A penalized moment-based optimization approach. J. Global Optim. 56(1):131–164.CrossrefGoogle Scholar
  • Lim AEB, Shanthikumar JG (2007) Relative entropy, exponential utility, and robust dynamic pricing. Oper. Res. 55(2):198–214.LinkGoogle Scholar
  • Lim AEB, Shanthikumar JG, Shen ZJ Max (2006) Model uncertainty, robust optimization, and learning. Johnson MP, Norman B, Secomandi N, eds. 2006 TutORials Oper. Res. (INFORMS, Catonsville, MD), 66–94.LinkGoogle Scholar
  • Lim AEB, Shanthikumar JG, Vahn G-Y (2012) Robust portfolio choice with learning in the framework of regret: Single-period case. Management Sci. 58(9):1732–1746.LinkGoogle Scholar
  • Lim AEB, Shanthikumar JG, Watewai T (2011). Robust asset allocation with benchmarked objectives. Math. Finance 21(4):643–679.Google Scholar
  • Mersereau AJ, Rusmevichientong P, Tsitsiklis JN (2009) A structured multiarmed bandit problem and the greedy policy. IEEE Trans. Automatic Control 54(12):2787–2802.CrossrefGoogle Scholar
  • Nilim A, El Ghaoui L (2005) Robust control of Markov decision processes with uncertain transition matrices. Oper. Res. 53(5):780–798.LinkGoogle Scholar
  • Niño-Mora J (2012) Towards minimum loss job routing to parallel heterogeneous multiserver queues via index policies. Eur. J. Oper. Res. 220(3):705–715.CrossrefGoogle Scholar
  • Pandey S, Chakrabarti D, Agarwal D (2007) Multi-armed bandit problems with dependent arms. Proc. 24th Internat. Conf. Machine Learn. (ACM, New York), 721–728.CrossrefGoogle Scholar
  • Petersen IR, James MR, Dupuis P (2000) Minimax optimal control of stochastic uncertain systems with relative entropy constraints. IEEE Trans. Automatic Control 45(3):398–412.CrossrefGoogle Scholar
  • Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 55:527–535.CrossrefGoogle Scholar
  • Rogers LCG (2007) Pathwise stochastic optimal control. SIAM J. Control Optim. 46(3):1116–1132.CrossrefGoogle Scholar
  • Rusmevichientong P, Shen ZJM, Shmoys DB (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Oper. Res. 58(6):1666–1680.LinkGoogle Scholar
  • Ryzhov IO, Powell WB, Frazier PI (2012) The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1):180–195.LinkGoogle Scholar
  • Scott SL (2010) A modern Bayesian look at the multi-armed bandit. Appl. Stochastic Models Bus. Indust. 26(6):639–658.CrossrefGoogle Scholar
  • Scott SL (2013) Multi-armed bandit experiments. Accessed August 15, 2013, https://support.google.com/analytics/answer/2844870?hl=en.Google Scholar
  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. Ser. B 58(1):267–288.CrossrefGoogle Scholar
  • Tsitsiklis JN (1986) A lemma on the multi-armed bandit problem. IEEE Trans. Automatic Control 31(6):576–577.CrossrefGoogle Scholar
  • White JM (2013) Bandit Algorithms for Website Optimization: Developing, Deploying, Debugging (O’Reilly Media, Sebastopol, CA).Google Scholar
  • Whittle P (1980) Multi-armed bandits and the Gittins index. J. Royal Statist. Soc. Ser. B 42(2):143–149.Google Scholar
  • Whittle P (1981) Risk-sensitive linear/quadratic/Gaussian control. Adv. Appl. Probab. 13(4):764–777.CrossrefGoogle Scholar
  • Whittle P (1990) A risk-sensitive maximum principle. Systems Control Lett. 15(3):183–192.CrossrefGoogle Scholar
  • Whittle P (1991) A risk-sensitive maximum principle: The case of imperfect state observations. IEEE Trans. Automatic Control 36(7):793–801.CrossrefGoogle Scholar
  • Wiesemann W, Kuhn D, Rustem B (2013) Robust Markov decision processes. Math. Oper. Res. 38(1):153–183.LinkGoogle Scholar
  • Ye F, Zhou E (2015) Information relaxation and dual formulation of controlled Markov diffusions. IEEE Trans. Automatic Control Forthcoming.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.