Robust Multiarmed Bandit Problems

Michael Jong Kim
Michael Jong Kim
[email protected]
Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Ontario M5S 3G8, Canada
Search for more papers by this author
,
Andrew E.B. Lim
Andrew E.B. Lim
[email protected]
Department of Decision Sciences and Department of Finance, NUS Business School, National University of Singapore, Singapore 119245
Search for more papers by this author

Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Ontario M5S 3G8, Canada

Department of Decision Sciences and Department of Finance, NUS Business School, National University of Singapore, Singapore 119245

Search for more papers by this author

Published Online:5 Aug 2015https://doi.org/10.1287/mnsc.2015.2153

References

Agrawal R (1995) The continuum-armed bandit problem. SIAM J. Control Optim. 33(6):1926–1951.Crossref, Google Scholar
Auer P, Cesa-Bianchi N, Fischer P (2002a) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Crossref, Google Scholar
Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (1995) Gambling in a rigged casino: The adversarial multiarmed bandit problem. Proc. 6th Annual IEEE Sympos. Foundations Comput. Sci. (IEEE Computer Society Press, Washington, DC), 322–331.Crossref, Google Scholar
Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002b) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.Crossref, Google Scholar
Babaioff M, Sharma Y, Slivkins A (2014) Characterizing truthful multi-armed bandit mechanisms. SIAM J. Comput. 43(1):194–230.Crossref, Google Scholar
Ben-Tal A, Nemirovski A (1998) Robust convex optimization. Math. Oper. Res. 23(4):769–805.Link, Google Scholar
Ben-Tal A, Nemirovski A (1999) Robust solutions to uncertain programs. Oper. Res. Lett. 25(1):1–13.Crossref, Google Scholar
Ben-Tal A, Nemirovski A (2000) Robust solutions of linear programming problems contaminated with uncertain data. Math. Programming 88(3):411–424.Crossref, Google Scholar
Bertsekas DP (1995) Dynamic Programming and Optimal Control Volume II (Athena Scientific, Belmont, MA).Google Scholar
Bertsimas D, Nino-Mora J (1996) Conservation laws, extended polymatroids and multiarmed bandit problems: A polyhedral approach to indexable systems. Math. Oper. Res. 21(2):257–306.Link, Google Scholar
Bertsimas D, Sim M (2004) The price of robustness. Oper. Res. 52(1):35–53.Link, Google Scholar
Brown DB, Smith JE (2013) Optimal sequential exploration: Bandits, clairvoyants, and wildcats. Oper. Res. 61(3):644–665.Link, Google Scholar
Brown DB, Smith JE, Sun P (2010) Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4):785–801.Link, Google Scholar
Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.Link, Google Scholar
Caro F, Gupta AD (2015) Robust control of the multi-armed bandit problem. Ann. Oper. Res., ePub ahead of print August 21, http://link.springer.com/article/10.1007%2Fs10479-015-1965-7.Crossref, Google Scholar
Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Chan CW, Farias VF (2009) Stochastic depletion problems: Effective myopic policies for a class of dynamic optimization problems. Math. Oper. Res. 34(2):333–350.Link, Google Scholar
Dai Pra P, Meneghini L, Runggaldier WJ (1996) Connections between stochastic control and dynamic games. Math. Control Signals Systems 9(4):303–326.Crossref, Google Scholar
Dupuis P, Ellis RS (1997) A Weak Convergence Approach to the Theory of Large Deviations (Wiley, New York).Crossref, Google Scholar
Efron B (1979) Bootstrap methods: Another look at the jackknife. Ann. Statist. 7(1):1–235.Crossref, Google Scholar
El Ghaoui L, Lebret H (1997) Robust solutions to least-square problems to uncertain data matrices. SIAM J. Matrix Anal. Appl. 18(4):1035–1064.Crossref, Google Scholar
Epstein LG, Schneider M (2003) Recursive multiple-priors. J. Econom. Theory 113(1):1–31.Crossref, Google Scholar
Epstein LG, Schneider M (2007) Learning under ambiguity. Rev. Econom. Stud. 74(4):1275–1303.Crossref, Google Scholar
Gittins JC (1979) Bandit processes and dynamic allocation indices. J. Royal Statist. Soc. Ser. B 41(2):148–164.Google Scholar
González-Trejo JI, Hernández-Lerma O, Hoyos-Reyes LF (2003) Minimax control of discrete-time stochastic systems. SIAM J. Control Optim. 41(5):1626–1659.Crossref, Google Scholar
Hansen LP, Sargent TJ (2005) Robust estimation and control under commitment. J. Econom. Theory 124(2):258–301.Crossref, Google Scholar
Hansen LP, Sargent TJ (2007) Robust estimation and control without commitment. J. Econom. Theory 136(1):1–27.Crossref, Google Scholar
Hansen LP, Sargent TJ (2008) Robustness (Princeton University Press, Princeton, NJ).Crossref, Google Scholar
Haugh MB, Lim AEB (2012) Linear-quadratic control and information relaxations. Oper. Res. Lett. 40(6):521–528.Crossref, Google Scholar
Iyengar GN (2005) Robust dynamic programming. Math. Oper. Res. 30(2):257–280.Link, Google Scholar
Jacobson DH (1973) Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games. IEEE Trans. Automatic Control 18(2):124–131.Crossref, Google Scholar
Kleinberg R (2004) Nearly tight bounds for the continuum-armed bandit problem. Jordan MI, LeCun Y, Solla SA, eds. Advances in Neural Information Processing Systems 17 (MIT Press, Cambridge, MA), 697–704.Google Scholar
Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
Li JY, Kwon RH (2013) Portfolio selection under model uncertainty: A penalized moment-based optimization approach. J. Global Optim. 56(1):131–164.Crossref, Google Scholar
Lim AEB, Shanthikumar JG (2007) Relative entropy, exponential utility, and robust dynamic pricing. Oper. Res. 55(2):198–214.Link, Google Scholar
Lim AEB, Shanthikumar JG, Shen ZJ Max (2006) Model uncertainty, robust optimization, and learning. Johnson MP, Norman B, Secomandi N, eds. 2006 TutORials Oper. Res. (INFORMS, Catonsville, MD), 66–94.Link, Google Scholar
Lim AEB, Shanthikumar JG, Vahn G-Y (2012) Robust portfolio choice with learning in the framework of regret: Single-period case. Management Sci. 58(9):1732–1746.Link, Google Scholar
Lim AEB, Shanthikumar JG, Watewai T (2011). Robust asset allocation with benchmarked objectives. Math. Finance 21(4):643–679.Google Scholar
Mersereau AJ, Rusmevichientong P, Tsitsiklis JN (2009) A structured multiarmed bandit problem and the greedy policy. IEEE Trans. Automatic Control 54(12):2787–2802.Crossref, Google Scholar
Nilim A, El Ghaoui L (2005) Robust control of Markov decision processes with uncertain transition matrices. Oper. Res. 53(5):780–798.Link, Google Scholar
Niño-Mora J (2012) Towards minimum loss job routing to parallel heterogeneous multiserver queues via index policies. Eur. J. Oper. Res. 220(3):705–715.Crossref, Google Scholar
Pandey S, Chakrabarti D, Agarwal D (2007) Multi-armed bandit problems with dependent arms. Proc. 24th Internat. Conf. Machine Learn. (ACM, New York), 721–728.Crossref, Google Scholar
Petersen IR, James MR, Dupuis P (2000) Minimax optimal control of stochastic uncertain systems with relative entropy constraints. IEEE Trans. Automatic Control 45(3):398–412.Crossref, Google Scholar
Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 55:527–535.Crossref, Google Scholar
Rogers LCG (2007) Pathwise stochastic optimal control. SIAM J. Control Optim. 46(3):1116–1132.Crossref, Google Scholar
Rusmevichientong P, Shen ZJM, Shmoys DB (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Oper. Res. 58(6):1666–1680.Link, Google Scholar
Ryzhov IO, Powell WB, Frazier PI (2012) The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1):180–195.Link, Google Scholar
Scott SL (2010) A modern Bayesian look at the multi-armed bandit. Appl. Stochastic Models Bus. Indust. 26(6):639–658.Crossref, Google Scholar
Scott SL (2013) Multi-armed bandit experiments. Accessed August 15, 2013, https://support.google.com/analytics/answer/2844870?hl=en.Google Scholar
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. Ser. B 58(1):267–288.Crossref, Google Scholar
Tsitsiklis JN (1986) A lemma on the multi-armed bandit problem. IEEE Trans. Automatic Control 31(6):576–577.Crossref, Google Scholar
White JM (2013) Bandit Algorithms for Website Optimization: Developing, Deploying, Debugging (O’Reilly Media, Sebastopol, CA).Google Scholar
Whittle P (1980) Multi-armed bandits and the Gittins index. J. Royal Statist. Soc. Ser. B 42(2):143–149.Google Scholar
Whittle P (1981) Risk-sensitive linear/quadratic/Gaussian control. Adv. Appl. Probab. 13(4):764–777.Crossref, Google Scholar
Whittle P (1990) A risk-sensitive maximum principle. Systems Control Lett. 15(3):183–192.Crossref, Google Scholar
Whittle P (1991) A risk-sensitive maximum principle: The case of imperfect state observations. IEEE Trans. Automatic Control 36(7):793–801.Crossref, Google Scholar
Wiesemann W, Kuhn D, Rustem B (2013) Robust Markov decision processes. Math. Oper. Res. 38(1):153–183.Link, Google Scholar
Ye F, Zhou E (2015) Information relaxation and dual formulation of controlled Markov diffusions. IEEE Trans. Automatic Control Forthcoming.Crossref, Google Scholar

Volume 62, Issue 1

January 2016

Pages i-vii, 1-301

Article Information

Metrics

Information

Received:February 28, 2013
Accepted:November 30, 2014
Published Online:August 05, 2015

Cite as

Michael Jong Kim, Andrew E.B. Lim (2015) Robust Multiarmed Bandit Problems. Management Science 62(1):264-285.

https://doi.org/10.1287/mnsc.2015.2153

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Robust Multiarmed Bandit Problems

References

Volume 62, Issue 1

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News