Robust Multiarmed Bandit Problems
Published Online: 5 Aug 2015 https://doi.org/10.1287/mnsc.2015.2153

