Risk-Sensitive and Risk-Neutral Multiarmed Bandits

Published Online: https://doi.org/10.1287/moor.1060.0240

References

  • Bertsekas D. P. Dynamic Programming and Optimal Control (1995) II (Athena Scientific, Belmont, MA)
  • Denardo E. V. Sequential decision processes. (1965). Ph.D. thesis, Northwestern University, Evanston, IL
  • Denardo E. V. Contraction mappings in the theory underlying dynamic programming. SIAM Rev. (1967) 9:165–177
  • Denardo E. V., Rothblum U. G. Optimal stopping, exponential utility and linear programming. Math. Programming (1979) 16:228–244
  • Denardo E. V., Rothblum U. G. A turnpike theorem for a risk-sensitive Markov decision problem with stopping. SIAM J. Control Optim. (2006) 45:414–431
  • Denardo E. V., Feinberg E., Rothblum U. G. On the unconstrained and constrained multi-armed bandit. (2007). Forthcoming
  • Denardo E. V., Park H., Rothblum U. G. A multi-armed bandit with risk-averse exponential utility and stopping. (2004). Unpublished manuscript, Faculty of Industrial Engineering and Management, Technion, Haifa, Israel
  • Denardo E. V., Rothblum U. G., Van der Heyden L. Index policies for stochastic search in a forest with an application to R&D project management. Math. Oper. Res. (2004) 29:162–181
  • Dumitriu I., Tetali P., Winkler P. On playing golf with two balls. SIAM J. Discrete Math. (2003) 16:604–615
  • El Karoui N., Karatzas I. Dynamic allocation indices in continuous time. Ann. Appl. Probab. (1994) 4:255–286
  • Gittins J. C. Multi-Armed Bandit Allocation Indices (1989) (John Wiley & Sons, New York)
  • Gittins J. C., Jones D. M. A dynamic allocation index for the sequential design of experiments. Gani J., Sarkadi K., Vincze I., eds. Progress in Statistics. Eur. Meeting of Statisticians I (1974) (North Holland, Amsterdam, The Netherlands) 241–266
  • Glazebrook K. D. Indices for families of competing Markov decision processes with influence. Ann. Appl. Probab. (1993) 3:1013–1032
  • Kallenberg L. C. M. A note on M. N. Katehakis and Y. R. Chen’s computation of the Gittins index. Math. Oper. Res. (1986) 11:184–186
  • Kaspi H., Mandelbaum A. Multi-armed bandits in discrete and continuous time. Ann. Appl. Probab. (1998) 8:1270–1290
  • Katehakis M., Rothblum U. G. Finite state multi-armed bandit problems: Sensitive-discount-optimality, average-reward-optimality and average-overtaking-optimality. Ann. Appl. Probab. (1996) 6:1024–1034
  • Katehakis M. N., Veinott A. F. The multi-armed bandit problem: Decomposition and computation. Math. Oper. Res. (1987) 12:262–268
  • Katta A. K., Sethuraman J. A note on bandits with a twist. SIAM J. Discrete Math. (2004) 18:110–113
  • Nash P. A generalized bandit problem. J. Roy. Statist. Soc. B (1980) 42:165–169
  • Pinedo M. Scheduling: Theory, Algorithms and Systems (2002) 2nd ed. (Prentice Hall, Englewood Cliffs, NJ)
  • Presman E. L., Sonin I. M. Sequential Control with Incomplete Information: The Bayesian Approach (1984) (Nauka, Moscow, USSR)
  • Ross S. Introduction to Stochastic Dynamic Programming (1983) (Academic Press, San Diego, CA)
  • Schlag K. Why imitate, and if so, how? A bounded rational approach to multi-armed bandits. J. Econom. Theory (1998) 78:130–156
  • Sonin I. A generalized Gittins index for Markov chain and its recursive calculation. (2005). Technical report, Dept. Mathematics, University of North Carolina at Charlotte, Charlotte, NC
  • Tsitsiklis J. A short proof of the Gittins index theorem. Ann. Appl. Probab. (1994) 4:194–199
  • Varaiya P., Walrand J., Buyukkoc C. Extensions of the multi-armed bandit problem: The discounted case. IEEE Trans. Automatic Control (1985) AC-30:426–439
  • Veinott A. F. Discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. (1969) 40:1635–1660
  • Weber R. On the Gittins index for multiarmed bandits. Ann. Appl. Probab. (1992) 2:1024–1033
  • Weiss G. Branching bandit processes. Probab. Engrg. Inform. Sci. (1988) 2:269–278
  • Whittle P. Multi-armed bandits and the Gittins index. J. Roy. Statist. Soc. B (1980) 42:143–149
  • Whittle P. Optimization over Time (1982) 1 (John Wiley, New York)
  • Whittle P. Arm-acquiring bandits. Ann. Probab. (1981) 9:284–292