Risk-Sensitive and Risk-Neutral Multiarmed Bandits

Eric V. Denardo
Center for System Science, Yale University, P.O. Box 208267, New Haven, Connecticut 06520

Haechurl Park
Department of Business Administration, Chung-Ang University, 221 Heukseok-Dong, Dongjak-gu, Seoul 156-756, Korea

Uriel G. Rothblum
Faculty of Industrial Engineering and Management, Technion—Israel Institute of Technology, Haifa 32000, Israel

Published Online: 1 May 2007

https://doi.org/10.1287/moor.1060.0240

References

Bertsekas D. P. Dynamic Programming and Optimal Control (1995) II (Athena Scientific, Belmont, MA)
Denardo E. V. Sequential decision processes. (1965) Ph.D. thesis, Northwestern University, Evanston, IL
Denardo E. V. Contraction mappings in the theory underlying dynamic programming. SIAM Rev. (1967) 9:165–177
Denardo E. V., Rothblum U. G. Optimal stopping, exponential utility and linear programming. Math. Programming (1979) 16:228–244
Denardo E. V., Rothblum U. G. A turnpike theorem for a risk-sensitive Markov decision problem with stopping. SIAM J. Control Optim. (2006) 45:414–431
Denardo E. V., Feinberg E., Rothblum U. G. On the unconstrained and constrained multi-armed bandit. (2007) Forthcoming
Denardo E. V., Park H., Rothblum U. G. A multi-armed bandit with risk-averse exponential utility and stopping. (2004) Unpublished manuscript, Faculty of Industrial Engineering and Management, Technion, Haifa, Israel
Denardo E. V., Rothblum U. G., Van der Heyden L. Index policies for stochastic search in a forest with an application to R&D project management. Math. Oper. Res. (2004) 29:162–181
Dumitriu I., Tetali P., Winkler P. On playing golf with two balls. SIAM J. Discrete Math. (2003) 16:604–615
El Karoui N., Karatzas I. Dynamic allocation indices in continuous time. Ann. Appl. Probab. (1994) 4:255–286
Gittins J. C. Multi-Armed Bandit Allocation Indices (1989) (John Wiley & Sons, New York)
Gittins J. C., Jones D. M. A dynamic allocation index for the sequential design of experiments. Gani J., Sarkadi K., Vincze I., eds. Progress in Statistics. Eur. Meeting of Statisticians I (1974) (North Holland, Amsterdam, The Netherlands) 241–266
Glazebrook K. D. Indices for families of competing Markov decision processes with influence. Ann. Appl. Probab. (1993) 3:1013–1032
Kallenberg L. C. M. A note on M. N. Katehakis and Y. R. Chen’s computation of the Gittins index. Math. Oper. Res. (1986) 11:184–186
Kaspi H., Mandelbaum A. Multi-armed bandits in discrete and continuous time. Ann. Appl. Probab. (1998) 8:1270–1290
Katehakis M., Rothblum U. G. Finite state multi-armed bandit problems: Sensitive-discount-optimality, average-reward-optimality and average-overtaking-optimality. Ann. Appl. Probab. (1996) 6:1024–1034
Katehakis M. N., Veinott A. F. The multi-armed bandit problem: Decomposition and computation. Math. Oper. Res. (1987) 12:262–268
Katta A. K., Sethuraman J. A note on bandits with a twist. SIAM J. Discrete Math. (2004) 18:110–113
Nash P. A generalized bandit problem. J. Roy. Statist. Soc. B (1980) 42:165–169
Pinedo M. Scheduling: Theory, Algorithms and Systems (2002) 2nd ed. (Prentice Hall, Englewood Cliffs, NJ)
Presman E. L., Sonin I. M. Sequential Control with Incomplete Information: The Bayesian Approach (1984) (Nauka, Moscow, USSR)
Ross S. Introduction to Stochastic Dynamic Programming (1983) (Academic Press, San Diego, CA)
Schlag K. Why imitate, and if so, how? A bounded rational approach to multi-armed bandits. J. Econom. Theory (1998) 78:130–156
Sonin I. A generalized Gittins index for Markov chain and its recursive calculation. (2005) Technical report, Dept. Mathematics, University of North Carolina at Charlotte, Charlotte, NC
Tsitsiklis J. A short proof of the Gittins index theorem. Ann. Appl. Probab. (1994) 4:194–199
Varaiya P., Walrand J., Buyukkoc C. Extensions of the multi-armed bandit problem: The discounted case. IEEE Trans. Automatic Control (1985) AC-30:426–439
Veinott A. F. Discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. (1969) 40:1635–1660
Weber R. On the Gittins index for multiarmed bandits. Ann. Appl. Probab. (1992) 2:1024–1033
Weiss G. Branching bandit processes. Probab. Engrg. Inform. Sci. (1988) 2:269–278
Whittle P. Multi-armed bandits and the Gittins index. J. Roy. Statist. Soc. B (1980) 42:143–149
Whittle P. Optimization over Time (1982) 1 (John Wiley, New York)
Whittle P. Arm-acquiring bandits. Ann. Probab. (1981) 9:284–292

Mathematics of Operations Research

Volume 32, Issue 2

May 2007

Pages 257-496

Article Information

Received: October 14, 2004
Published Online: May 01, 2007

Cite as

Eric V. Denardo, Haechurl Park, Uriel G. Rothblum (2007) Risk-Sensitive and Risk-Neutral Multiarmed Bandits. Mathematics of Operations Research 32(2):374-394.

https://doi.org/10.1287/moor.1060.0240
