Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and Indexability

Roland Fryer
Roland Fryer
[email protected]
http://orcid.org/0000-0002-4512-423X
Harvard University, Cambridge, Massachusetts 02138 and National Bureau of Economic Research (NBER), Cambridge, Massachusetts
Search for more papers by this author
,
Philipp Harms
Corresponding Author
Philipp Harms
[email protected]
http://orcid.org/0000-0003-4792-1424
Freiburg University, 79085 Freiburg, Germany
Search for more papers by this author

Harvard University, Cambridge, Massachusetts 02138 and National Bureau of Economic Research (NBER), Cambridge, Massachusetts

Search for more papers by this author

Philipp Harms

Corresponding Author

Philipp Harms

[email protected]

http://orcid.org/0000-0003-4792-1424

Freiburg University, 79085 Freiburg, Germany

Search for more papers by this author

Published Online:5 Sep 2017https://doi.org/10.1287/moor.2017.0863

References

Acemoglu D, Dahleh M, Lobel I, Ozdaglar A (2011) Bayesian learning in social networks. Rev. Econom. Stud. 78(4):1201.Crossref, Google Scholar
Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002/03) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77 (electronic).Crossref, Google Scholar
Banks JS, Sundaram RK (1992) Denumerable-armed bandits. Econometrica 60(5):1071–1096.Crossref, Google Scholar
Banks JS, Sundaram RK (1994) Switching costs and the gittins index. Econometrica 62(3):687–694.Crossref, Google Scholar
Basu A, Bose A, Ghosh JK (1990) An expository review of sequential design and allocation rules. Technical report, Department of Statistics, Purdue University, West Lafayette, IN.Google Scholar
Bergemann D, Välimäki J (2006) Bandit problems. Cowles Foundation Discussion Papers 1551, Cowles Foundation for Research in Economics, Yale University, New Haven, CT.Google Scholar
Berry DA, Fristedt B (1985) Bandit Problems: Sequential Allocation of Experiments, Monographs on Statistics and Applied Probability (Chapman & Hall, London).Crossref, Google Scholar
Bolton P, Harris C (1999) Strategic experimentation. Econometrica 67(2):349–374.Crossref, Google Scholar
Bolton P, Harris C (2000) Strategic experimentation: The undiscounted case. Hammond PJ, Myles GD, eds. Incentives, Organization and Public Economics. Papers in Honour of Sir James Mirrlees (Oxford University Press, New York), 53–68.Google Scholar
Ceci C, Gerardi A, Tardelli P (2002) Existence of optimal controls for partially observed jump processes. Acta Appl. Math. 74(2):155–175.Crossref, Google Scholar
Cohen A, Solan E (2013) Bandit problems with Levy payoff processes. Math. Oper. Res. 38(1):92–107.Link, Google Scholar
Cunha F, Heckman JJ, Lochner L, Masterov DV (2006) Interpreting the evidence on life cycle skill formation. Hanushek EA, Welch F, eds. Handbook of the Economics of Education Volume 1 (North-Holland, Amsterdam).Google Scholar
Delzeith O (2004) On Skorohod spaces as universal sample path spaces. Preprint arXiv: math/0412092.Google Scholar
Dobbie W, Fryer R (2011) Getting beneath the veil of effective schools: Evidence from New York City. Working paper 17632, National Bureau of Economic Research, Cambridge, MA.Google Scholar
El Karoui N, Karatzas I (1994) Dynamic allocation problems in continuous time. Ann. Appl. Probab. 4(2):255–286.Crossref, Google Scholar
El Karoui N, Nguyen DH, Jeanblanc-Picqué M (1988) Existence of an optimal Markovian filter for the control under partial observations. SIAM J. Control Optim. 26(5):1025–1061.Crossref, Google Scholar
Faihe Y, Müller JP (1998) Behaviors coordination using restless bandits allocation indexes. Pfeifer R, Blumberg B, Meyer J-A, Wilson SW, eds. From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior (MIT Press, Cambridge, MA), 159–164.Google Scholar
Fleming WH, Nisio M (1966) On the existence of optimal stochastic controls. J. Math. Mech. 15:777–794.Google Scholar
Fleming WH, Pardoux É (1982) Optimal control for partially observed diffusions. SIAM J. Control Optim. 20(2):261–285.Crossref, Google Scholar
Gittins J (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. Ser. B (Methodological) 41(2):148–177.Crossref, Google Scholar
Gittins J, Glazebrook K, Weber R (2011) Multi-Armed Bandit Allocation Indices (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
Glazebrook KD, Kirkbride C, Ruiz-Hernandez D (2006) Spinning plates and squad systems: Policies for bi-directional restless bandits. Adv. Appl. Probab. 38(1):95–115.Crossref, Google Scholar
Glazebrook KD, Niño-Mora J, Ansell PS (2002) Index policies for a class of discounted restless bandits. Adv. Appl. Probab. 34(4):754–774.Crossref, Google Scholar
Glazebrook KD, Ruiz-Hernandez D, Kirkbride C (2006) Some indexable families of restless bandit problems. Adv. Appl. Probab. 38(3):643–672.Crossref, Google Scholar
Heckman J, Cunha F (2010) Investing in our young people. Reynolds A, Rolnick A, Englund M, Temple JA, eds. Cost-Effective Programs in Children’s First Decade: A Human Capital Integration (Cambridge University, New York), 381–414.Google Scholar
Jacod J, Mémin J (1981) Sur un type de convergence intermédiaire entre la convergence en loi et la convergence en probabilitéédiaire entre la convergence en loi et la convergence en probabilité. Séminaire de Probabilités XV 1979/80 (Springer, New York), 529–546.Crossref, Google Scholar
Jacod J, Shiryaev AN (2003) Limit theorems for stochastic processes. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 2nd ed., Vol. 288 (Springer, Berlin), 592–628.Crossref, Google Scholar
Jun T (2004) A survey on the bandit problem with switching costs. De Economist 152(4):513–541.Crossref, Google Scholar
Karatzas I (1984) Gittins indices in the dynamic allocation problem for diffusion processes. Ann. Probab. 12(1):173–192.Crossref, Google Scholar
Keller G, Rady S (2010) Strategic experimentation with Poisson bandits. Theoret. Econom. 5(2):275–311.Crossref, Google Scholar
Keller G, Rady S (2015) Breakdowns. Theoretical Econom. 10(1):175–202.Crossref, Google Scholar
Keller G, Rady S, Cripps M (2005) Strategic experimentation with exponential bandits. Econometrica 73(1):39–68.Crossref, Google Scholar
Klimov GP (1975) Time-sharing service systems. I. Theory Probab. ITS Appl. 19(3):532–551.Crossref, Google Scholar
Kohlmann M (1982) Existence of optimal controls for a partially observed semimartingale. Stochastic Processes and Their Appl. 13(2):215–226.Crossref, Google Scholar
Komatsu T (1973) Markov processes associated with certain integro-differential operators. Osaka J. Math. 10(2):271–303.Google Scholar
Kroft K, Lange F, Notowidigdo MJ (2012) Duration dependence and labor market conditions: Theory and evidence from a field experiment. Working paper 18387, National Bureau of Economic Research, Cambridge, MA.Google Scholar
Kurtz TG (1987) Martingale problems for controlled processes. Stochastic Modelling and Filtering (Springer, Berlin), 75–90.Crossref, Google Scholar
Kurtz TG (1998) Martingale problems for conditional distributions of Markov processes. Electron. J. Probab. 3(9):1–29.Google Scholar
Kurtz TG, Nappo G (2011) The filtered martingale problem. Crisan D, Rozovskiĭ, eds. The Oxford Handbook of Nonlinear Filtering (Oxford University Press, New York), 129–168.Google Scholar
Kurtz TG, Ocone DL (1988) Unique characterization of conditional distributions in nonlinear filtering. Ann. Probab. 16(1):80–107.Crossref, Google Scholar
Kurtz TG, Stockbridge RH (1998) Existence of Markov controls and characterization of optimal Markov controls. SIAM J. Control Optim. 36(2):609–653.Crossref, Google Scholar
Kushner HJ, Dupuis PG (2000) Numerical Methods for Stochastic Control Problems in Continuous Time, Vol. 24 (Springer, New York).Google Scholar
La Scala BF, Moran B (2006) Optimal target tracking with restless bandits. Digital Signal Processing 16(5):479–487.Crossref, Google Scholar
Lépingle D, Mémin J (1978) Sur l’intégrabilité uniforme des martingales exponentielles. Z. Wahrsch. Verw. Gebiete 42(3):175–203.Crossref, Google Scholar
Li C-P, Neely MJ (2012) Network utility maximization over partially observable Markovian channels. Performance Evaluation 70(7–8):528–548.Crossref, Google Scholar
Liu K, Zhao Q (2010) Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Trans. Inform. Theory 56(11):5547–5567.Crossref, Google Scholar
Mahajan A, Teneketzis D (2008) Multi-armed bandit problems. Foundations Appl. Sensor Management 5(1):121–151.Crossref, Google Scholar
Mandelbaum A (1987) Continuous multi-armed bandits and multiparameter processes. Ann. Probab. 15(4):1527–1556.Crossref, Google Scholar
McCall BP, McCall JJ (1987) A sequential study of migration and job search. J. Labor Econom. 5(4):452–476.Crossref, Google Scholar
Morimoto H (1991) On average cost stopping time problems. Probab. Theory Related Fields 90(4):469–490.Crossref, Google Scholar
Niño-Mora J (2001) Restless bandits, partial conservation laws and indexability. Adv. Appl. Probab. 33(1):76–98.Crossref, Google Scholar
Niño-Mora J (2008) An index policy for dynamic fading-channel allocation to heterogeneous mobile users with partial observations. Proc. 4th EURO-NGI Conf. Next Generation Internet Networks (IEEE, Piscataway, NJ), 231–238.Crossref, Google Scholar
Novikov AA (1980) On conditions for uniform integrability of continuous non-negative martingales. Theory Probab. Its Appl. 24(4):820–824.Crossref, Google Scholar
Ny JL, Dahleh M, Feron E (2008) Multi-UAV dynamic routing with partial observations using restless bandit allocation indices. Amer. Control Conf. (IEEE, Piscataway, NJ), 4220–4225.Google Scholar
Ny JL, Feron E, Dahleh MA (2011) Scheduling continuous-time Kalman filters. IEEE Trans. Automatic Control 56(6):1381–1394.Crossref, Google Scholar
Øksendal BK, Sulem A (2005) Applied Stochastic Control of Jump Diffusions, Vol. 498 (Springer, Berlin).Google Scholar
Pandey S, Chakrabarti D, Agarwal D (2007) Multi-armed bandit problems with dependent arms. Proc. 24th Internat. Conf. Machine Learn., ICML ’07 (ACM, New York), 721–728.Crossref, Google Scholar
Peskir G, Shiryaev A (2006) Optimal Stopping and Free-Boundary Problems, Lectures in Mathematics ETH Zürich (Birkhäuser, Basel, Switzerland).Google Scholar
Pham H (1995) Optimal stopping of controlled jump diffusion processes and viscosity solutions. C. R. Acad. Sci. Paris Sér. I Math. 320(9):1113–1118.Google Scholar
Pham H (1998) Optimal stopping of controlled jump diffusion processes: A viscosity solution approach. J. Math. Systems Estim. Control 8(1):1–27.Google Scholar
Powell WB (2007) Approximate Dynamic Programming: Solving the Curses of Dimensionality, Vol. 703 (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
Presman EL (1990) Poisson version of the two-armed bandit problem with discounting. Theory Probab. Its Appl. 35(2):307–317.Crossref, Google Scholar
Presman ÈL, Sonin IN (1990) Sequential Control with Incomplete Information. Economic Theory, Econometrics, and Mathematical Economics (Academic Press, San Diego).Google Scholar
Protter PE (2005) Stochastic Integration and Differential Equations. Stochastic Modeling and Applied Probability, Vol. 21 (Springer, Berlin).Crossref, Google Scholar
Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.Crossref, Google Scholar
Rothschild M (1974) A two-armed bandit theory of market pricing. J. Econom. Theory 9(2):185–202.Crossref, Google Scholar
Schachermayer W, Schachinger W (1999) Is there a predictable criterion for mutual singularity of two probability measures on a filtered space? Teor. Veroyatnost. i Primenen. 44(1):101–110.Crossref, Google Scholar
Seierstad A (2009) Stochastic Control in Discrete and Continuous Time (Springer, New York).Crossref, Google Scholar
Startek M (2012) Vague convergence in the Skorohod representation theorem. Int. J. Contemp. Math. Sci. 7(22):1061–1066.Google Scholar
Stockbridge RH (2005) A separation principle for partially observed control of singular stochastic processes. Nonlinear Anal. 63:e2057–e2065.Crossref, Google Scholar
Stroock DW (1975) Diffusion processes associated with lévy generators. Probab. Theory and Related Fields 32(3):209–244.Google Scholar
Stroock DW, Varadhan SRS (2006) Multidimensional Diffusion Processes, Classics in Mathematics, (Springer, Berlin).Google Scholar
Veatch MH, Wein LM (1996) Scheduling a make-to-stock queue: Index policies and hedging points. Oper. Res. 44(4):634–647.Link, Google Scholar
Washburn R (2008) Application of multi-armed bandits to sensor management. Foundations Appl. Sensor Management (Springer, New York), 153–175.Crossref, Google Scholar
Weber RR, Weiss G (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.Crossref, Google Scholar
Weber RR, Weiss G (1991) Addendum to “On an index policy for restless bandits.” Adv. Appl. Probab. 23(2):429–430.Crossref, Google Scholar
Weitzman ML (1979) Optimal search for the best alternative. Econometrica 47(3):641–654.Crossref, Google Scholar
Whittle P (1981) Arm-acquiring bandits. Ann. Probab. 9(2):284–292.Crossref, Google Scholar
Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25(A):287–298.Crossref, Google Scholar
Wonham WM (1968) On the separation theorem of stochastic control. SIAM J. Control 6(2):312–326.Crossref, Google Scholar

cover image Mathematics of Operations Research

Volume 43, Issue 2

May 2018

Pages 347-692, C2

Article Information

Metrics

Information

Received:May 30, 2015
Accepted:January 19, 2017
Published Online:September 05, 2017

Cite as

Roland Fryer, Philipp Harms (2017) Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and Indexability. Mathematics of Operations Research 43(2):399-427.

https://doi.org/10.1287/moor.2017.0863

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and Indexability

References

Volume 43, Issue 2

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News