Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and Indexability

Published Online:https://doi.org/10.1287/moor.2017.0863

References

  • Acemoglu D, Dahleh M, Lobel I, Ozdaglar A (2011) Bayesian learning in social networks. Rev. Econom. Stud. 78(4):1201.CrossrefGoogle Scholar
  • Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002/03) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77 (electronic).CrossrefGoogle Scholar
  • Banks JS, Sundaram RK (1992) Denumerable-armed bandits. Econometrica 60(5):1071–1096.CrossrefGoogle Scholar
  • Banks JS, Sundaram RK (1994) Switching costs and the gittins index. Econometrica 62(3):687–694.CrossrefGoogle Scholar
  • Basu A, Bose A, Ghosh JK (1990) An expository review of sequential design and allocation rules. Technical report, Department of Statistics, Purdue University, West Lafayette, IN.Google Scholar
  • Bergemann D, Välimäki J (2006) Bandit problems. Cowles Foundation Discussion Papers 1551, Cowles Foundation for Research in Economics, Yale University, New Haven, CT.Google Scholar
  • Berry DA, Fristedt B (1985) Bandit Problems: Sequential Allocation of Experiments, Monographs on Statistics and Applied Probability (Chapman & Hall, London).CrossrefGoogle Scholar
  • Bolton P, Harris C (1999) Strategic experimentation. Econometrica 67(2):349–374.CrossrefGoogle Scholar
  • Bolton P, Harris C (2000) Strategic experimentation: The undiscounted case. Hammond PJ, Myles GD, eds. Incentives, Organization and Public Economics. Papers in Honour of Sir James Mirrlees (Oxford University Press, New York), 53–68.Google Scholar
  • Ceci C, Gerardi A, Tardelli P (2002) Existence of optimal controls for partially observed jump processes. Acta Appl. Math. 74(2):155–175.CrossrefGoogle Scholar
  • Cohen A, Solan E (2013) Bandit problems with Levy payoff processes. Math. Oper. Res. 38(1):92–107.LinkGoogle Scholar
  • Cunha F, Heckman JJ, Lochner L, Masterov DV (2006) Interpreting the evidence on life cycle skill formation. Hanushek EA, Welch F, eds. Handbook of the Economics of Education Volume 1 (North-Holland, Amsterdam).Google Scholar
  • Delzeith O (2004) On Skorohod spaces as universal sample path spaces. Preprint arXiv: math/0412092.Google Scholar
  • Dobbie W, Fryer R (2011) Getting beneath the veil of effective schools: Evidence from New York City. Working paper 17632, National Bureau of Economic Research, Cambridge, MA.Google Scholar
  • El Karoui N, Karatzas I (1994) Dynamic allocation problems in continuous time. Ann. Appl. Probab. 4(2):255–286.CrossrefGoogle Scholar
  • El Karoui N, Nguyen DH, Jeanblanc-Picqué M (1988) Existence of an optimal Markovian filter for the control under partial observations. SIAM J. Control Optim. 26(5):1025–1061.CrossrefGoogle Scholar
  • Faihe Y, Müller JP (1998) Behaviors coordination using restless bandits allocation indexes. Pfeifer R, Blumberg B, Meyer J-A, Wilson SW, eds. From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior (MIT Press, Cambridge, MA), 159–164.Google Scholar
  • Fleming WH, Nisio M (1966) On the existence of optimal stochastic controls. J. Math. Mech. 15:777–794.Google Scholar
  • Fleming WH, Pardoux É (1982) Optimal control for partially observed diffusions. SIAM J. Control Optim. 20(2):261–285.CrossrefGoogle Scholar
  • Gittins J (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. Ser. B (Methodological) 41(2):148–177.CrossrefGoogle Scholar
  • Gittins J, Glazebrook K, Weber R (2011) Multi-Armed Bandit Allocation Indices (John Wiley & Sons, Hoboken, NJ).CrossrefGoogle Scholar
  • Glazebrook KD, Kirkbride C, Ruiz-Hernandez D (2006) Spinning plates and squad systems: Policies for bi-directional restless bandits. Adv. Appl. Probab. 38(1):95–115.CrossrefGoogle Scholar
  • Glazebrook KD, Niño-Mora J, Ansell PS (2002) Index policies for a class of discounted restless bandits. Adv. Appl. Probab. 34(4):754–774.CrossrefGoogle Scholar
  • Glazebrook KD, Ruiz-Hernandez D, Kirkbride C (2006) Some indexable families of restless bandit problems. Adv. Appl. Probab. 38(3):643–672.CrossrefGoogle Scholar
  • Heckman J, Cunha F (2010) Investing in our young people. Reynolds A, Rolnick A, Englund M, Temple JA, eds. Cost-Effective Programs in Children’s First Decade: A Human Capital Integration (Cambridge University, New York), 381–414.Google Scholar
  • Jacod J, Mémin J (1981) Sur un type de convergence intermédiaire entre la convergence en loi et la convergence en probabilitéédiaire entre la convergence en loi et la convergence en probabilité. Séminaire de Probabilités XV 1979/80 (Springer, New York), 529–546.CrossrefGoogle Scholar
  • Jacod J, Shiryaev AN (2003) Limit theorems for stochastic processes. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 2nd ed., Vol. 288 (Springer, Berlin), 592–628.CrossrefGoogle Scholar
  • Jun T (2004) A survey on the bandit problem with switching costs. De Economist 152(4):513–541.CrossrefGoogle Scholar
  • Karatzas I (1984) Gittins indices in the dynamic allocation problem for diffusion processes. Ann. Probab. 12(1):173–192.CrossrefGoogle Scholar
  • Keller G, Rady S (2010) Strategic experimentation with Poisson bandits. Theoret. Econom. 5(2):275–311.CrossrefGoogle Scholar
  • Keller G, Rady S (2015) Breakdowns. Theoretical Econom. 10(1):175–202.CrossrefGoogle Scholar
  • Keller G, Rady S, Cripps M (2005) Strategic experimentation with exponential bandits. Econometrica 73(1):39–68.CrossrefGoogle Scholar
  • Klimov GP (1975) Time-sharing service systems. I. Theory Probab. ITS Appl. 19(3):532–551.CrossrefGoogle Scholar
  • Kohlmann M (1982) Existence of optimal controls for a partially observed semimartingale. Stochastic Processes and Their Appl. 13(2):215–226.CrossrefGoogle Scholar
  • Komatsu T (1973) Markov processes associated with certain integro-differential operators. Osaka J. Math. 10(2):271–303.Google Scholar
  • Kroft K, Lange F, Notowidigdo MJ (2012) Duration dependence and labor market conditions: Theory and evidence from a field experiment. Working paper 18387, National Bureau of Economic Research, Cambridge, MA.Google Scholar
  • Kurtz TG (1987) Martingale problems for controlled processes. Stochastic Modelling and Filtering (Springer, Berlin), 75–90.CrossrefGoogle Scholar
  • Kurtz TG (1998) Martingale problems for conditional distributions of Markov processes. Electron. J. Probab. 3(9):1–29.Google Scholar
  • Kurtz TG, Nappo G (2011) The filtered martingale problem. Crisan D, Rozovskiĭ, eds. The Oxford Handbook of Nonlinear Filtering (Oxford University Press, New York), 129–168.Google Scholar
  • Kurtz TG, Ocone DL (1988) Unique characterization of conditional distributions in nonlinear filtering. Ann. Probab. 16(1):80–107.CrossrefGoogle Scholar
  • Kurtz TG, Stockbridge RH (1998) Existence of Markov controls and characterization of optimal Markov controls. SIAM J. Control Optim. 36(2):609–653.CrossrefGoogle Scholar
  • Kushner HJ, Dupuis PG (2000) Numerical Methods for Stochastic Control Problems in Continuous Time, Vol. 24 (Springer, New York).Google Scholar
  • La Scala BF, Moran B (2006) Optimal target tracking with restless bandits. Digital Signal Processing 16(5):479–487.CrossrefGoogle Scholar
  • Lépingle D, Mémin J (1978) Sur l’intégrabilité uniforme des martingales exponentielles. Z. Wahrsch. Verw. Gebiete 42(3):175–203.CrossrefGoogle Scholar
  • Li C-P, Neely MJ (2012) Network utility maximization over partially observable Markovian channels. Performance Evaluation 70(7–8):528–548.CrossrefGoogle Scholar
  • Liu K, Zhao Q (2010) Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Trans. Inform. Theory 56(11):5547–5567.CrossrefGoogle Scholar
  • Mahajan A, Teneketzis D (2008) Multi-armed bandit problems. Foundations Appl. Sensor Management 5(1):121–151.CrossrefGoogle Scholar
  • Mandelbaum A (1987) Continuous multi-armed bandits and multiparameter processes. Ann. Probab. 15(4):1527–1556.CrossrefGoogle Scholar
  • McCall BP, McCall JJ (1987) A sequential study of migration and job search. J. Labor Econom. 5(4):452–476.CrossrefGoogle Scholar
  • Morimoto H (1991) On average cost stopping time problems. Probab. Theory Related Fields 90(4):469–490.CrossrefGoogle Scholar
  • Niño-Mora J (2001) Restless bandits, partial conservation laws and indexability. Adv. Appl. Probab. 33(1):76–98.CrossrefGoogle Scholar
  • Niño-Mora J (2008) An index policy for dynamic fading-channel allocation to heterogeneous mobile users with partial observations. Proc. 4th EURO-NGI Conf. Next Generation Internet Networks (IEEE, Piscataway, NJ), 231–238.CrossrefGoogle Scholar
  • Novikov AA (1980) On conditions for uniform integrability of continuous non-negative martingales. Theory Probab. Its Appl. 24(4):820–824.CrossrefGoogle Scholar
  • Ny JL, Dahleh M, Feron E (2008) Multi-UAV dynamic routing with partial observations using restless bandit allocation indices. Amer. Control Conf. (IEEE, Piscataway, NJ), 4220–4225.Google Scholar
  • Ny JL, Feron E, Dahleh MA (2011) Scheduling continuous-time Kalman filters. IEEE Trans. Automatic Control 56(6):1381–1394.CrossrefGoogle Scholar
  • Øksendal BK, Sulem A (2005) Applied Stochastic Control of Jump Diffusions, Vol. 498 (Springer, Berlin).Google Scholar
  • Pandey S, Chakrabarti D, Agarwal D (2007) Multi-armed bandit problems with dependent arms. Proc. 24th Internat. Conf. Machine Learn., ICML ’07 (ACM, New York), 721–728.CrossrefGoogle Scholar
  • Peskir G, Shiryaev A (2006) Optimal Stopping and Free-Boundary Problems, Lectures in Mathematics ETH Zürich (Birkhäuser, Basel, Switzerland).Google Scholar
  • Pham H (1995) Optimal stopping of controlled jump diffusion processes and viscosity solutions. C. R. Acad. Sci. Paris Sér. I Math. 320(9):1113–1118.Google Scholar
  • Pham H (1998) Optimal stopping of controlled jump diffusion processes: A viscosity solution approach. J. Math. Systems Estim. Control 8(1):1–27.Google Scholar
  • Powell WB (2007) Approximate Dynamic Programming: Solving the Curses of Dimensionality, Vol. 703 (John Wiley & Sons, Hoboken, NJ).CrossrefGoogle Scholar
  • Presman EL (1990) Poisson version of the two-armed bandit problem with discounting. Theory Probab. Its Appl. 35(2):307–317.CrossrefGoogle Scholar
  • Presman ÈL, Sonin IN (1990) Sequential Control with Incomplete Information. Economic Theory, Econometrics, and Mathematical Economics (Academic Press, San Diego).Google Scholar
  • Protter PE (2005) Stochastic Integration and Differential Equations. Stochastic Modeling and Applied Probability, Vol. 21 (Springer, Berlin).CrossrefGoogle Scholar
  • Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.CrossrefGoogle Scholar
  • Rothschild M (1974) A two-armed bandit theory of market pricing. J. Econom. Theory 9(2):185–202.CrossrefGoogle Scholar
  • Schachermayer W, Schachinger W (1999) Is there a predictable criterion for mutual singularity of two probability measures on a filtered space? Teor. Veroyatnost. i Primenen. 44(1):101–110.CrossrefGoogle Scholar
  • Seierstad A (2009) Stochastic Control in Discrete and Continuous Time (Springer, New York).CrossrefGoogle Scholar
  • Startek M (2012) Vague convergence in the Skorohod representation theorem. Int. J. Contemp. Math. Sci. 7(22):1061–1066.Google Scholar
  • Stockbridge RH (2005) A separation principle for partially observed control of singular stochastic processes. Nonlinear Anal. 63:e2057–e2065.CrossrefGoogle Scholar
  • Stroock DW (1975) Diffusion processes associated with lévy generators. Probab. Theory and Related Fields 32(3):209–244.Google Scholar
  • Stroock DW, Varadhan SRS (2006) Multidimensional Diffusion Processes, Classics in Mathematics, (Springer, Berlin).Google Scholar
  • Veatch MH, Wein LM (1996) Scheduling a make-to-stock queue: Index policies and hedging points. Oper. Res. 44(4):634–647.LinkGoogle Scholar
  • Washburn R (2008) Application of multi-armed bandits to sensor management. Foundations Appl. Sensor Management (Springer, New York), 153–175.CrossrefGoogle Scholar
  • Weber RR, Weiss G (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.CrossrefGoogle Scholar
  • Weber RR, Weiss G (1991) Addendum to “On an index policy for restless bandits.” Adv. Appl. Probab. 23(2):429–430.CrossrefGoogle Scholar
  • Weitzman ML (1979) Optimal search for the best alternative. Econometrica 47(3):641–654.CrossrefGoogle Scholar
  • Whittle P (1981) Arm-acquiring bandits. Ann. Probab. 9(2):284–292.CrossrefGoogle Scholar
  • Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25(A):287–298.CrossrefGoogle Scholar
  • Wonham WM (1968) On the separation theorem of stochastic control. SIAM J. Control 6(2):312–326.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.