Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and Indexability
Published Online:5 Sep 2017https://doi.org/10.1287/moor.2017.0863
References
- (2011) Bayesian learning in social networks. Rev. Econom. Stud. 78(4):1201.Crossref, Google Scholar
- (2002/03) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77 (electronic).Crossref, Google Scholar
- (1992) Denumerable-armed bandits. Econometrica 60(5):1071–1096.Crossref, Google Scholar
- (1994) Switching costs and the gittins index. Econometrica 62(3):687–694.Crossref, Google Scholar
- (1990) An expository review of sequential design and allocation rules. Technical report, Department of Statistics, Purdue University, West Lafayette, IN.Google Scholar
- (2006) Bandit problems. Cowles Foundation Discussion Papers 1551, Cowles Foundation for Research in Economics, Yale University, New Haven, CT.Google Scholar
- (1985) Bandit Problems: Sequential Allocation of Experiments, Monographs on Statistics and Applied Probability (Chapman & Hall, London).Crossref, Google Scholar
- (1999) Strategic experimentation. Econometrica 67(2):349–374.Crossref, Google Scholar
- (2000) Strategic experimentation: The undiscounted case. Hammond PJ, Myles GD, eds. Incentives, Organization and Public Economics. Papers in Honour of Sir James Mirrlees (Oxford University Press, New York), 53–68.Google Scholar
- (2002) Existence of optimal controls for partially observed jump processes. Acta Appl. Math. 74(2):155–175.Crossref, Google Scholar
- (2013) Bandit problems with Levy payoff processes. Math. Oper. Res. 38(1):92–107.Link, Google Scholar
- (2006) Interpreting the evidence on life cycle skill formation. Hanushek EA, Welch F, eds. Handbook of the Economics of Education Volume 1 (North-Holland, Amsterdam).Google Scholar
- (2004) On Skorohod spaces as universal sample path spaces. Preprint arXiv: math/0412092.Google Scholar
- (2011) Getting beneath the veil of effective schools: Evidence from New York City. Working paper 17632, National Bureau of Economic Research, Cambridge, MA.Google Scholar
- (1994) Dynamic allocation problems in continuous time. Ann. Appl. Probab. 4(2):255–286.Crossref, Google Scholar
- (1988) Existence of an optimal Markovian filter for the control under partial observations. SIAM J. Control Optim. 26(5):1025–1061.Crossref, Google Scholar
- (1998) Behaviors coordination using restless bandits allocation indexes. Pfeifer R, Blumberg B, Meyer J-A, Wilson SW, eds. From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior (MIT Press, Cambridge, MA), 159–164.Google Scholar
- (1966) On the existence of optimal stochastic controls. J. Math. Mech. 15:777–794.Google Scholar
- (1982) Optimal control for partially observed diffusions. SIAM J. Control Optim. 20(2):261–285.Crossref, Google Scholar
- (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. Ser. B (Methodological) 41(2):148–177.Crossref, Google Scholar
- (2011) Multi-Armed Bandit Allocation Indices (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
- (2006) Spinning plates and squad systems: Policies for bi-directional restless bandits. Adv. Appl. Probab. 38(1):95–115.Crossref, Google Scholar
- (2002) Index policies for a class of discounted restless bandits. Adv. Appl. Probab. 34(4):754–774.Crossref, Google Scholar
- (2006) Some indexable families of restless bandit problems. Adv. Appl. Probab. 38(3):643–672.Crossref, Google Scholar
- (2010) Investing in our young people. Reynolds A, Rolnick A, Englund M, Temple JA, eds. Cost-Effective Programs in Children’s First Decade: A Human Capital Integration (Cambridge University, New York), 381–414.Google Scholar
- (1981) Sur un type de convergence intermédiaire entre la convergence en loi et la convergence en probabilitéédiaire entre la convergence en loi et la convergence en probabilité. Séminaire de Probabilités XV 1979/80 (Springer, New York), 529–546.Crossref, Google Scholar
- (2003) Limit theorems for stochastic processes. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 2nd ed., Vol. 288 (Springer, Berlin), 592–628.Crossref, Google Scholar
- (2004) A survey on the bandit problem with switching costs. De Economist 152(4):513–541.Crossref, Google Scholar
- (1984) Gittins indices in the dynamic allocation problem for diffusion processes. Ann. Probab. 12(1):173–192.Crossref, Google Scholar
- (2010) Strategic experimentation with Poisson bandits. Theoret. Econom. 5(2):275–311.Crossref, Google Scholar
- (2015) Breakdowns. Theoretical Econom. 10(1):175–202.Crossref, Google Scholar
- (2005) Strategic experimentation with exponential bandits. Econometrica 73(1):39–68.Crossref, Google Scholar
- (1975) Time-sharing service systems. I. Theory Probab. ITS Appl. 19(3):532–551.Crossref, Google Scholar
- (1982) Existence of optimal controls for a partially observed semimartingale. Stochastic Processes and Their Appl. 13(2):215–226.Crossref, Google Scholar
- (1973) Markov processes associated with certain integro-differential operators. Osaka J. Math. 10(2):271–303.Google Scholar
- (2012) Duration dependence and labor market conditions: Theory and evidence from a field experiment. Working paper 18387, National Bureau of Economic Research, Cambridge, MA.Google Scholar
- (1987) Martingale problems for controlled processes. Stochastic Modelling and Filtering (Springer, Berlin), 75–90.Crossref, Google Scholar
- (1998) Martingale problems for conditional distributions of Markov processes. Electron. J. Probab. 3(9):1–29.Google Scholar
- (2011) The filtered martingale problem. Crisan D, Rozovskiĭ, eds. The Oxford Handbook of Nonlinear Filtering (Oxford University Press, New York), 129–168.Google Scholar
- (1988) Unique characterization of conditional distributions in nonlinear filtering. Ann. Probab. 16(1):80–107.Crossref, Google Scholar
- (1998) Existence of Markov controls and characterization of optimal Markov controls. SIAM J. Control Optim. 36(2):609–653.Crossref, Google Scholar
- (2000) Numerical Methods for Stochastic Control Problems in Continuous Time, Vol. 24 (Springer, New York).Google Scholar
- (2006) Optimal target tracking with restless bandits. Digital Signal Processing 16(5):479–487.Crossref, Google Scholar
- (1978) Sur l’intégrabilité uniforme des martingales exponentielles. Z. Wahrsch. Verw. Gebiete 42(3):175–203.Crossref, Google Scholar
- (2012) Network utility maximization over partially observable Markovian channels. Performance Evaluation 70(7–8):528–548.Crossref, Google Scholar
- (2010) Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Trans. Inform. Theory 56(11):5547–5567.Crossref, Google Scholar
- (2008) Multi-armed bandit problems. Foundations Appl. Sensor Management 5(1):121–151.Crossref, Google Scholar
- (1987) Continuous multi-armed bandits and multiparameter processes. Ann. Probab. 15(4):1527–1556.Crossref, Google Scholar
- (1987) A sequential study of migration and job search. J. Labor Econom. 5(4):452–476.Crossref, Google Scholar
- (1991) On average cost stopping time problems. Probab. Theory Related Fields 90(4):469–490.Crossref, Google Scholar
- (2001) Restless bandits, partial conservation laws and indexability. Adv. Appl. Probab. 33(1):76–98.Crossref, Google Scholar
- (2008) An index policy for dynamic fading-channel allocation to heterogeneous mobile users with partial observations. Proc. 4th EURO-NGI Conf. Next Generation Internet Networks (IEEE, Piscataway, NJ), 231–238.Crossref, Google Scholar
- (1980) On conditions for uniform integrability of continuous non-negative martingales. Theory Probab. Its Appl. 24(4):820–824.Crossref, Google Scholar
- (2008) Multi-UAV dynamic routing with partial observations using restless bandit allocation indices. Amer. Control Conf. (IEEE, Piscataway, NJ), 4220–4225.Google Scholar
- (2011) Scheduling continuous-time Kalman filters. IEEE Trans. Automatic Control 56(6):1381–1394.Crossref, Google Scholar
- (2005) Applied Stochastic Control of Jump Diffusions, Vol. 498 (Springer, Berlin).Google Scholar
- (2007) Multi-armed bandit problems with dependent arms. Proc. 24th Internat. Conf. Machine Learn., ICML ’07 (ACM, New York), 721–728.Crossref, Google Scholar
- (2006) Optimal Stopping and Free-Boundary Problems, Lectures in Mathematics ETH Zürich (Birkhäuser, Basel, Switzerland).Google Scholar
- (1995) Optimal stopping of controlled jump diffusion processes and viscosity solutions. C. R. Acad. Sci. Paris Sér. I Math. 320(9):1113–1118.Google Scholar
- (1998) Optimal stopping of controlled jump diffusion processes: A viscosity solution approach. J. Math. Systems Estim. Control 8(1):1–27.Google Scholar
- (2007) Approximate Dynamic Programming: Solving the Curses of Dimensionality, Vol. 703 (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
- (1990) Poisson version of the two-armed bandit problem with discounting. Theory Probab. Its Appl. 35(2):307–317.Crossref, Google Scholar
- (1990) Sequential Control with Incomplete Information. Economic Theory, Econometrics, and Mathematical Economics (Academic Press, San Diego).Google Scholar
- (2005) Stochastic Integration and Differential Equations. Stochastic Modeling and Applied Probability, Vol. 21 (Springer, Berlin).Crossref, Google Scholar
- (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.Crossref, Google Scholar
- (1974) A two-armed bandit theory of market pricing. J. Econom. Theory 9(2):185–202.Crossref, Google Scholar
- (1999) Is there a predictable criterion for mutual singularity of two probability measures on a filtered space? Teor. Veroyatnost. i Primenen. 44(1):101–110.Crossref, Google Scholar
- (2009) Stochastic Control in Discrete and Continuous Time (Springer, New York).Crossref, Google Scholar
- (2012) Vague convergence in the Skorohod representation theorem. Int. J. Contemp. Math. Sci. 7(22):1061–1066.Google Scholar
- (2005) A separation principle for partially observed control of singular stochastic processes. Nonlinear Anal. 63:e2057–e2065.Crossref, Google Scholar
- (1975) Diffusion processes associated with lévy generators. Probab. Theory and Related Fields 32(3):209–244.Google Scholar
- (2006) Multidimensional Diffusion Processes, Classics in Mathematics, (Springer, Berlin).Google Scholar
- (1996) Scheduling a make-to-stock queue: Index policies and hedging points. Oper. Res. 44(4):634–647.Link, Google Scholar
- (2008) Application of multi-armed bandits to sensor management. Foundations Appl. Sensor Management (Springer, New York), 153–175.Crossref, Google Scholar
- (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.Crossref, Google Scholar
- (1991) Addendum to “On an index policy for restless bandits.” Adv. Appl. Probab. 23(2):429–430.Crossref, Google Scholar
- (1979) Optimal search for the best alternative. Econometrica 47(3):641–654.Crossref, Google Scholar
- (1981) Arm-acquiring bandits. Ann. Probab. 9(2):284–292.Crossref, Google Scholar
- (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25(A):287–298.Crossref, Google Scholar
- (1968) On the separation theorem of stochastic control. SIAM J. Control 6(2):312–326.Crossref, Google Scholar

