Reinforcement with Fading Memories

Published Online: https://doi.org/10.1287/moor.2019.1031

References

  • [1] Ames WF, Pachpatte B (1997) Inequalities for Differential and Integral Equations, vol. 197 (Elsevier, Amsterdam).
  • [2] Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.
  • [3] Beggs AW (2005) On the convergence of reinforcement learning. J. Econom. Theory 122(1):1–36.
  • [4] Benkard CL (2000) Learning and forgetting: The dynamics of aircraft production. Amer. Econom. Rev. 90(4):1034–1054.
  • [5] Benveniste A, Métivier M, Priouret P (2012) Adaptive Algorithms and Stochastic Approximations, vol. 22 (Springer, Berlin).
  • [6] Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.
  • [7] Bramson M (1998) State space collapse with application to heavy traffic limits for multiclass queueing networks. Queueing Systems 30(1–2):89–140.
  • [8] Coddington EA, Levinson N (1955) Theory of Ordinary Differential Equations (Krieger, London).
  • [9] Coutin L, Decreusefond L, Dhersin J-S (2010) A Markov model for the spread of viruses in an open population. J. Appl. Probab. 47(4):976–996.
  • [10] Cover T, Hellman M (1970) The two-armed-bandit problem with time-invariant finite memory. IEEE Trans. Inform. Theory 16(2):185–195.
  • [11] Darling R, Norris JR (2008) Differential equation approximations for Markov chains. Probab. Surveys 5:37–79.
  • [12] Ding Y, Nagarajan M, Zhang ZG (2016) Asymptotic analysis of multi-queue service systems with dynamic customer choice. Working paper, University of British Columbia, Vancouver.
  • [13] Dong J, Yom-Tov E, Yom-Tov GB (2018) The impact of delay announcements on hospital network coordination and waiting times. Management Sci. 65(5):1969–1994.
  • [14] Ebbinghaus H (1964) Memory: A Contribution to Experimental Psychology (Dover Publications, Mineola, NY).
  • [15] Erev I, Roth AE (1998) Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. Amer. Econom. Rev. 88(4):848–881.
  • [16] Ethier SN, Kurtz TG (2005) Markov Processes: Characterization and Convergence (John Wiley & Sons, Hoboken, NJ).
  • [17] Evdokimova E, De Turck K, Fiems D (2018) Coupled queues with customer impatience. Performance Evaluation 118:33–47.
  • [18] Gamarnik D, Tsitsiklis JN, Zubeldia M (2016) Delay, memory, and messaging tradeoffs in distributed service systems. Proc. ACM SIGMETRICS Internat. Conf. (ACM, New York), 1–12.
  • [19] Garivier A, Moulines E (2008) On upper-confidence bound policies for non-stationary bandit problems. Preprint, submitted May 22, https://arxiv.org/abs/0805.3415.
  • [20] Grimmett G, Stirzaker D (2001) Probability and Random Processes (Oxford University Press, Oxford, UK).
  • [21] Harley CB (1981) Learning the evolutionarily stable strategy. J. Theoretical Biology 89(4):611–633.
  • [22] Hassin R, Haviv M (2003) To Queue or Not to Queue: Equilibrium Behavior in Queueing Systems, vol. 59 (Springer Science & Business Media, Berlin).
  • [23] Hellman ME, Cover TM (1970) Learning with finite memory. Ann. Math. Statist. 41(3):765–782.
  • [24] Herrnstein RJ (1970) On the law of effect. J. Experiment. Anal. Behav. 13(2):243–266.
  • [25] Keskin NB, Zeevi A (2016) Chasing demand: Learning and earning in a changing environment. Math. Oper. Res. 42(2):277–307.
  • [26] Kurtz TG (1970) Solutions of ordinary differential equations as limits of pure jump Markov processes. J. Appl. Probab. 7(1):49–58.
  • [27] Kurtz TG (1978) Strong approximation theorems for density dependent Markov chains. Stochastic Processes Appl. 6(3):223–240.
  • [28] Luce RD (1959) Individual Choice Behavior: A Theoretical Analysis (John Wiley & Sons, Hoboken, NJ).
  • [29] Mandelbaum A, Massey WA, Reiman MI (1998) Strong approximations for Markovian service networks. Queueing Systems 30(1–2):149–201.
  • [30] Massey WA, Whitt W (1998) Uniform acceleration expansions for Markov chains with time-varying rates. Ann. Appl. Probab. 8(4):1130–1155.
  • [31] Massoulié L, Xu K (2018) On the capacity of information processing systems. Oper. Res. 66(2):568–586.
  • [32] Mertikopoulos P, Sandholm WH (2016) Learning in games via reinforcement and regularization. Math. Oper. Res. 41(4):1297–1324.
  • [33] Mitzenmacher M, Prabhakar B, Shah D (2002) Load balancing with memory. Proc. IEEE Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 799–808.
  • [34] Pemantle R (2007) A survey of random processes with reinforcement. Probab. Surveys 4:1–79.
  • [35] Pender J, Rand RH, Wesson E (2016) Managing information in queues: The impact of giving delayed information to customers. Preprint, submitted September 23, https://arxiv.org/abs/1610.01972.
  • [36] Perry O, Whitt W (2011) An ODE for an overloaded X model involving a stochastic averaging principle. Stochastic Systems 1(1):59–108.
  • [37] Rustichini A (1999) Optimal properties of stimulus-response learning models. Games Econom. Behav. 29(1–2):244–273.
  • [38] Shwartz A, Weiss A (1995) Large Deviations for Performance Analysis: Queues, Communication and Computing, vol. 5 (CRC Press, Boca Raton, FL).
  • [39] Train KE (2009) Discrete Choice Methods with Simulation (Cambridge University Press, Cambridge, UK).
  • [40] Tsitsiklis JN, Xu K (2012) On the power of (even a little) resource pooling. Stochastic Systems 2(1):1–66.
  • [41] Van Erven T, Kotłowski W, Warmuth MK (2014) Follow the leader with dropout perturbations. PMLR 35:949–974.
  • [42] Washburn R, Willsky A (1981) Optional sampling of submartingales indexed by partially ordered sets. Ann. Probab. 9(6):957–970.
  • [43] Xu K (2018) Query complexity of Bayesian private learning. NIPS 31:2431–2440.