Reinforcement with Fading Memories
Published Online: 18 Jun 2020
https://doi.org/10.1287/moor.2019.1031

