Reinforcement with Fading Memories

Kuang Xu (Corresponding Author)
https://orcid.org/0000-0002-2221-1648
Graduate School of Business, Stanford University, Stanford, California 94305

Se-Young Yun
Department of Industrial & Systems Engineering, KAIST, Daejeon, Republic of Korea

Published Online: 18 Jun 2020. https://doi.org/10.1287/moor.2019.1031

References

[1] Ames WF, Pachpatte B (1997) Inequalities for Differential and Integral Equations, vol. 197 (Elsevier, Amsterdam).
[2] Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.
[3] Beggs AW (2005) On the convergence of reinforcement learning. J. Econom. Theory 122(1):1–36.
[4] Benkard CL (2000) Learning and forgetting: The dynamics of aircraft production. Amer. Econom. Rev. 90(4):1034–1054.
[5] Benveniste A, Métivier M, Priouret P (2012) Adaptive Algorithms and Stochastic Approximations, vol. 22 (Springer, Berlin).
[6] Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.
[7] Bramson M (1998) State space collapse with application to heavy traffic limits for multiclass queueing networks. Queueing Systems 30(1–2):89–140.
[8] Coddington EA, Levinson N (1955) Theory of Ordinary Differential Equations (Krieger, London).
[9] Coutin L, Decreusefond L, Dhersin J-S (2010) A Markov model for the spread of viruses in an open population. J. Appl. Probab. 47(4):976–996.
[10] Cover T, Hellman M (1970) The two-armed-bandit problem with time-invariant finite memory. IEEE Trans. Inform. Theory 16(2):185–195.
[11] Darling R, Norris JR (2008) Differential equation approximations for Markov chains. Probab. Surveys 5:37–79.
[12] Ding Y, Nagarajan M, Zhang ZG (2016) Asymptotic analysis of multi-queue service systems with dynamic customer choice. Working paper, University of British Columbia, Vancouver.
[13] Dong J, Yom-Tov E, Yom-Tov GB (2018) The impact of delay announcements on hospital network coordination and waiting times. Management Sci. 65(5):1969–1994.
[14] Ebbinghaus H (1964) Memory: A Contribution to Experimental Psychology (Dover Publications, Mineola, NY).
[15] Erev I, Roth AE (1998) Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. Amer. Econom. Rev. 88(4):848–881.
[16] Ethier SN, Kurtz TG (2005) Markov Processes: Characterization and Convergence (John Wiley & Sons, Hoboken, NJ).
[17] Evdokimova E, De Turck K, Fiems D (2018) Coupled queues with customer impatience. Performance Evaluation 118:33–47.
[18] Gamarnik D, Tsitsiklis JN, Zubeldia M (2016) Delay, memory, and messaging tradeoffs in distributed service systems. Proc. ACM SIGMETRICS Internat. Conf. (ACM, New York), 1–12.
[19] Garivier A, Moulines E (2008) On upper-confidence bound policies for non-stationary bandit problems. Preprint, submitted May 22, https://arxiv.org/abs/0805.3415.
[20] Grimmett G, Stirzaker D (2001) Probability and Random Processes (Oxford University Press, Oxford, UK).
[21] Harley CB (1981) Learning the evolutionarily stable strategy. J. Theoretical Biology 89(4):611–633.
[22] Hassin R, Haviv M (2003) To Queue or Not to Queue: Equilibrium Behavior in Queueing Systems, vol. 59 (Springer Science & Business Media, Berlin).
[23] Hellman ME, Cover TM (1970) Learning with finite memory. Ann. Math. Statist. 41(3):765–782.
[24] Herrnstein RJ (1970) On the law of effect. J. Experiment. Anal. Behav. 13(2):243–266.
[25] Keskin NB, Zeevi A (2016) Chasing demand: Learning and earning in a changing environment. Math. Oper. Res. 42(2):277–307.
[26] Kurtz TG (1970) Solutions of ordinary differential equations as limits of pure jump Markov processes. J. Appl. Probab. 7(1):49–58.
[27] Kurtz TG (1978) Strong approximation theorems for density dependent Markov chains. Stochastic Processes Appl. 6(3):223–240.
[28] Luce RD (1959) Individual Choice Behavior: A Theoretical Analysis (John Wiley & Sons, Hoboken, NJ).
[29] Mandelbaum A, Massey WA, Reiman MI (1998) Strong approximations for Markovian service networks. Queueing Systems 30(1–2):149–201.
[30] Massey WA, Whitt W (1998) Uniform acceleration expansions for Markov chains with time-varying rates. Ann. Appl. Probab. 8(4):1130–1155.
[31] Massoulié L, Xu K (2018) On the capacity of information processing systems. Oper. Res. 66(2):568–586.
[32] Mertikopoulos P, Sandholm WH (2016) Learning in games via reinforcement and regularization. Math. Oper. Res. 41(4):1297–1324.
[33] Mitzenmacher M, Prabhakar B, Shah D (2002) Load balancing with memory. Proc. IEEE Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 799–808.
[34] Pemantle R (2007) A survey of random processes with reinforcement. Probab. Surveys 4:1–79.
[35] Pender J, Rand RH, Wesson E (2016) Managing information in queues: The impact of giving delayed information to customers. Preprint, submitted September 23, https://arxiv.org/abs/1610.01972.
[36] Perry O, Whitt W (2011) An ODE for an overloaded X model involving a stochastic averaging principle. Stochastic Systems 1(1):59–108.
[37] Rustichini A (1999) Optimal properties of stimulus-response learning models. Games Econom. Behav. 29(1–2):244–273.
[38] Shwartz A, Weiss A (1995) Large Deviations for Performance Analysis: Queues, Communication and Computing, vol. 5 (CRC Press, Boca Raton, FL).
[39] Train KE (2009) Discrete Choice Methods with Simulation (Cambridge University Press, Cambridge, UK).
[40] Tsitsiklis JN, Xu K (2012) On the power of (even a little) resource pooling. Stochastic Systems 2(1):1–66.
[41] Van Erven T, Kotłowski W, Warmuth MK (2014) Follow the leader with dropout perturbations. PMLR 35:949–974.
[42] Washburn R, Willsky A (1981) Optional sampling of submartingales indexed by partially ordered sets. Ann. Probab. 9(6):957–970.
[43] Xu K (2018) Query complexity of Bayesian private learning. NIPS 31:2431–2440.

Mathematics of Operations Research, Volume 45, Issue 4, November 2020 (issue pages 1193–1620, C2)

Article Information

Received: September 1, 2017
Accepted: July 25, 2019
Published Online: June 18, 2020

Cite as

Kuang Xu, Se-Young Yun (2020) Reinforcement with Fading Memories. Mathematics of Operations Research 45(4):1258-1288.

https://doi.org/10.1287/moor.2019.1031
