Analytical Solution to a Discrete-Time Model for Dynamic Learning and Decision Making

Published online: https://doi.org/10.1287/mnsc.2021.4194

References

  • Afèche P, Ata B (2013) Bayesian dynamic pricing in queueing systems with unknown delay cost characteristics. Manufacturing Service Oper. Management 15(2):292–304.
  • Aghion P, Bolton P, Harris C, Jullien B (1991) Optimal learning by experimentation. Rev. Econom. Stud. 58:621–654.
  • Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. Proc. 30th Internat. Conf. on Machine Learn. (PMLR), 127–135.
  • Alizamir S, de Véricourt F, Sun P (2013) Diagnostic accuracy under congestion. Management Sci. 59(1):157–171.
  • Alt B, Schultheis M, Koeppl H (2020) POMDPs in continuous time and discrete spaces. Proc. 34th Conf. on Neural Inform. Processing Systems, NeurIPS, Vancouver, Canada.
  • Araman VF, Caldentey R (2009) Dynamic pricing for nonperishable products with demand learning. Oper. Res. 57(5):1169–1188.
  • Araman VF, Caldentey R (2020) Diffusion approximations for a class of sequential experimentation problems. Preprint, submitted November 2, 2019, https://dx.doi.org/10.2139/ssrn.3479676.
  • Arrow KJ, Blackwell D, Girshick MA (1949) Bayes and minimax solutions of sequential decision problems. Econometrica 17:213–244.
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2/3):235–256.
  • Aurentz JL, Mach T, Vandebril R, Watkins DS (2015) Fast and backward stable computation of roots of polynomials. SIAM J. Matrix Anal. Appl. 36(3):942–973.
  • Aviv Y, Pazgal A (2005) A partially observed Markov decision process for dynamic pricing. Management Sci. 51:1400–1416.
  • Ayer T, Alagoz O, Stout NK (2012) OR Forum—A POMDP approach to personalize mammography screening decisions. Oper. Res. 60(5):1019–1034.
  • Bain A, Crisan D (2008) Fundamentals of Stochastic Filtering (Springer, New York).
  • Bartroff J, Lai TL, Shih M-C (2013) Sequential Experimentation in Clinical Trials: Design and Analysis (Springer, New York).
  • Bensoussan A (1992) Stochastic Control of Partially Observable Systems (Cambridge University Press, Cambridge, UK).
  • Berry DA, Fristedt B (1985) Bandit Problems: Sequential Allocation of Experiments (Chapman & Hall, New York).
  • Bertsekas DP (2012) Approximate dynamic programming, vol. II. Dynamic Programming and Optimal Control, 4th ed. (Athena Scientific, Belmont, MA).
  • Bertsekas DP (2017) Dynamic Programming and Optimal Control, vol. I, 4th ed. (Athena Scientific, Belmont, MA).
  • Bertsimas D, Mersereau AJ (2007) A learning approach for interactive marketing to a customer segment. Oper. Res. 55(6):1120–1135.
  • Bertsimas D, Perakis G (2006) Dynamic pricing: A learning approach. Hearn D, Lawphongpanich S, eds. Mathematical and Computational Models for Congestion Charging (Springer, Berlin), 45–79.
  • Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.
  • Bini DA, Fiorentino G (2000) Design, analysis, and implementation of a multiprecision polynomial rootfinder. Numerical Algorithms 23:127–173.
  • Bini DA, Robol L (2014) Solving secular and polynomial equations: A multiprecision algorithm. J. Comput. Appl. Math. 272:276–292.
  • Bolton P, Harris C (1999) Strategic experimentation. Econometrica 67:349–374.
  • Bouneffouf D, Rish I, Aggarwal C (2020) Survey on applications of multi-armed and contextual bandits. Proc. IEEE Congress on Evolutionary Comput., CEC, Glasgow, UK.
  • Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980.
  • Burnetas AN, Katehakis MN (1996) Optimal adaptive policies for sequential allocation problems. Adv. Appl. Math. 17(2):122–142.
  • Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.
  • Carvalho AX, Puterman ML (2005) Learning and pricing in an Internet environment with binomial demand. J. Revenue Pricing Management 3:320–336.
  • den Boer AV (2015) Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys Oper. Res. Management Sci. 20:1–18.
  • Easley D, Kiefer NM (1988) Controlling a stochastic process with unknown parameters. Econometrica 56:1045–1064.
  • Elaydi S (2005) An Introduction to Difference Equations, 3rd ed. (Springer, New York).
  • Farias VF, Van Roy B (2010) Dynamic pricing with a prior on market response. Oper. Res. 58(1):16–29.
  • Gittins JC (1979) Bandit processes and dynamic allocation indices. J. Royal Statist. Soc. B 41:148–177.
  • Harrison JM, Sunar N (2015) Investment timing with incomplete information and multiple means of learning. Oper. Res. 63(2):442–457.
  • Harrison JM, Keskin NB, Zeevi A (2012) Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Sci. 58:570–586.
  • Keller G, Rady S (1999) Optimal experimentation in a changing environment. Rev. Econom. Stud. 66:475–507.
  • Keskin NB, Zeevi A (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142–1167.
  • Kwon HD, Lippman SA (2011) Acquisition of project-specific assets with Bayesian updating. Oper. Res. 59:1119–1130.
  • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6:4–22.
  • McLennan A (1984) Price dispersion and incomplete learning in the long run. J. Econom. Dynamic Control 7:331–347.
  • Moscarini G, Smith L (2001) The optimal level of experimentation. Econometrica 69:1629–1644.
  • Negoescu DM, Bimpikis K, Brandeau ML, Iancu DA (2018) Dynamic learning of patient response types: An application to treating chronic diseases. Management Sci. 64(8):3469–3488.
  • Peskir G, Shiryaev AN (2000) Sequential testing problems for Poisson processes. Ann. Statist. 28:837–859.
  • Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (New Series) 58:527–535.
  • Romberg HF (1972) Continuous sequential testing of a Poisson process to minimize the Bayes risk. J. Amer. Statist. Assoc. 67:921–926.
  • Rothschild M (1974) A two-armed bandit theory of market pricing. J. Econom. Theory 9:185–202.
  • Saghafian S, Hopp WJ, Iravani SMR, Cheng Y, Diermeier D (2018) Workload management in telemedical physician triage and other knowledge-based service systems. Management Sci. 64(11):5180–5197.
  • Shiryaev AN (1967) Two problems of sequential analysis. Cybernet. Systems Anal. 3:63–69.
  • Smallwood RD, Sondik EJ (1973) The optimal control of partially observable Markov processes over a finite horizon. Oper. Res. 21:1071–1088.
  • Sondik EJ (1978) The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Oper. Res. 26:282–304.
  • Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction, 2nd ed. (MIT Press, Cambridge, MA).
  • Tartakovsky A, Nikiforov I, Basseville M (2014) Sequential Analysis: Hypothesis Testing and Changepoint Detection (Chapman and Hall/CRC, Boca Raton, FL).
  • Tsoukalas A, Albertson T, Tagkopoulos I (2015) From data to optimal decision making: A data-driven, probabilistic machine learning approach to decision support for patients with sepsis. JMIR Medical Inform. 3(1):e11.
  • Villar SS, Bowden J, Wason J (2015) Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges. Statist. Sci. 30(2):199–215.
  • Wald A (1945) Sequential tests of statistical hypotheses. Ann. Math. Statist. 16:117–186.
  • Wald A (1947) Foundations of a general theory of sequential decision functions. Econometrica 15:279–313.
  • Wald A, Wolfowitz J (1948) Optimum character of the sequential probability ratio test. Ann. Math. Statist. 19:326–339.
  • Wald A, Wolfowitz J (1950) Bayes solutions of sequential decision problems. Ann. Math. Statist. 21:82–99.
  • Wang G, Xiong J, Zhang S (2016) Partially observable stochastic optimal control. Internat. J. Numerical Anal. Modeling 13(3):493–512.
  • Whitehead J (1997) The Design and Analysis of Sequential Clinical Trials, 2nd ed. (John Wiley & Sons, West Sussex, UK).
  • Press WH (2009) Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research. Proc. National Acad. Sci. USA 106(52):22387–22392.
  • Zhang H (2010) Partially observable Markov decision processes: A geometric technique and analysis. Oper. Res. 58:214–228.
  • Zhang H (2021) Dynamic learning and decision making via basis weight vectors. Preprint, submitted September 1, 2020, https://dx.doi.org/10.2139/ssrn.3679048.
  • Zhang J, Denton BT, Balasubramanian H, Shah ND, Inman BA (2012) Optimization of prostate biopsy referral decisions. Manufacturing Service Oper. Management 14(4):529–547.
  • Ziegler GM (1995) Lectures on Polytopes (Graduate Texts in Mathematics) (Springer-Verlag, New York).