Analytical Solution to a Discrete-Time Model for Dynamic Learning and Decision Making
Published Online: 1 Feb 2022
https://doi.org/10.1287/mnsc.2021.4194

