Dynamic Learning and Decision Making via Basis Weight Vectors
Published Online: 9 Feb 2022 https://doi.org/10.1287/opre.2021.2240
References
- (2011) Improved algorithms for linear stochastic bandits. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates, Red Hook, NY), 2312–2320.
- (2013) Thompson sampling for contextual bandits with linear payoffs. Proc. 30th Internat. Conf. Machine Learning, Atlanta, June 17–19, 127–135.
- (2009) Dynamic pricing for nonperishable products with demand learning. Oper. Res. 57(5):1169–1188.
- (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learning Res. 3:397–422.
- (2005) A partially observed Markov decision process for dynamic pricing. Management Sci. 51(9):1400–1416.
- (2012) Approximate Dynamic Programming, Volume II: Dynamic Programming and Optimal Control, 4th ed. (Athena Scientific, Belmont, MA).
- (2017) Dynamic Programming and Optimal Control, vol. I, 4th ed. (Athena Scientific, Belmont, MA).
- (2007) A learning approach for interactive marketing to a customer segment. Oper. Res. 55(6):1120–1135.
- (2006) Dynamic pricing: A learning approach. Lawphongpanich S, Hearn DW, Smith MJ, eds. Mathematical and Computational Models for Congestion Charging (Springer, Berlin).
- (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.
- (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980.
- (2014) Information relaxations, duality, and convex stochastic dynamic programs. Oper. Res. 62(6):1394–1415.
- (2010) Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4, part 1):785–801.
- (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.
- (2005) Learning and pricing in an internet environment with binomial demand. J. Revenue Pricing Management 3:320–336.
- (1994) Acting optimally in partially observable stochastic domains. Proc. 12th National Conf. Artificial Intelligence (AAAI Press, Menlo Park, CA), 1023–1028.
- (1988) Algorithms for partially observable Markov decision processes. Unpublished doctoral dissertation, University of British Columbia.
- (2011) Contextual bandits with linear payoff functions. Proc. 14th Internat. Conf. Artificial Intelligence Statist., Fort Lauderdale, FL, April 11–13, 208–214.
- (2008) Stochastic linear optimization under bandit feedback. 21st Annual Conference on Learning Theory, Helsinki, Finland, July 9–12, 355–366.
- (2014) Simultaneously learning and optimizing using controlled variance pricing. Management Sci. 60(3):770–783.
- (1988) Controlling a stochastic process with unknown parameters. Econometrica 56(5):1045–1064.
- (2010) Dynamic pricing with a prior on market response. Oper. Res. 58(1):16–29.
- (2004) Region-based incremental pruning for POMDPs. Proc. 20th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann, San Francisco), 146–153.
- (2010) Parametric bandits: The generalized linear case. Proc. 23rd Internat. Conf. Neural Inform. Processing Systems, vol. 1 (Curran Associates, Red Hook, NY), 586–594.
- (1979) Bandit processes and dynamic allocation indices. J. Royal Statist. Soc. B 41(2):148–177.
- (2015) Investment timing with incomplete information and multiple means of learning. Oper. Res. 63(2):442–457.
- (2012) Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Sci. 58(3):570–586.
- (1995) Continuous Univariate Distributions, vol. 2, 2nd ed. (Wiley, Hoboken, NJ).
- (1999) Optimal experimentation in a changing environment. Rev. Econom. Stud. 66(3):475–507.
- (2019) Dynamic selling mechanisms for product differentiation and learning. Oper. Res. 67(4):1069–1089.
- (2018) On incomplete learning and certainty-equivalence control. Oper. Res. 66(4):1136–1167.
- (2016) Partially Observable Markov Decision Processes: From Filtering to Controlled Sensing (Cambridge University Press, Cambridge, UK).
- (2010) A contextual-bandit approach to personalized news article recommendation. Proc. 19th Internat. Conf. World Wide Web (ACM, New York), 661–670.
- (1991) A survey of algorithmic methods for partially observed Markov decision processes. Ann. Oper. Res. 28:47–65.
- (1984) Price dispersion and incomplete learning in the long run. J. Econom. Dynam. Control 7(3):331–347.
- (1982) A survey of partially observable Markov decision processes: Theory, models and algorithms. Management Sci. 28(1):1–16.
- (2005) Exploiting structure to efficiently solve large scale partially observable Markov decision processes. Unpublished doctoral dissertation, University of Toronto.
- (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. (Wiley, Hoboken, NJ).
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, New York).
- (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (N.S.) 58(5):527–535.
- (1974) A two-armed bandit theory of market pricing. J. Econom. Theory 9(2):185–202.
- (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.
- (1973) The optimal control of partially observable Markov processes over a finite horizon. Oper. Res. 21(5):1071–1088.
- (2018) Reinforcement Learning: An Introduction, 2nd ed. (MIT Press, Cambridge, MA).
- (1945) Sequential tests of statistical hypotheses. Ann. Math. Statist. 16(2):117–186.
- (2010) Implementation and parallelization of a reverse-search algorithm for Minkowski sums. Proc. 12th Workshop Algorithm Engrg. Experiments (SIAM, Philadelphia), 34–42.
- (1991) A survey of solution techniques for the partially observed decision process. Ann. Oper. Res. 32:215–230.
- (2010) Partially observable Markov decision processes: A geometric technique and analysis. Oper. Res. 58(1):214–228.
- (2022) Analytical solution to a discrete-time model for dynamic learning and decision making. Management Sci., ePub ahead of print February 1, https://doi.org/10.1287/mnsc.2021.4194.
- (1996) Planning in stochastic domains: Problem characteristics and approximation. Technical Report HKUST-CS96-31, Hong Kong University of Science and Technology.

