Dynamic Learning and Decision Making via Basis Weight Vectors

Published Online: https://doi.org/10.1287/opre.2021.2240

References

  • Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates, Red Hook, NY), 2312–2320.
  • Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. Proc. 30th Internat. Conf. Machine Learning, Atlanta, June 17–19, 127–135.
  • Araman VF, Caldentey R (2009) Dynamic pricing for nonperishable products with demand learning. Oper. Res. 57(5):1169–1188.
  • Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learning Res. 3:397–422.
  • Aviv Y, Pazgal A (2005) A partially observed Markov decision process for dynamic pricing. Management Sci. 51(9):1400–1416.
  • Bertsekas DP (2012) Approximate Dynamic Programming, Volume II: Dynamic Programming and Optimal Control, 4th ed. (Athena Scientific, Belmont, MA).
  • Bertsekas DP (2017) Dynamic Programming and Optimal Control, vol. I, 4th ed. (Athena Scientific, Belmont, MA).
  • Bertsimas D, Mersereau AJ (2007) A learning approach for interactive marketing to a customer segment. Oper. Res. 55(6):1120–1135.
  • Bertsimas D, Perakis G (2006) Dynamic pricing: A learning approach. Lawphongpanich S, Hearn DW, Smith MJ, eds. Mathematical and Computational Models for Congestion Charging (Springer, Berlin).
  • Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.
  • Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980.
  • Brown DB, Smith JE (2014) Information relaxations, duality, and convex stochastic dynamic programs. Oper. Res. 62(6):1394–1415.
  • Brown DB, Smith JE, Sun P (2010) Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4, part 1):785–801.
  • Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.
  • Carvalho AX, Puterman ML (2005) Learning and pricing in an internet environment with binomial demand. J. Revenue Pricing Management 3:320–336.
  • Cassandra AR, Kaelbling LP, Littman ML (1994) Acting optimally in partially observable stochastic domains. Proc. 12th National Conf. Artificial Intelligence (AAAI Press, Menlo Park, CA), 1023–1028.
  • Cheng HT (1988) Algorithms for partially observable Markov decision processes. Unpublished doctoral dissertation, University of British Columbia.
  • Chu W, Li L, Reyzin L, Schapire RE (2011) Contextual bandits with linear payoff functions. Proc. 14th Internat. Conf. Artificial Intelligence Statist., Fort Lauderdale, FL, April 11–13, 208–214.
  • Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. 21st Annual Conference on Learning Theory, Helsinki, Finland, July 9–12, 355–366.
  • den Boer AV, Zwart B (2014) Simultaneously learning and optimizing using controlled variance pricing. Management Sci. 60(3):770–783.
  • Easley D, Kiefer NM (1988) Controlling a stochastic process with unknown parameters. Econometrica 56(5):1045–1064.
  • Farias VF, Van Roy B (2010) Dynamic pricing with a prior on market response. Oper. Res. 58(1):16–29.
  • Feng Z, Zilberstein S (2004) Region-based incremental pruning for POMDPs. Proc. 20th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann, San Francisco), 146–153.
  • Filippi S, Cappé O, Garivier A, Szepesvári C (2010) Parametric bandits: The generalized linear case. Proc. 23rd Internat. Conf. Neural Inform. Processing Systems, vol. 1 (Curran Associates, Red Hook, NY), 586–594.
  • Gittins JC (1979) Bandit processes and dynamic allocation indices. J. Royal Statist. Soc. B 41(2):148–177.
  • Harrison JM, Sunar N (2015) Investment timing with incomplete information and multiple means of learning. Oper. Res. 63(2):442–457.
  • Harrison JM, Keskin NB, Zeevi A (2012) Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Sci. 58(3):570–586.
  • Johnson NL, Kotz S, Balakrishnan N (1995) Continuous Univariate Distributions, vol. 2, 2nd ed. (Wiley, Hoboken, NJ).
  • Keller G, Rady S (1999) Optimal experimentation in a changing environment. Rev. Econom. Stud. 66(3):475–507.
  • Keskin NB, Birge JR (2019) Dynamic selling mechanisms for product differentiation and learning. Oper. Res. 67(4):1069–1089.
  • Keskin NB, Zeevi A (2018) On incomplete learning and certainty-equivalence control. Oper. Res. 66(4):1136–1167.
  • Krishnamurthy V (2016) Partially Observable Markov Decision Processes: From Filtering to Controlled Sensing (Cambridge University Press, Cambridge, UK).
  • Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. Proc. 19th Internat. Conf. World Wide Web (ACM, New York), 661–670.
  • Lovejoy WS (1991) A survey of algorithmic methods for partially observed Markov decision processes. Ann. Oper. Res. 28:47–65.
  • McLennan A (1984) Price dispersion and incomplete learning in the long run. J. Econom. Dynam. Control 7(3):331–347.
  • Monahan GE (1982) A survey of partially observable Markov decision processes: Theory, models and algorithms. Management Sci. 28(1):1–16.
  • Poupart P (2005) Exploiting structure to efficiently solve large scale partially observable Markov decision processes. Unpublished doctoral dissertation, University of Toronto.
  • Powell WB (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. (Wiley, Hoboken, NJ).
  • Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, New York).
  • Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (N.S.) 58(5):527–535.
  • Rothschild M (1974) A two-armed bandit theory of market pricing. J. Econom. Theory 9(2):185–202.
  • Rusmevichientong P, Tsitsiklis JN (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.
  • Smallwood RD, Sondik EJ (1973) The optimal control of partially observable Markov processes over a finite horizon. Oper. Res. 21(5):1071–1088.
  • Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction, 2nd ed. (MIT Press, Cambridge, MA).
  • Wald A (1945) Sequential tests of statistical hypotheses. Ann. Math. Statist. 16(2):117–186.
  • Weibel C (2010) Implementation and parallelization of a reverse-search algorithm for Minkowski sums. Proc. 12th Workshop Algorithm Engrg. Experiments (SIAM, Philadelphia), 34–42.
  • White CC (1991) A survey of solution techniques for the partially observed decision process. Ann. Oper. Res. 32:215–230.
  • Zhang H (2010) Partially observable Markov decision processes: A geometric technique and analysis. Oper. Res. 58(1):214–228.
  • Zhang H (2022) Analytical solution to a discrete-time model for dynamic learning and decision making. Management Sci., ePub ahead of print February 1, https://doi.org/10.1287/mnsc.2021.4194.
  • Zhang NL, Liu W (1996) Planning in stochastic domains: Problem characteristics and approximation. Technical Report HKUST-CS96-31, Hong Kong University of Science and Technology.