Dynamic Learning and Decision Making via Basis Weight Vectors

Hao Zhang
[email protected]
https://orcid.org/0000-0002-5078-9252
Sauder School of Business, University of British Columbia, Vancouver, British Columbia V6T 1Z2, Canada

Published Online: 9 Feb 2022
https://doi.org/10.1287/opre.2021.2240

Volume 70, Issue 3

May-June 2022

Pages iii-viii, 1293-1952, C2-C3

Received: August 10, 2018
Accepted: October 14, 2021
Published Online: February 9, 2022

Cite as

Hao Zhang (2022) Dynamic Learning and Decision Making via Basis Weight Vectors. Operations Research 70(3):1835-1853.

https://doi.org/10.1287/opre.2021.2240
