Deep Reinforcement Learning for Sequential Targeting

Published Online: https://doi.org/10.1287/mnsc.2022.4621
