Agarwal A, Hosanagar K, Smith MD (2011) Location, location, location: An analysis of profitability of position in online advertising markets. J. Marketing Res. 48(6):1057–1073.
Bettman JR, Kakkar P (1977) Effects of information presentation format on consumer information acquisition strategies. J. Consumer Res. 3(4):233–240.
Blattberg RC, Neslin SA (1990) Sales Promotion: Concepts, Methods, and Strategies (Prentice Hall, Englewood Cliffs, NJ).
Brafman RI, Tennenholtz M (2002) R-max - A general polynomial time algorithm for near-optimal reinforcement learning. J. Machine Learn. Res. 3(Oct):213–231.
Business Insider (2017) Just 2% of app installs lead to purchases. (February 22), https://www.businessinsider.com/just-2-of-app-installs-lead-to-purchases-2017-2.
Cesa-Bianchi N, Gentile C, Lugosi G, Neu G (2017) Boltzmann exploration done right. Adv. Neural Inform. Processing Systems 30, 6284–6293.
Charikar MS (2002) Similarity estimation techniques from rounding algorithms. Proc. 34th Annual ACM Sympos. Theory Comput. (ACM), 380–388.
Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. Proc. 22nd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining, 785–794.
Chen WR (2008) Determinants of firms’ backward- and forward-looking R&D search behavior. Organ. Sci. 19(4):609–622.
Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. Preprint, submitted September 3, https://arxiv.org/abs/1406.1078.
eMarketer (2019) US time spent with mobile 2019. (May 30), https://www.emarketer.com/content/us-time-spent-with-mobile-2019.
Fortunato M, Azar MG, Piot B, Menick J, Osband I, Graves A, Mnih V, Munos R, Hassabis D, Pietquin O, Blundell C, Legg S (2017) Noisy networks for exploration. Preprint, submitted June 30, https://arxiv.org/abs/1706.10295v1.
Fudenberg D, Villas-Boas JM (2006) Behavior-based price discrimination and customer recognition. Hendershott T, ed. Economics and Information Systems, vol. 1 (Elsevier Science, Oxford, United Kingdom), 377–436.
Gómez-Pérez G, Martín-Guerrero JD, Soria-Olivas E, Balaguer-Ballester E, Palomares A, Casariego N (2009) Assigning discounts in a marketing campaign by using reinforcement learning and neural networks. Expert Systems Appl. 36(4):8022–8031.
Hafner D, Lillicrap T, Norouzi M, Ba J (2020) Mastering Atari with discrete world models. Preprint, submitted December 22, https://arxiv.org/abs/2010.02193v2.
Hauser JR, Liberali G, Urban GL (2014) Website morphing 2.0: Switching costs, partial exposure, random exit, and when to morph. Management Sci. 60(6):1594–1616.
Hauser JR, Urban GL, Liberali G, Braun M (2009) Website morphing. Marketing Sci. 28(2):202–223.
Jaksch T, Ortner R, Auer P (2010) Near-optimal regret bounds for reinforcement learning. J. Machine Learn. Res. 11(2010):1563–1600.
Jedidi K, Mela CF, Gupta S (1999) Managing advertising and promotion for long-run profitability. Marketing Sci. 18(1):1–22.
Kahn BE, Kalwani MU, Morrison DG (1986) Measuring variety-seeking and reinforcement behaviors using panel data. J. Marketing Res. 23(2):89–100.
Kao HC, Tang KF, Chang EY (2018) Context-aware symptom checking for disease diagnosis using hierarchical reinforcement learning. Proc. 32nd AAAI Conf. Artificial Intelligence (AAAI), 2305–2313.
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. Preprint, submitted December 22, https://arxiv.org/abs/1412.6980v1.
Lee D, Hosanagar K (2019) How do recommender systems affect sales diversity? A cross-category investigation via randomized field experiment. Inform. Systems Res. 30(1):239–259.
Li J, Monroe W, Ritter A, Galley M, Gao J, Jurafsky D (2016) Deep reinforcement learning for dialogue generation. Preprint, submitted September 29, https://arxiv.org/abs/1606.01541.
Linden G, Smith B, York J (2003) Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Comput. 7(1):76–80.
Mahajan V, Muller E (1986) Advertising pulsing policies for generating awareness for new products. Marketing Sci. 5(2):89–106.
Medsker LR, Jain LC, eds. (2001) Recurrent Neural Networks: Design and Applications (CRC Press, Boca Raton, FL).
Mela CF, Gupta S, Lehmann DR (1997) The long-term impact of promotion and advertising on consumer brand choice. J. Marketing Res. 34(2):248–261.
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533.
Mooney RJ, Roy L (2000) Content-based book recommending using learning for text categorization. Proc. 5th ACM Conf. Digital Libraries (ACM), 195–204.
Oprescu M, Syrgkanis V, Wu ZS (2019) Orthogonal random forest for causal inference. Preprint, submitted September 25, https://arxiv.org/abs/1806.03467v4.
Poterba JM (1988) Are consumers forward looking? Evidence from fiscal experiments. Amer. Econom. Rev. 78(2):413–418.
Rafieian O (2019) Optimizing user engagement through adaptive ad sequencing. Technical report, Cornell University, Ithaca, NY.
Raju JS (1992) The effect of price promotions on variability in product category sales. Marketing Sci. 11(3):207–220.
Ruder S (2017) An overview of multi-task learning in deep neural networks. Preprint, submitted June 15, https://arxiv.org/abs/1706.05098.
Sahni NS, Narayanan S, Kalyanam K (2019) An experimental investigation of the effects of retargeted advertising: The role of frequency and timing. J. Marketing Res. 56(3):401–418.
Schaul T, Quan J, Antonoglou I, Silver D (2015) Prioritized experience replay. Preprint, submitted November 19, https://arxiv.org/abs/1511.05952v2.
Schwartz EM, Bradlow ET, Fader PS (2017) Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Sci. 36(4):500–522.
Seetharaman P (2004) Modeling multiple sources of state dependence in random utility models: A distributed lag approach. Marketing Sci. 23(2):263–271.
Simester DI, Sun P, Tsitsiklis JN (2006) Dynamic catalog mailing policies. Management Sci. 52(5):683–696.
Strehl AL, Littman ML (2008) An analysis of model-based interval estimation for Markov decision processes. J. Comput. System Sci. 74(8):1309–1331.
Strehl AL, Li L, Wiewiora E, Langford J, Littman ML (2006) PAC model-free reinforcement learning. Proc. 23rd Internat. Conf. Machine Learn., 881–888.
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. Adv. Neural Inform. Processing Systems, 3104–3112.
Tang H, Houthooft R, Foote D, Stooke A, Chen OX, Duan Y, Schulman J, DeTurck F, Abbeel P (2017) #Exploration: A study of count-based exploration for deep reinforcement learning. Adv. Neural Inform. Processing Systems, 2753–2762.
Tokic M (2010) Adaptive ε-greedy exploration in reinforcement learning based on value differences. Annual Conf. Artificial Intelligence (Springer), 203–210.
Urban GL, Liberali G, MacDonald E, Bordley R, Hauser JR (2014) Morphing banner advertising. Marketing Sci. 33(1):27–46.
Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. Proc. 30th AAAI Conf. Artificial Intelligence (AAAI).
Wang Z, Schaul T, Hessel M, van Hasselt H, Lanctot M, De Freitas N (2016) Dueling network architectures for deep reinforcement learning. Preprint, submitted April 5, https://arxiv.org/abs/1511.06581v3.
Winer RS (1986) A reference price model of brand choice for frequently purchased products. J. Consumer Res. 13(2):250–256.
Zhang J, Hao B, Chen B, Li C, Chen H, Sun J (2019a) Hierarchical reinforcement learning for course recommendation in MOOCs. Proc. 33rd AAAI Conf. Artificial Intelligence (AAAI), 435–442.
Zhang Y, Li B, Luo X, Wang X (2019b) Personalized mobile targeting with user engagement stages: Combining a structural hidden Markov model and field experiment. Inform. Systems Res. 30(3):787–804.
Zhao M, Li Z, An B, Lu H, Yang Y, Chu C (2018a) Impression allocation for combating fraud in e-commerce via deep reinforcement learning with action norm penalty. Proc. 27th Internat. Joint Conf. Artificial Intelligence, 3940–3946.
Zhao X, Zhang L, Ding Z, Xia L, Tang J, Yin D (2018b) Recommendations with negative feedback via pairwise deep reinforcement learning. Proc. 24th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM), 1040–1048.

Volume 69, Issue 9

September 2023

Pages 4973-5693, iii-iv

Received: June 16, 2020
Accepted: April 18, 2022
Published Online: December 21, 2022

Cite as

Wen Wang, Beibei Li, Xueming Luo, Xiaoyi Wang (2022) Deep Reinforcement Learning for Sequential Targeting. Management Science 69(9):5439-5460.

https://doi.org/10.1287/mnsc.2022.4621
