Dynamic Coupon Targeting Using Batch Deep Reinforcement Learning: An Application to Livestream Shopping

Published Online: https://doi.org/10.1287/mksc.2022.1403

References

  • Ascarza E, Netzer O, Hardie BGS (2018) Some customers would rather leave without saying goodbye. Marketing Sci. 37(1):54–77.
  • Bell DR, Lattin JM (2000) Looking for loss aversion in scanner panel data: The confounding effect of price response heterogeneity. Marketing Sci. 19(2):185–200.
  • Bertsekas D (2019) Reinforcement Learning and Optimal Control (Athena Scientific, Belmont, MA).
  • Cai Y, Judd KL (2010) Stable and efficient computational methods for dynamic programming. J. Eur. Econom. Assoc. 8(2–3):626–634.
  • Dubé JP, Misra S (2022) Personalized pricing and customer welfare. J. Political Econom. 131(1):131–189.
  • Dubé JP, Hitsch GJ, Rossi PE (2010) State dependence and alternative explanations for consumer inertia. RAND J. Econom. 41(3):417–445.
  • Dudík M, Langford J, Li L (2011) Doubly robust policy evaluation and learning. Getoor L, Scheffer T, eds. Proc. 28th Internat. Conf. on Machine Learn. (Omnipress, Madison, WI), 1097–1104.
  • Dudík M, Erhan D, Langford J, Li L (2014) Doubly robust policy evaluation and optimization. Statist. Sci. 29(4):485–511.
  • Fader PS, Hardie BGS, Lee KL (2005) RFM and CLV: Using iso-value curves for customer base analysis. J. Marketing Res. 42(4):415–430.
  • Fan J, Wang Z, Xie Y, Yang Z (2020) A theoretical analysis of deep Q-learning. Bayen AM, Jadbabaie A, Pappas G, Parrilo PA, Recht B, Tomlin C, Zeilinger M, eds. Learning for Dynamics and Control (JMLR, Cambridge, MA), 120:486–489.
  • Friedman JH (2001) Greedy function approximation: A gradient boosting machine. Ann. Statist. 29(5):1189–1232.
  • Fujimoto S, Conti E, Ghavamzadeh M, Pineau J (2019) Benchmarking batch deep reinforcement learning algorithms. Preprint, submitted October 3, https://arxiv.org/abs/1910.01708.
  • Furman J, Coyle D, Fletcher A, McAuley D, Marsden P (2019) Unlocking digital competition: Report of the digital competition expert panel. Report, The National Archives, Kew, London. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/785547/unlocking_digital_competition_furman_review_web.pdf.
  • Gedenk K, Neslin SA (1999) The role of retail promotion in determining future brand loyalty: Its effect on purchase event feedback. J. Retailing 75(4):433–459.
  • Gönül FF, Kim BD, Shi M (2000) Mailing smarter to catalog customers. J. Interactive Marketing 14(2):2–16.
  • Hauser JR, Urban GL, Liberali G, Braun M (2009) Website morphing. Marketing Sci. 28(2):202–223.
  • He X, Pan J, Jin O, Xu T, Liu B, Xu T, Shi Y, et al. (2014) Practical lessons from predicting clicks on ads at Facebook. Saka E, Shen D, Lee K, Li Y, eds. Proc. 8th Internat. Workshop on Data Mining for Online Advertising (Association for Computing Machinery, New York), 1–9.
  • Hotz JV, Miller RA (1993) Conditional choice probabilities and the estimation of dynamic models. Rev. Econom. Stud. 60(3):497–529.
  • Imaizumi M, Fukumizu K (2019) Deep neural networks learn non-smooth functions effectively. Chaudhuri K, Sugiyama M, eds. Proc. 22nd Internat. Conf. on Artificial Intelligence and Statist., vol. 89 (JMLR, Cambridge, MA), 869–878.
  • Jeuland AP (1979) Brand choice inertia as one aspect of the notion of brand loyalty. Management Sci. 25(7):671–682.
  • Kahn BE, Kalwani MU, Morrison DG (1986) Measuring variety-seeking and reinforcement behaviors using panel data. J. Marketing Res. 23(2):89–100.
  • Kim M, Sudhir K, Uetake K (2021) A structural model of a multitasking salesforce: Multidimensional incentives and plan design. Management Sci. 68(6):4602–4630.
  • Lucas RE (1976) Econometric policy evaluation: A critique. Carnegie-Rochester Conference Series on Public Policy, vol. 1, 19–46.
  • McCall JJ (1970) Economics of information and job search. Quart. J. Econom. 84(1):113–126.
  • Misra K, Schwartz EM, Abernethy J (2019) Dynamic online pricing with incomplete information using multiarmed bandit experiments. Marketing Sci. 38(2):226–252.
  • Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, et al. (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529.
  • Oprescu M, Syrgkanis V, Wu ZS (2019) Orthogonal random forest for causal inference. Chaudhuri K, Salakhutdinov R, eds. Internat. Conf. Machine Learn. vol. 97 (PMLR, Cambridge, MA), 4932–4941.
  • Rajendran KN, Tellis GJ (1994) Contextual and temporal components of reference price. J. Marketing 58(1):22–34.
  • Rhee E, Russell GJ (2009) Forecasting household response in database marketing: A latent trait approach. Lawrence KD, Klimberg RK, eds. Advances in Business and Management Forecasting, vol. 6 (Emerald, Bingley, UK), 109–131.
  • Rossi PE, McCulloch RE, Allenby GM (1996) The value of purchase history data in target marketing. Marketing Sci. 15(4):321–340.
  • Rust J (1996) Numerical dynamic programming in economics. Amman HM, Kendrick DA, Rust J, eds. Handbook of Computational Economics, vol. 1 (Elsevier, North Holland Publishing Co., Amsterdam, Netherlands), 619–729.
  • Seetharaman PB, Che H (2009) Price competition in markets with consumer variety seeking. Marketing Sci. 28(3):516–525.
  • Seethu Seetharaman PB (2009) 17 dynamic pricing. Rao VR, ed. Handbook of Pricing Research in Marketing (Edward Elgar Publishing, Cheltenham, UK), 384.
  • Seiler S (2013) The impact of search costs on consumer behavior: A dynamic approach. Quant. Marketing Econom. 11(2):155–203.
  • Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
  • UK Competition and Markets Authority (2018) Pricing algorithms: Economic working paper on the use of algorithms to facilitate collusion and personalised pricing. Working paper, UK Competition and Markets Authority, UK.
  • Urban GL, Liberali G, MacDonald E, Bordley R, Hauser JR (2013) Morphing banner advertising. Marketing Sci. 33(1):27–46.
  • Van Heerde HJ, Neslin SA (2017) Sales promotion models. Handbook of Marketing Decision Models (Springer, Berlin), 13–77.
  • Watkins CJCH (1989) Learning from delayed rewards. PhD thesis, Cambridge University, Cambridge, UK.
  • Wen H, Zhang J, Lin Q, Yang K, Huang P (2019) Multi-level deep cascade trees for conversion rate prediction in recommendation system. Proc. Conf. AAAI Artificial Intelligence 33:338–345.
  • Winer RS (1986) A reference price model of brand choice for frequently purchased products. J. Consumer Res. 13(2):250–256.
  • Zhang Q, Wang W, Chen Y (2019) In-consumption social listening with moment-to-moment unstructured data: The case of movie appreciation and live comments. Marketing Sci. 39(2):285–295.