Ensemble Experiments to Optimize Interventions Along the Customer Journey: A Reinforcement Learning Approach

Published Online: https://doi.org/10.1287/mnsc.2023.4914

References

  • Agarwal R, Schuurmans D, Norouzi M (2020) An optimistic perspective on offline reinforcement learning. Internat. Conf. Machine Learn. (PMLR, New York), 104–114.
  • Avalos E, Barrero JM, Davies E, Iacovone L, Torres J (2022) Measuring business uncertainty in developing and emerging economies (Brookings Institution), https://policycommons.net/artifacts/4141097/measuring-business-uncertainty-in-developing-and-emerging-economies/4949875/.
  • Azizzadenesheli K, Brunskill E, Anandkumar A (2018) Efficient exploration through Bayesian deep Q-networks. 2018 Inform. Theory Appl. Workshop ITA 2018 (IEEE, Piscataway, NJ), 1–9.
  • Bakshy E, Dworkin L, Karrer B, Kashin K, Letham B, Murthy A, Singh S (2018) AE: A domain-agnostic platform for adaptive experimentation. Conf. Neural Inform. Processing Systems (San Diego, CA), 1–8.
  • Bertsekas DP, Tsitsiklis JN (1995) Neuro-dynamic programming: An overview. Proc. 1995 34th IEEE Conf. Decision Control, vol. 1 (IEEE, Piscataway, NJ), 560–564.
  • Bronnenberg BJ, Kim JB, Mela CF (2016) Zooming in on choice: How do consumers search for cameras online? Marketing Sci. 35(5):693–712.
  • Cassandra AR (1998) A survey of POMDP applications. Working Notes AAAI 1998 Fall Sympos. Planning Partially Observable Markov Decision Processes, vol. 1724 (AAAI Press, Palo Alto, CA).
  • Chen T, Guestrin C (2016) Xgboost: A scalable tree boosting system. Proc. 22nd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 785–794.
  • Dearden R, Friedman N, Russell SJ (1998) Bayesian Q-learning. Mostow J, Rich C, eds. AAAI 98 (AAAI Press/MIT Press, Cambridge, MA), 761–768.
  • Feng J, Li H, Huang M, Liu S, Ou W, Wang Z, Zhu X (2018) Learning to collaborate: Multi-scenario ranking via multi-agent reinforcement learning. Proc. 2018 World Wide Web Conf. (International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva), 1939–1948.
  • Frazier PI (2018) Bayesian optimization, chapter 11. Gel E, Ntaimo L, eds. Recent Advances in Optimization and Modeling of Contemporary Problems (INFORMS, Catonsville, MD), 255–278.
  • Gallo A (2017) A refresher on A/B testing. Harvard Bus. Rev. (June 28), https://hbr.org/2017/06/a-refresher-on-ab-testing.
  • Ghose A, Yang S (2009) An empirical analysis of search engine advertising: Sponsored search in electronic markets. Management Sci. 55(10):1605–1622.
  • Ghose A, Ipeirotis PG, Li B (2019) Modeling consumer footprints on search engines: An interplay with social media. Management Sci. 65(3):1363–1385.
  • Hartigan J (1969) Linear Bayesian methods. J. Roy. Statist. Soc. B 31(3):446–454.
  • Hauser JR, Liberali G, Urban GL (2014) Website morphing 2.0: Switching costs, partial exposure, random exit, and when to morph. Management Sci. 60(6):1594–1616.
  • Hauser JR, Urban GL, Liberali G, Braun M (2009) Website morphing. Marketing Sci. 28(2):202–223.
  • Hausknecht MJ, Stone P (2015) Deep recurrent Q-learning for partially observable MDPs. AAAI 2015 Fall Symposium (AAAI Press, Palo Alto, CA), 29–37.
  • Hessel M, Modayil J, Van Hasselt H, Schaul T, Ostrovski G, Dabney W, Horgan D, Piot B, Azar M, Silver D (2018) Rainbow: Combining improvements in deep reinforcement learning. Thirty-Second AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 3215–3222.
  • Huang N, Sun T, Chen P, Golden JM (2019) Word-of-mouth system implementation and customer conversion: A randomized field experiment. Inform. Systems Res. 30(3):805–818.
  • Inman JJ, McAlister L (1994) Do coupon expiration dates affect consumer behavior? J. Marketing Res. 31(3):423–428.
  • Katehakis MN, Veinott AF Jr (1987) The multi-armed bandit problem: Decomposition and computation. Math. Oper. Res. 12(2):262–268.
  • Keinan A, Kivetz R (2008) Remedying hyperopia: The effects of self-control regret on consumer behavior. J. Marketing Res. 45(6):676–689.
  • Kokkodis M, Ipeirotis PG (2021) Demand-aware career path recommendations: A reinforcement learning approach. Management Sci. 67(7):4362–4383.
  • Kushner HJ (1964) A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. J. Fluids Engrg. 86(1):97–106.
  • Lee D, Hosanagar K (2021) How do product attributes and reviews moderate the impact of recommender systems through purchase stages? Management Sci. 67(1):524–546.
  • Li H, Kannan P (2014) Attributing conversions in a multichannel online marketing environment: An empirical model and a field experiment. J. Marketing Res. 51(1):40–56.
  • Liebman E, Saar-Tsechansky M, Stone P (2019) The right music at the right time: Adaptive personalized playlists based on sequence modeling. MIS Quart. 43(3):765–786.
  • Mandel T, Liu YE, Brunskill E, Popović Z (2016) Offline evaluation of online reinforcement learning algorithms. Proc. AAAI Conf. Artificial Intelligence, vol. 30 (AAAI Press, Palo Alto, CA), 1926–1933.
  • Mankiw NG (2020) Principles of Economics (Cengage Learning, Boston).
  • Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, et al. (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533.
  • Močkus J (1975) On Bayesian methods for seeking the extremum. Optimization Techniques IFIP Tech. Conf. (Springer, Berlin), 400–404.
  • Moe WW, Fader PS (2004) Dynamic conversion behavior at e-commerce sites. Management Sci. 50(3):326–335.
  • Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, et al. (2019) Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inform. Processing Systems 32 (NIPS, San Diego, CA).
  • Peters M, Ketter W, Saar-Tsechansky M, Collins J (2013) A reinforcement learning approach to autonomous decision-making in smart electricity markets. Machine Learn. 92(1):5–39.
  • Ribeiro AH, Tiels K, Aguirre LA, Schön T (2020) Beyond exploding and vanishing gradients: Analysing RNN training using attractors and smoothness. PMLR 2020 (PMLR), 2370–2380.
  • Schaul T, Quan J, Antonoglou I, Silver D (2015) Prioritized experience replay. Internat. Conf. Learn. Representations 2016 (ICLR, Appleton, WI).
  • Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, et al. (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484–489.
  • Song Y, Sahoo N, Srinivasan S, Dellarocas C (2022) Uncovering characteristic response paths of a population. INFORMS J. Comput. 34(3):1661–1680.
  • Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
  • Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.
  • Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. Proc. AAAI Conf. Artificial Intelligence, vol. 30 (AAAI Press, Palo Alto, CA), 2094–2100.
  • Wang W, Li B, Luo X, Wang X (2022) Deep reinforcement learning for sequential targeting. Management Sci. 69(9):5439–5460.
  • Watkins CJ, Dayan P (1992) Q-learning. Machine Learn. 8(3–4):279–292.
  • Zhang DJ, Dai H, Dong L, Qi F, Zhang N, Liu X, Liu Z, Yang J (2020) The long-term and spillover effects of price promotions on retailing platforms: Evidence from a large randomized experiment on Alibaba. Management Sci. 66(6):2589–2609.
  • Zhang Y, Li B, Luo X, Wang X (2019) Personalized mobile targeting with user engagement stages: Combining a structural hidden Markov model and field experiment. Inform. Systems Res. 30(3):787–804.