Ensemble Experiments to Optimize Interventions Along the Customer Journey: A Reinforcement Learning Approach

Yicheng Song
https://orcid.org/0000-0002-9107-814X
Carlson School of Management, University of Minnesota, Minneapolis, Minnesota 55455
Tianshu Sun
Corresponding Author
https://orcid.org/0000-0002-9786-044X
Center for Digital Transformation, Cheung Kong Graduate School of Business, Beijing 100006, China; Marshall School of Business, University of Southern California, Los Angeles, California 90089

Published Online: 4 Oct 2023
https://doi.org/10.1287/mnsc.2023.4914

References

Agarwal R, Schuurmans D, Norouzi M (2020) An optimistic perspective on offline reinforcement learning. Internat. Conf. Machine Learn. (PMLR, New York), 104–114.
Avalos E, Barrero JM, Davies E, Iacovone L, Torres J (2022) Measuring business uncertainty in developing and emerging economies (Brookings Institution), https://policycommons.net/artifacts/4141097/measuring-business-uncertainty-in-developing-and-emerging-economies/4949875/.
Azizzadenesheli K, Brunskill E, Anandkumar A (2018) Efficient exploration through Bayesian deep Q-networks. 2018 Inform. Theory Appl. Workshop ITA 2018 (IEEE, Piscataway, NJ), 1–9.
Bakshy E, Dworkin L, Karrer B, Kashin K, Letham B, Murthy A, Singh S (2018) AE: A domain-agnostic platform for adaptive experimentation. Conf. Neural Inform. Processing Systems (San Diego, CA), 1–8.
Bertsekas DP, Tsitsiklis JN (1995) Neuro-dynamic programming: An overview. Proc. 1995 34th IEEE Conf. Decision Control, vol. 1 (IEEE, Piscataway, NJ), 560–564.
Bronnenberg BJ, Kim JB, Mela CF (2016) Zooming in on choice: How do consumers search for cameras online? Marketing Sci. 35(5):693–712.
Cassandra AR (1998) A survey of POMDP applications. Working Notes AAAI 1998 Fall Sympos. Planning Partially Observable Markov Decision Processes, vol. 1724 (AAAI Press, Palo Alto, CA).
Chen T, Guestrin C (2016) Xgboost: A scalable tree boosting system. Proc. 22nd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 785–794.
Dearden R, Friedman N, Russell SJ (1998) Bayesian Q-learning. Mostow J, Rich C, eds. AAAI 98 (AAAI Press/MIT Press, Cambridge, MA), 761–768.
Feng J, Li H, Huang M, Liu S, Ou W, Wang Z, Zhu X (2018) Learning to collaborate: Multi-scenario ranking via multi-agent reinforcement learning. Proc. 2018 World Wide Web Conf. (International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva), 1939–1948.
Frazier PI (2018) Bayesian optimization, chapter 11. Gel E, Ntaimo L, eds. Recent Advances in Optimization and Modeling of Contemporary Problems (INFORMS, Catonsville, MD), 255–278.
Gallo A (2017) A refresher on A/B testing. Harvard Bus. Rev. (June 28), https://hbr.org/2017/06/a-refresher-on-ab-testing.
Ghose A, Yang S (2009) An empirical analysis of search engine advertising: Sponsored search in electronic markets. Management Sci. 55(10):1605–1622.
Ghose A, Ipeirotis PG, Li B (2019) Modeling consumer footprints on search engines: An interplay with social media. Management Sci. 65(3):1363–1385.
Hartigan J (1969) Linear Bayesian methods. J. Roy. Statist. Soc. B 31(3):446–454.
Hauser JR, Liberali G, Urban GL (2014) Website morphing 2.0: Switching costs, partial exposure, random exit, and when to morph. Management Sci. 60(6):1594–1616.
Hauser JR, Urban GL, Liberali G, Braun M (2009) Website morphing. Marketing Sci. 28(2):202–223.
Hausknecht MJ, Stone P (2015) Deep recurrent Q-learning for partially observable MDPs. AAAI 2015 Fall Symposium (AAAI Press, Palo Alto, CA), 29–37.
Hessel M, Modayil J, Van Hasselt H, Schaul T, Ostrovski G, Dabney W, Horgan D, Piot B, Azar M, Silver D (2018) Rainbow: Combining improvements in deep reinforcement learning. Thirty-Second AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 3215–3222.
Huang N, Sun T, Chen P, Golden JM (2019) Word-of-mouth system implementation and customer conversion: A randomized field experiment. Inform. Systems Res. 30(3):805–818.
Inman JJ, McAlister L (1994) Do coupon expiration dates affect consumer behavior? J. Marketing Res. 31(3):423–428.
Katehakis MN, Veinott AF Jr (1987) The multi-armed bandit problem: Decomposition and computation. Math. Oper. Res. 12(2):262–268.
Keinan A, Kivetz R (2008) Remedying hyperopia: The effects of self-control regret on consumer behavior. J. Marketing Res. 45(6):676–689.
Kokkodis M, Ipeirotis PG (2021) Demand-aware career path recommendations: A reinforcement learning approach. Management Sci. 67(7):4362–4383.
Kushner HJ (1964) A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. J. Fluids Engrg. 86(1):97–106.
Lee D, Hosanagar K (2021) How do product attributes and reviews moderate the impact of recommender systems through purchase stages? Management Sci. 67(1):524–546.
Li H, Kannan P (2014) Attributing conversions in a multichannel online marketing environment: An empirical model and a field experiment. J. Marketing Res. 51(1):40–56.
Liebman E, Saar-Tsechansky M, Stone P (2019) The right music at the right time: Adaptive personalized playlists based on sequence modeling. MIS Quart. 43(3):765–786.
Mandel T, Liu YE, Brunskill E, Popović Z (2016) Offline evaluation of online reinforcement learning algorithms. Proc. AAAI Conf. Artificial Intelligence, vol. 30 (AAAI Press, Palo Alto, CA), 1926–1933.
Mankiw NG (2020) Principles of Economics (Cengage Learning, Boston).
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, et al. (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533.
Močkus J (1975) On Bayesian methods for seeking the extremum. Optimization Techniques IFIP Tech. Conf. (Springer, Berlin), 400–404.
Moe WW, Fader PS (2004) Dynamic conversion behavior at e-commerce sites. Management Sci. 50(3):326–335.
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, et al. (2019) Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inform. Processing Systems 32 (NIPS, San Diego, CA).
Peters M, Ketter W, Saar-Tsechansky M, Collins J (2013) A reinforcement learning approach to autonomous decision-making in smart electricity markets. Machine Learn. 92(1):5–39.
Ribeiro AH, Tiels K, Aguirre LA, Schön T (2020) Beyond exploding and vanishing gradients: Analysing RNN training using attractors and smoothness. PMLR 2020 (PMLR), 2370–2380.
Schaul T, Quan J, Antonoglou I, Silver D (2015) Prioritized experience replay. Internat. Conf. Learn. Representations 2016 (ICLR, Appleton, WI).
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, et al. (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484–489.
Song Y, Sahoo N, Srinivasan S, Dellarocas C (2022) Uncovering characteristic response paths of a population. INFORMS J. Comput. 34(3):1661–1680.
Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.
Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. Proc. AAAI Conf. Artificial Intelligence, vol. 30 (AAAI Press, Palo Alto, CA), 2094–2100.
Wang W, Li B, Luo X, Wang X (2022) Deep reinforcement learning for sequential targeting. Management Sci. 69(9):5439–5460.
Watkins CJ, Dayan P (1992) Q-learning. Machine Learn. 8(3–4):279–292.
Zhang DJ, Dai H, Dong L, Qi F, Zhang N, Liu X, Liu Z, Yang J (2020) The long-term and spillover effects of price promotions on retailing platforms: Evidence from a large randomized experiment on Alibaba. Management Sci. 66(6):2589–2609.
Zhang Y, Li B, Luo X, Wang X (2019) Personalized mobile targeting with user engagement stages: Combining a structural hidden Markov model and field experiment. Inform. Systems Res. 30(3):787–804.

Volume 70, Issue 8

August 2024

Pages v-vii, 4953-5625, iii-v

Article Information

Received: December 06, 2021
Accepted: January 19, 2023
Published Online: October 04, 2023

Cite as

Yicheng Song, Tianshu Sun (2023) Ensemble Experiments to Optimize Interventions Along the Customer Journey: A Reinforcement Learning Approach. Management Science 70(8):5115-5130.

https://doi.org/10.1287/mnsc.2023.4914

Acknowledgments

The authors acknowledge the support of the NVIDIA Corporation with the donation of the Graphics Processing Unit used for this research. The authors also thank the department editor, the associate editor, and the reviewers for their insightful comments and constructive suggestions throughout the review process.
