Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments

DOI: https://doi.org/10.1287/mksc.2016.1023

References

  • Agarwal D, Chen B-C, Elango P (2008) Explore/exploit schemes for web content optimization. Yahoo Research paper series.
  • Agrawal R (1995) Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Adv. Appl. Probab. 27(4):1054–1078.
  • Agrawal S, Goyal N (2012) Analysis of Thompson sampling for the multi-armed bandit problem. J. Machine Learn. Res. Workshop Conf. Proc., Vol. 23, 39.1–39.26.
  • Anderson E, Simester D (2011) A step-by-step guide to smart business experiments. Harvard Bus. Rev. 89(3):98–105.
  • Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3:397–422.
  • Bates D, Watts DG (1988) Nonlinear Regression Analysis and Its Applications (Wiley, New York).
  • Bates D, Maechler M, Bolker B, Walker S (2013) R package 'lme4'. http://cran.r-project.org/web/packages/lme4/lme4.pdf.
  • Berry DA (1972) A Bernoulli two-armed bandit. Ann. Math. Statist. 43(3):871–897.
  • Berry DA (2004) Bayesian statistics and the efficiency and ethics of clinical trials. Statist. Sci. 19(1):175–187.
  • Berry DA, Fristedt B (1985) Bandit Problems (Chapman & Hall, London).
  • Bertsimas D, Mersereau AJ (2007) Learning approach for interactive marketing. Oper. Res. 55(6):1120–1135.
  • Bradt RN, Johnson SM, Karlin S (1956) On sequential designs for maximizing the sum of n observations. Ann. Math. Statist. 27(4):1060–1074.
  • Braun M, Moe WW (2013) Online display advertising: Modeling the effects of multiple creatives and individual impression histories. Marketing Sci. 32(5):753–767.
  • Brezzi M, Lai TL (2002) Optimal learning and experimentation in bandit problems. J. Econom. Dynam. Control 27:87–108.
  • Chapelle O, Li L (2011) An empirical evaluation of Thompson sampling. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, Vol. 24, 1–9.
  • Chick SE, Frazier P (2012) Sequential sampling with economics of selection procedures. Management Sci. 58(3):550–569.
  • Chick SE, Gans N (2009) Economic analysis of simulation selection problems. Management Sci. 55(3):421–437.
  • Chick SE, Inoue K (2001) New two-stage and sequential procedures for selecting the best simulated system. Oper. Res. 49(5):732–743.
  • Chick SE, Branke J, Schmidt C (2010) Sequential sampling to myopically maximize the expected value of information. INFORMS J. Comput. 22(1):71–80.
  • Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. Conf. Learn. Theory, 355–366.
  • Davenport TH (2009) How to design smart business experiments. Harvard Bus. Rev. 87(2):1–9.
  • Donahoe J (2011) How eBay developed a culture of experimentation: HBR interview of John Donahoe. Harvard Bus. Rev. 89(3):92–97.
  • Filippi S, Cappé O, Garivier A, Szepesvári C (2010) Parametric bandits: The generalized linear case. Lafferty J, Williams CKI, Shawe-Taylor J, Zemel RS, Culotta A, eds. Adv. Neural Inform. Processing Systems, Vol. 23, 1–9.
  • Frazier PI, Powell WB, Dayanik S (2009) The knowledge-gradient policy for correlated normal beliefs. INFORMS J. Comput. 21(4):599–613.
  • Gelman A, Hill J (2007) Data Analysis Using Regression and Multilevel/Hierarchical Models (Cambridge University Press, New York).
  • Gelman A, Carlin JB, Stern HS, Rubin DB (2004) Bayesian Data Analysis, 2nd ed. (Chapman & Hall, New York).
  • Gittins JC (1979) Bandit processes and dynamic allocation indices. J. Royal Statist. Soc., Ser. B 41(2):148–177.
  • Gittins JC, Glazebrook K, Weber R (2011) Multi-Armed Bandit Allocation Indices, 2nd ed. (John Wiley and Sons, New York).
  • Goldfarb A, Tucker C (2011) Online display advertising: Targeting and obtrusiveness. Marketing Sci. 30(3):389–404.
  • Granmo O-C (2010) Solving two-armed Bernoulli bandit problems using a Bayesian learning automaton. Internat. J. Intelligent Comput. Cybernetics 3(2):207–232.
  • Hauser JR, Liberali G, Urban GL (2014) Website morphing 2.0: Technical and implementation advances and a field experiment. Management Sci. 60(6):1594–1616.
  • Hauser JR, Urban GL, Liberali G, Braun M (2009) Website morphing. Marketing Sci. 28(2):202–223.
  • Hoban P, Bucklin R (2015) Effects of Internet display advertising in the purchase funnel: Model-based insights from a randomized field experiment. J. Marketing Res. 52(3):375–393.
  • Kaufmann E, Korda N, Munos R (2012) Thompson sampling: An asymptotically optimal finite time analysis. Bshouty NH, Stoltz G, Vayatis N, Zeugmann T, eds. Algorithmic Learning Theory (Springer-Verlag, Berlin Heidelberg), 199–213.
  • Keller G, Oldale A (2003) Branching bandits: A sequential search process with correlated pay-offs. J. Econom. Theory 113(2):302–315.
  • Krishnamurthy V, Wahlberg B (2009) Partially observed Markov decision process multiarmed bandits: Structural results. Math. Oper. Res. 34(2):287–302.
  • Lai TL (1987) Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15(3):1091–1114.
  • Lambrecht A, Tucker C (2013) When does retargeting work? Information specificity in online advertising. J. Marketing Res. 50(5):561–576.
  • Lin S, Zhang J, Hauser J (2015) Learning from experience, simply. Marketing Sci. 34(1):1–19.
  • Manchanda P, Dubé J-P, Goh KY, Chintagunta PK (2006) The effect of banner advertising on Internet purchasing. J. Marketing Res. 43(1):98–108.
  • May BC, Korda N, Lee A, Leslie DS (2011) Optimistic Bayesian sampling in contextual bandit problems. Technical report, Department of Mathematics, University of Bristol, Bristol, UK.
  • Meyer RJ, Shi Y (1995) Sequential choice under ambiguity: Intuitive solutions to the armed-bandit problem. Management Sci. 41(5):817–834.
  • Murphy SA (2005) An experimental design for the development of adaptive treatment strategies. Statist. Medicine 24:1455–1481.
  • Ortega PA, Braun DA (2010) A minimum relative entropy principle for learning and acting. J. Artificial Intelligence Res. 38:475–511.
  • Ortega PA, Braun DA (2014) Generalized Thompson sampling for sequential decision-making and causal inference. Complex Adaptive Systems Modeling 2(2).
  • Osband I, Russo D, Van Roy B (2013) (More) efficient reinforcement learning via posterior sampling. Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, Vol. 26, 3003–3011.
  • Perchet V, Rigollet P, Chassang S, Snowberg E (2016) Batched bandit problems. Ann. Statist. 44(2):660–681.
  • Powell WB (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley, Hoboken, NJ).
  • Reiley D, Lewis RA, Papadimitriou P, Garcia-Molina H, Krishnamurthy P (2011) Display advertising impact: Search lift and social influence. Proc. 17th ACM SIGKDD Conf. Knowledge Discovery Data Mining (ACM, New York), 1019–1027.
  • Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.
  • Rubin D (1990) Estimating causal effects of treatments in randomized and nonrandomized studies. J. Ed. Psych. 66(5):688–701.
  • Rusmevichientong P, Tsitsiklis JN (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.
  • Russo D, Van Roy B (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.
  • Scott SL (2010) A modern Bayesian look at the multi-armed bandit. Appl. Stochastic Models Bus. Indust. 26(6):639–658.
  • Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
  • Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3):285–294.
  • Tsitsiklis JN (1986) A lemma on the multi-armed bandit problem. IEEE Trans. Automatic Control 31(6):576–577.
  • Urban GL, Liberali G, Bordley R, MacDonald E, Hauser JR (2014) Morphing banner advertising. Marketing Sci. 33(1):27–46.
  • Wahrenberger DL, Antle CE, Klimko LA (1977) Bayesian rules for the two-armed bandit problem. Biometrika 64(1):172–174.
  • White JM (2012) Bandit Algorithms for Website Optimization (O’Reilly Media, Sebastopol, CA).
  • Whittle P (1980) Multi-armed bandits and the Gittins index. J. Royal Statist. Soc., Ser. B 42(2):143–149.