A Multiarmed Bandit Approach for House Ads Recommendations

Published Online: https://doi.org/10.1287/mksc.2022.1378

References

  • Agarwal A, Basu S, Schnabel T, Joachims T (2017) Effective evaluation using logged bandit feedback from multiple loggers. Proc. 23rd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 687–696.
  • Agrawal P, Avadhanula V, Tulabandhula T (2020) A tractable online learning algorithm for the multinomial logit contextual bandit. Preprint, submitted March 7, https://arxiv.org/abs/2011.14033.
  • Agrawal R, Gupta A, Prabhu Y, Varma M (2013) Multi-label learning with millions of labels: Recommending advertiser bid phrases for web pages. Proc. 22nd Internat. Conf. World Wide Web, 13–24.
  • Agrawal S, Goyal N (2012) Analysis of Thompson sampling for the multi-armed bandit problem. Proc. Conf. on Learn. Theory, 39–41.
  • Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. Proc. Internat. Conf. on Machine Learn., 127–135.
  • Agrawal S, Avadhanula V, Goyal V, Zeevi A (2019) MNL-bandit: A dynamic learning approach to assortment selection. Oper. Res. 67(5):1453–1485.
  • Awerbuch B, Kleinberg RD (2004) Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. Proc. 36th Annual ACM Sympos. on Theory of Comput. (ACM, New York), 45–53.
  • Bergemann D, Hege U (2005) The financing of innovation: Learning and stopping. RAND J. Econom. 36(4):719–752.
  • Bergemann D, Välimäki J (2002) Information acquisition and efficient mechanism design. Econometrica 70(3):1007–1033.
  • Bertsimas D, Mersereau AJ (2007) A learning approach for interactive marketing to a customer segment. Oper. Res. 55(6):1120–1135.
  • Besbes O, Gur Y, Zeevi A (2014) Stochastic multi-armed-bandit problem with non-stationary rewards. Adv. Neural Inform. Processing Systems, 199–207.
  • Bleier A, Eisenbeiss M (2015) The importance of trust for personalized online advertising. J. Retailing 91(3):390–409.
  • Bottou L (2010) Large-scale machine learning with stochastic gradient descent. Proc. COMPSTAT (Springer, Berlin), 177–186.
  • Braun M, Moe WW (2013) Online display advertising: Modeling the effects of multiple creatives and individual impression histories. Marketing Sci. 32(5):753–767.
  • Breuer R, Brettel M (2012) Short-and long-term effects of online advertising: Differences between new and existing customers. J. Interactive Marketing 26(3):155–166.
  • Burges CJ, Ragno R, Le QV (2007) Learning to rank with nonsmooth cost functions. Adv. Neural Inform. Processing Systems, 193–200.
  • Carmona CJ, Ramírez-Gallego S, Torres F, Bernal E, del Jesus MJ, García S (2012) Web usage mining to improve the design of an e-commerce website: OrOliveSur.com. Expert Systems Appl. 39(12):11243–11249.
  • Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.
  • Chae I, Bruno HA, Feinberg FM (2019) Wearout or weariness? Measuring potential negative consequences of online ad volume and placement on website visits. J. Marketing Res. 56(1):57–75.
  • Chakrabarti D, Kumar R, Radlinski F, Upfal E (2008) Mortal multi-armed bandits. Adv. Neural Inform. Processing Systems 21:273–280.
  • Chapelle O, Li L (2011) An empirical evaluation of Thompson sampling. Adv. Neural Inform. Processing Systems, 2249–2257.
  • Chen J, Yang X, Smith RE (2016) The effects of creativity on advertising wear-in and wear-out. J. Acad. Marketing Sci. 44(3):334–349.
  • Chen W, Wang Y, Yuan Y (2013) Combinatorial multi-armed bandit: General framework and applications. Proc. Internat. Conf. on Machine Learn., 151–159.
  • Chen Y, Yang B, Dong J, Abraham A (2005) Time-series forecasting using flexible neural tree model. Inform. Sci. 174(3-4):219–235.
  • Chernev A, Hamilton R (2009) Assortment size and option attractiveness in consumer choice among retailers. J. Marketing Res. 46(3):410–420.
  • Feit EM, Berman R (2019) Test & roll: Profit-maximizing A/B tests. Marketing Sci. 38(6):1038–1058.
  • Foster D, Rakhlin A (2020) Beyond UCB: Optimal and efficient contextual bandits with regression oracles. Proc. Internat. Conf. on Machine Learn., 3199–3210.
  • Gal Y, Ghahramani Z (2016) Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. Proc. Internat. Conf. on Machine Learn., 1050–1059.
  • Gilks WR, Wild P (1992) Adaptive rejection sampling for Gibbs sampling. J. Royal Statist. Soc. Ser. C 41(2):337–348.
  • Goic M, Olivares M (2019) Omnichannel analytics. Operations in an Omnichannel World (Springer, Berlin), 115–150.
  • Goic M, Álvarez R, Montoya R (2018) The effect of house ads on multichannel sales. J. Interactive Marketing 42:32–45.
  • Goić M, Jerath K, Kalyanam K (2022) The roles of multiple channels in predicting website visits and purchases: Engagers versus closers. Internat. J. Res. Marketing 39:3.
  • Goic M, Rojas A, Saavedra I (2021) The effectiveness of triggered email marketing in addressing browse abandonments. J. Interactive Marketing 55:118–145.
  • Goldstein A, Hajaj C (2022) The hidden conversion funnel of mobile vs. desktop consumers. Electronic Commerce Research and Applications, 101135.
  • Han Y, Wang Y, Chen X (2021) Adversarial combinatorial bandits with general non-linear reward functions. Proc. Internat. Conf. on Machine Learn., 4030–4039.
  • He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. Proc. IEEE Internat. Conf. Comput. Vision (IEEE, New York), 1026–1034.
  • Kahn BE, Wansink B (2004) The influence of assortment structure on perceived variety and consumption quantities. J. Consumer Res. 30(4):519–533.
  • Kireyev P, Pauwels K, Gupta S (2016) Do display ads influence search? Attribution and dynamics in online advertising. Internat. J. Res. Marketing 33(3):475–490.
  • Kleinberg R, Leighton T (2003) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. Proc. 44th Annual IEEE Sympos. on Foundations of Comput. Sci. (IEEE, New York), 594–605.
  • Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 8:30–37.
  • Koulouriotis DE, Xanthopoulos A (2008) Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems. Appl. Math. Comput. 196(2):913–922.
  • Kuleshov V, Precup D (2014) Algorithms for multi-armed bandit problems. Preprint, submitted February 25, https://arxiv.org/abs/1402.6028.
  • Levine N, Crammer K, Mannor S (2017) Rotting bandits. Adv. Neural Inform. Processing Systems, 30.
  • Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. Proc. 19th Internat. Conf. World Wide Web (ACM, New York), 661–670.
  • Li L, Chu W, Langford J, Wang X (2011) Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. Proc. 4th ACM Internat. Conf. Web Search Data Mining (ACM, New York), 297–306.
  • Manchanda P, Dubé JP, Goh KY, Chintagunta PK (2006) The effect of banner advertising on internet purchasing. J. Marketing Res. 43(1):98–108.
  • Marszałkowski J, Drozdowski M (2013) Optimization of column width in website layout for advertisement fit. Eur. J. Oper. Res. 226(3):592–601.
  • Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing Atari with deep reinforcement learning. Preprint, submitted December 19, https://arxiv.org/abs/1312.5602.
  • Montgomery AL, Smith MD (2009) Prospects for personalization on the internet. J. Interactive Marketing 23(2):130–137.
  • Oh M, Iyengar G (2021) Multinomial logit contextual bandits: Provable optimality and practicality. Proc. AAAI Conf. on Artificial Intelligence, vol. 35, 9205–9213.
  • Ontañón S (2017) Combinatorial multi-armed bandits for real-time strategy games. J. Artificial Intelligence Res. 58:665–702.
  • Pandey S, Agarwal D, Chakrabarti D, Josifovski V (2007) Bandits for taxonomies: A model-based approach. Proc. 2007 SIAM Internat. Conf. Data Mining (SIAM, Philadelphia), 216–227.
  • Park CH, Park Y-H (2016) Investigating purchase conversion by uncovering online visit patterns. Marketing Sci. 35(6):894–914.
  • Powell WB (2007) Approximate Dynamic Programming: Solving the Curses of Dimensionality, vol. 703 (John Wiley & Sons, Hoboken, NJ).
  • Riquelme C, Tucker G, Snoek J (2018) Deep Bayesian bandits showdown: An empirical comparison of Bayesian deep networks for Thompson sampling. Preprint, submitted February 26, https://arxiv.org/abs/1802.09127.
  • Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (New Ser.) 58(5):527–535.
  • Rossi PE, McCulloch RE, Allenby GM (1996) The value of purchase history data in target marketing. Marketing Sci. 15(4):321–340.
  • Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536.
  • Russac Y, Vernade C, Cappé O (2019) Weighted linear bandits for non-stationary environments. Adv. Neural Inform. Processing Systems, 32.
  • Russo D, Van Roy B (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.
  • Rutz OJ, Bucklin RE (2012) Does banner advertising affect browsing for brands? Clickstream choice model says yes, for some. Quant. Marketing Econom. 10(2):231–257.
  • Sauré D, Zeevi A (2013) Optimal dynamic assortment planning with demand learning. Manufacturing Service Oper. Management 15(3):387–404.
  • Schmidhuber J (2015) Deep learning in neural networks: An overview. Neural Networks 61:85–117.
  • Schwartz EM, Bradlow ET, Fader PS (2017) Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Sci. 36(4):500–522.
  • Seznec J, Locatelli A, Carpentier A, Lazaric A, Valko M (2019) Rotting bandits are no harder than stochastic ones. 22nd Internat. Conf. Artificial Intelligence Statistics (PMLR, Long Beach, CA), 2564–2572.
  • Shepperd M, Cartwright M (2001) Predicting with sparse data. IEEE Trans. Software Engrg. 27(11):987–998.
  • Simchi-Levi D, Xu Y (2022) Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability. Math. Oper. Res. Forthcoming.
  • Simonson I, Tversky A (1992) Choice in context: Tradeoff contrast and extremeness aversion. J. Marketing Res. 29(3):281–295.
  • Slivkins A (2019) Introduction to multi-armed bandits. Foundations Trends Machine Learn. 12(1–2):1–286.
  • Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: A simple way to prevent neural networks from overfitting. J. Machine Learn. Res. 15(1):1929–1958.
  • Sutton RS, Barto AG (2018) Reinforcement learning: An introduction, 2nd ed. (MIT Press, Cambridge, MA).
  • Swaminathan A, Joachims T (2015) Counterfactual risk minimization: Learning from logged bandit feedback. Proc. Internat. Conf. on Machine Learn., 814–823.
  • Tang L, Rosales R, Singh A, Agarwal D (2013) Automatic ad format selection via contextual bandits. Proc. 22nd ACM Internat. Conf. Inform. Knowledge Management (ACM, New York), 1587–1594.
  • Tang L, Jiang Y, Li L, Zeng C, Li T (2015) Personalized recommendation via parameter-free contextual bandits. Proc. 38th Internat. ACM SIGIR Conf. Res. Development Inform. Retrieval (ACM, New York), 323–332.
  • Thomas P, Brunskill E (2016) Data-efficient off-policy policy evaluation for reinforcement learning. Proc. Internat. Conf. on Machine Learn., 2139–2148.
  • Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.
  • Tversky A, Simonson I (1993) Context-dependent preferences. Management Sci. 39(10):1179–1189.
  • van Emden R, Kaptein M (2018) Contextual: Evaluating contextual multi-armed bandit problems in R. Preprint, submitted July 8, https://arxiv.org/abs/1811.01926.
  • Verhoef PC, Kannan PK, Inman JJ (2015) From multi-channel retailing to omni-channel retailing: Introduction to the special issue on multi-channel retailing. J. Retailing 91(2):174–181.
  • Vermorel J, Mohri M (2005) Multi-armed bandit algorithms and empirical evaluation. Proc. Eur. Conf. on Machine Learn. (Springer, Berlin), 437–448.
  • Villar SS, Bowden J, Wason J (2015) Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges. Statist. Sci. 30(2):199.
  • Wang Y, Ouyang H, Wang C, Chen J, Asamov T, Chang Y (2017) Efficient ordered combinatorial semi-bandits for whole-page recommendation. Proc. AAAI Conf. Artificial Intelligence, vol. 31, no. 1, 2746–2753.
  • Wen Z, Kveton B, Ashkan A (2015) Efficient learning in large-scale combinatorial semi-bandits. Proc. Internat. Conf. on Machine Learn., 1113–1122.