LOLA: LLM-Assisted Online Learning Algorithm for Content Experiments

References

  • Angelopoulos P, Lee K, Misra S (2024) Causal alignment: Augmenting language models with A/B tests. Preprint, submitted April 15, http://dx.doi.org/10.2139/ssrn.4781850.
  • Aramayo N, Schiappacasse M, Goic M (2023) A multiarmed bandit approach for house ads recommendations. Marketing Sci. 42(2):271–292.
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learning 47:235–256.
  • Banerjee A, Urminsky O (2024) The language that drives engagement: A systematic large-scale analysis of headline experiments. Marketing Sci., ePub ahead of print November 4, https://doi.org/10.1287/mksc.2021.0018.
  • Brand J, Israeli A, Ngwe D (2023) Using LLMs for market research. Preprint, submitted March 30, http://dx.doi.org/10.2139/ssrn.4395751.
  • Brucks M, Toubia O (2023) Prompt architecture can induce methodological artifacts in large language models. Preprint, submitted June 25, http://dx.doi.org/10.2139/ssrn.4484416.
  • Bubeck S, Liu C-Y (2013) Prior-free and prior-dependent regret bounds for Thompson sampling. NIPS’13: Proc. 27th Internat. Conf. Neural Inform. Processing Systems, vol. 1 (Curran Associates Inc., Red Hook, NY), 638–646.
  • Chu W, Li L, Reyzin L, Schapire R (2011) Contextual bandits with linear payoff functions. Geoffrey G, David D, Miroslav D, eds. Proc. 14th Internat. Conf. Artificial Intelligence Statist., vol. 15 (PMLR, New York), 208–214.
  • Coenen A (2019) How New York Times is experimenting with recommendation algorithms. NYT Open (October 17), https://open.nytimes.com/how-the-new-york-times-is-experimenting-with-recommendation-algorithms-562f78624d26.
  • Fiez T, Nassif H, Chen Y-C, Gamez S, Jain L (2024) Best of three worlds: Adaptive experimentation for digital marketing in practice. Proc. ACM Web Conf. 2024 (Association for Computing Machinery, New York), 3586–3597.
  • Garivier A, Kaufmann E (2016) Optimal best arm identification with fixed confidence. Vitaly F, Alexander R, Ohad S, eds. Conf. Learning Theory, vol. 49 (PMLR, New York), 998–1027.
  • Gui G, Toubia O (2023) The challenge of using LLMs to simulate human behavior: A causal inference perspective. Preprint, submitted December 24, https://arxiv.org/abs/2312.15524.
  • Gur Y, Momeni A (2022) Adaptive sequential experiments with unknown information arrival processes. Manufacturing Service Oper. Management 24(5):2666–2684.
  • Hauser JR, Urban GL, Liberali G, Braun M (2009) Website morphing. Marketing Sci. 28(2):202–223.
  • Houlsby N, Giurgiu A, Jastrzebski S, Morrone B, De Laroussilhe Q, Gesmundo A, Attariyan M, Gelly S (2019) Parameter-efficient transfer learning for NLP. Kamalika C, Ruslan S, eds. Proc. 36th Internat. Conf. Machine Learning, vol. 97 (PMLR, New York), 2790–2799.
  • Hu EJ, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, Wang L, Chen W (2022) LoRA: Low-rank adaptation of large language models. Conf. Paper at ICLR 2022, vol. 2 (OpenReview.net), 3.
  • Jain L, Li Z, Loghmani E, Mason B, Yoganarasimhan H (2024) Effective adaptive exploration of prices and promotions in choice-based demand models. Marketing Sci. 43(5):1002–1030.
  • Kumar M, Kapoor A (2023) Generative AI and personalized video advertisements. Preprint, submitted November 17, http://dx.doi.org/10.2139/ssrn.4614118.
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).
  • Li XL, Liang P (2021) Prefix-tuning: Optimizing continuous prompts for generation. Preprint, submitted January 1, https://arxiv.org/abs/2101.00190.
  • Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. WWW’10: Proc. 19th Internat. Conf. World Wide Web (Association for Computing Machinery, New York), 661–670.
  • Li P, Castelo N, Katona Z, Sarvary M (2024) Frontiers: Determining the validity of large language models for automated perceptual analysis. Marketing Sci. 43(2):254–266.
  • Liberali G, Ferecatu A (2022) Morphing for consumer dynamics: Bandits meet hidden Markov models. Marketing Sci. 41(4):769–794.
  • Mason B, Jain L, Tripathy A, Nowak R (2020) Finding all ϵ-good arms in stochastic bandits. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 20707–20718.
  • Matias JN, Munger K, Le Quere MA, Ebersole C (2021) The Upworthy Research Archive, a time series of 32,487 experiments in US media. Sci. Data 8(1):195.
  • Min S, Moallemi CC, Russo DJ (2020) Policy gradient optimization of Thompson sampling policies. Preprint, submitted June 30, https://arxiv.org/abs/2006.16507.
  • Min S, Lyu X, Holtzman A, Artetxe M, Lewis M, Hajishirzi H, Zettlemoyer L (2022) Rethinking the role of demonstrations: What makes in-context learning work? Preprint, submitted February 25, https://arxiv.org/abs/2202.12837.
  • Misra K, Schwartz EM, Abernethy J (2019) Dynamic online pricing with incomplete information using multiarmed bandit experiments. Marketing Sci. 38(2):226–252.
  • Patil R, Boit S, Gudivada V, Nandigam J (2023) A survey of text representation and embedding techniques in NLP. IEEE Access 11:36120–36146.
  • Rafieian O, Yoganarasimhan H (2021) Targeting and privacy in mobile advertising. Marketing Sci. 40(2):193–218.
  • Rafieian O, Yoganarasimhan H (2023) AI and personalization. Artificial Intelligence in Marketing (Emerald Publishing Limited, Leeds, UK), 77–102.
  • Reneau A (2023) Study of Upworthy headlines claims negativity drives website clicks. We have some thoughts. Upworthy (March 22), https://www.upworthy.com/upworthy-negative-headlines-study.
  • Russo D, Van Roy B (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.
  • Saunshi N, Malladi S, Arora S (2020) A mathematical exploration of why language models help solve downstream tasks. Preprint, submitted October 7, https://arxiv.org/abs/2010.03648.
  • Schulhoff S, Ilie M, Balepur N, Kahadze K, Liu A, Si C, Li Y, et al. (2024) The prompt report: A systematic survey of prompting techniques. Preprint, submitted June 6, https://arxiv.org/abs/2406.06608.
  • Schwartz EM, Bradlow ET, Fader PS (2017) Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Sci. 36(4):500–522.
  • Simchi-Levi D, Xu Y (2022) Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability. Math. Oper. Res. 47(3):1904–1931.
  • Symonds A (2017) When a headline makes headlines of its own. New York Times (March 23), https://www.nytimes.com/2017/03/23/insider/headline-trump-time-interview.html.
  • Xie SM, Min S (2022) How does in-context learning work? A framework for understanding the differences from traditional supervised learning. SAIL Blog (August 1), http://ai.stanford.edu/blog/understanding-incontext/.
  • Xie SM, Raghunathan A, Liang P, Ma T (2021) An explanation of in-context learning as implicit Bayesian inference. Preprint, submitted November 3, https://arxiv.org/abs/2111.02080.
  • Yang K (2024) Milestones on our journey to standardize experimentation at New York Times. NYT Open (March 26), https://open.nytimes.com/milestones-on-our-journey-to-standardize-experimentation-at-the-new-york-times-2c6d32db0281.
  • Yoganarasimhan H (2020) Search personalization using machine learning. Management Sci. 66(3):1045–1070.
  • Yoganarasimhan H, Yakovetskaya I (2024) From feeds to inboxes: A comparative study of polarization in Facebook and email news sharing. Management Sci. 70(9):6461–6472.
  • Zaken EB, Ravfogel S, Goldberg Y (2021) Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. Preprint, submitted June 18, https://arxiv.org/abs/2106.10199.
  • Zhao J, Wang T, Abid W, Angus G, Garg A, Kinnison J, Sherstinsky A, Molino P, Addair T, Rishi D (2024) LoRA land: 310 fine-tuned LLMs that rival GPT-4, a technical report. Preprint, submitted April 29, https://arxiv.org/abs/2405.00732.