Large Language Models for Market Research: A Data-Augmentation Approach

Published Online: https://doi.org/10.1287/mksc.2025.0009

References

  • Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman FL, Almeida D, et al. (2023) GPT-4 technical report. Preprint, submitted March 15, https://arxiv.org/abs/2303.08774.
  • Allenby GM, Rossi PE (2006) Hierarchical Bayes models. The Handbook of Marketing Research: Uses, Misuses, and Future Advances (SAGE Publications, Thousand Oaks, CA), 418–440.
  • Angelopoulos AN, Duchi JC, Zrnic T (2023a) PPI++: Efficient prediction-powered inference. Preprint, submitted November 2, https://arxiv.org/abs/2311.01453.
  • Angelopoulos AN, Bates S, Fannjiang C, Jordan MI, Zrnic T (2023b) Prediction-powered inference. Science 382(6671):669–674.
  • Argyle LP, Busby EC, Fulda N, Gubler JR, Rytting C, Wingate D (2023) Out of one, many: Using language models to simulate human samples. Political Anal. 31(3):337–351.
  • Bastani H, Zhang DJ, Zhang H (2022) Applied machine learning in operations management. Innovative Technology at the Interface of Finance and Operations: Volume I (Springer Nature), 189–222.
  • Beltagy I, Lo K, Cohan A (2019) SciBERT: A pretrained language model for scientific text. Preprint, submitted March 26, https://arxiv.org/abs/1903.10676.
  • Bound J, Brown C, Mathiowetz N (2001) Measurement error in survey data. Handbook of Econometrics, vol. 5 (Elsevier, Amsterdam), 3705–3843.
  • Brand J, Israeli A, Ngwe D (2023) Using LLMs for market research. Preprint, submitted March 30, https://doi.org/10.2139/ssrn.4395751.
  • Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, et al. (2020) Language models are few-shot learners. Preprint, submitted May 28, https://arxiv.org/abs/2005.14165.
  • Chardon H, Lerasle M, Mourtada J (2024) Finite-sample performance of the maximum likelihood estimator in logistic regression. Preprint, submitted November 4, https://arxiv.org/abs/2411.02137.
  • Chen Y, Liu TX, Shan Y, Zhong S (2023) The emergence of economic rationality of GPT. Proc. Natl. Acad. Sci. USA 120(51):e2316205120.
  • Chen X, Owen Z, Pixton C, Simchi-Levi D (2022) A statistical learning approach to personalization in revenue management. Management Sci. 68(3):1923–1937.
  • Choi T-M, Kumar S, Yue X, Chan H-L (2022) Disruptive technologies and operations management in the industry 4.0 era and beyond. Production Oper. Management 31(1):9–31.
  • Chomsky N (1956) Three models for the description of language. IEEE Trans. Inform. Theory 2(3):113–124.
  • Connell P, Choi JH (2024) Estimating and correcting for misclassification error in empirical textual research. Preprint, submitted September 5, https://doi.org/10.2139/ssrn.4913179.
  • Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: Pre-training of deep bidirectional transformers for language understanding. Preprint, submitted October 11, https://arxiv.org/abs/1810.04805.
  • Dzyabura D, Jagabathula S (2018) Offline assortment optimization in the presence of an online channel. Management Sci. 64(6):2767–2786.
  • Eggers F, Sattler H, Teichert T, Völckner F (2022) Choice-based conjoint analysis. Handbook of Market Research (Springer, Cham, Switzerland), 781–819.
  • Feller W (1971) An Introduction to Probability Theory and Its Applications, Volume II, 2nd ed. (John Wiley & Sons, New York).
  • Girotra K, Meincke L, Terwiesch C, Ulrich KT (2023) Ideas are dimes a dozen: Large language models for idea generation in innovation. Preprint, submitted August 2, https://doi.org/10.2139/ssrn.4526071.
  • Goli A, Singh A (2024) Frontiers: Can large language models capture human preferences? Marketing Sci. 43(4):709–722.
  • Green PE, Srinivasan V (1978) Conjoint analysis in consumer research: Issues and outlook. J. Consumer Res. 5(2):103–123.
  • Green PE, Srinivasan V (1990) Conjoint analysis in marketing: New developments with implications for research and practice. J. Marketing 54(4):3–19.
  • Gui G, Toubia O (2023) The challenge of using LLMs to simulate human behavior: A causal inference perspective. Preprint, submitted December 24, https://arxiv.org/abs/2312.15524.
  • Gururangan S, Marasović A, Swayamdipta S, Lo K, Beltagy I, Downey D, Smith NA (2020) Don’t stop pretraining: Adapt language models to domains and tasks. Preprint, submitted April 23, https://arxiv.org/abs/2004.10964.
  • Hair J Jr, Page M, Brunsveld N (2019) Essentials of Business Research Methods, 4th ed. (Routledge, New York).
  • Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. Preprint, submitted March 9, https://arxiv.org/abs/1503.02531.
  • Horton JJ (2023) Large language models as simulated economic agents: What can we learn from homo silicus? NBER Working Paper No. 31122, National Bureau of Economic Research, Cambridge, MA.
  • Huang Y, Yuan Z, Zhou Y, Guo K, Wang X, Zhuang H, Sun W, et al. (2024) Social science meets LLMs: How reliable are large language models in social simulations? Preprint, submitted October 30, https://arxiv.org/abs/2410.23426.
  • HuggingFace (2024) Meta-LLaMA. Accessed August 31, 2024, https://huggingface.co/meta-llama/Meta-Llama-3-8B#:~:text=Training%20Data,over%2010M%20human%2Dannotated%20examples.
  • Ji W, Lei L, Zrnic T (2025) Predictions as surrogates: Revisiting surrogate outcomes in the age of AI. Preprint, submitted January 16, https://arxiv.org/abs/2501.09731.
  • Kessels R, Goos P, Vandebroek M (2008) Optimal designs for conjoint experiments. Comput. Statist. Data Anal. 52(5):2369–2387.
  • Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. Preprint, submitted December 22, https://arxiv.org/abs/1412.6980.
  • Kohli R, Sukumar R (1990) Heuristics for product-line design using conjoint analysis. Management Sci. 36(12):1464–1478.
  • Kreps S, Prasad S, Brownstein JS, Hswen Y, Garibaldi BT, Zhang B, Kriner DL (2020) Factors associated with US adults’ likelihood of accepting COVID-19 vaccination. JAMA Network Open 3(10):e2025594.
  • Ludwig J, Mullainathan S, Rambachan A (2024) Large language models: An applied econometric framework. Preprint, submitted December 9, https://arxiv.org/abs/2412.07031.
  • Naveed H, Ullah Khan A, Qiu S, Saqib M, Anwar S, Usman M, Akhtar N, et al. (2023) A comprehensive overview of large language models. Preprint, submitted July 12, https://arxiv.org/abs/2307.06435.
  • Newey WK, McFadden D (1994) Large sample estimation and hypothesis testing. Handbook of Econometrics, vol. 4 (North Holland, Amsterdam), 2111–2245.
  • Olsen TL, Tomlin B (2020) Industry 4.0: Opportunities and challenges for operations management. Manufacturing Service Oper. Management 22(1):113–122.
  • Pan SJ, Yang Q (2009) A survey on transfer learning. IEEE Trans. Knowledge Data Engrg. 22(10):1345–1359.
  • Parthasarathy VB, Zafar A, Khan A, Shahid A (2024) The ultimate guide to fine-tuning LLMs from basics to breakthroughs: An exhaustive review of technologies, research, best practices, applied research challenges and opportunities. Preprint, submitted August 23, https://arxiv.org/abs/2408.13296.
  • Peng A, Allard J, Heidel S (2024) Fine-tuning now available for GPT-4o. Accessed December 15, 2024, https://openai.com/index/gpt-4o-fine-tuning/.
  • Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding by generative pre-training. OpenAI. Accessed June 3, 2025, https://openai.com/index/language-unsupervised/.
  • Raschka S (2018) Model evaluation, model selection, and algorithm selection in machine learning. Preprint, submitted November 13, https://arxiv.org/abs/1811.12808.
  • Shane SA, Ulrich KT (2004) 50th anniversary article: Technological innovation, product development, and entrepreneurship in management science. Management Sci. 50(2):133–144.
  • Solomon MR (2020) Consumer Behavior: Buying, Having, and Being (Pearson, Harlow, England).
  • Spencer V (2019) Choice modeling sports cars. Accessed October 9, 2024, https://github.com/spensorflow/Marketing-Analytics---Choice-Modeling-Sports-Car-Sales.
  • Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. Preprint, submitted September 10, https://arxiv.org/abs/1409.3215.
  • Teixeira L (2023) Prompt engineering: Compressing text to ideas and decompressing back with sparse priming representations. Accessed December 29, 2024, https://medium.com/@lawrenceteixeira/prompt-engineering-compressing-text-to-ideas-and-decompressing-back.
  • Terwiesch C (2019) OM Forum—Empirical research in operations management: From field studies to analyzing digital exhaust. Manufacturing Service Oper. Management 21(4):713–722.
  • Van der Vaart AW (2000) Asymptotic Statistics, vol. 3 (Cambridge University Press).
  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, et al. (2017) Attention is all you need. Guyon I, von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan SVN, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 30 (Curran Associates, Inc., Red Hook, NY), 5998–6008.
  • Wainwright MJ (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint, vol. 48 (Cambridge University Press, Cambridge, UK).
  • Wang X, Camm JD, Curry DJ (2009) A branch-and-price approach to the share-of-choice product line design problem. Management Sci. 55(10):1718–1728.
  • Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E, Le QV, et al. (2022) Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY), 24824–24837.
  • Yang K, Li H, Wen H, Peng T-Q, Tang J, Liu H (2024) Are large language models (LLMs) good social predictors? Preprint, submitted February 20, https://arxiv.org/abs/2402.12620.
  • Yao S, Yu D, Zhao J, Shafran I, Griffiths T, Cao Y, Narasimhan K (2023) Tree of thoughts: Deliberate problem solving with large language models. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. Adv. Neural Inform. Processing Systems, vol. 36 (Curran Associates, Inc., Red Hook, NY), 11809–11822.
  • Yoo Y, Henfridsson O, Kallinikos J, Gregory R, Burtch G, Chatterjee S, Sarker S (2024) The next frontiers of digital innovation research. Inform. Systems Res. 35(4):1507–1523.
  • Zhang J, Xue W, Yu Y, Tan Y (2023) Debiasing ML- or AI-generated regressors in partial linear models. Preprint, submitted November 30, https://doi.org/10.2139/ssrn.4636026.
  • Zhuang F, Qi Z, Duan K, Xi D, Zhu Y, Zhu H, Xiong H, et al. (2020) A comprehensive survey on transfer learning. Proc. IEEE 109(1):43–76.
  • Ziems C, Held W, Shaikh O, Chen J, Zhang Z, Yang D (2024) Can large language models transform computational social science? Comput. Linguist. 50(1):237–291.