Large Language Models for Market Research: A Data-Augmentation Approach
Published Online: 17 Mar 2026, https://doi.org/10.1287/mksc.2025.0009

