A Manager and an AI Walk into a Bar: Does ChatGPT Make Biased Decisions Like We Do?

Published Online: https://doi.org/10.1287/msom.2023.0279

References

  • Agrawal A, Gans J, Goldfarb A (2022) Prediction Machines, Updated and Expanded: The Simple Economics of Artificial Intelligence (Harvard Business Press, Cambridge, MA).
  • Akata E, Schulz L, Coda-Forno J, Oh SJ, Bethge M, Schulz E (2023) Playing repeated games with large language models. Preprint, submitted May 26, https://arxiv.org/abs/2305.16867.
  • Argyle LP, Busby EC, Fulda N, Gubler JR, Rytting C, Wingate D (2023) Out of one, many: Using language models to simulate human samples. Political Anal. 31(3):337–351.
  • Baucells M, Osadchiy N, Ovchinnikov A (2017) Behavioral anomalies in consumer wait-or-buy decisions and their implications for markdown management. Oper. Res. 65(2):357–378.
  • Becker-Peth M, Katok E, Thonemann UW (2013) Designing buyback contracts for irrational but predictable newsvendors. Management Sci. 59(8):1800–1816.
  • Becker-Peth M, Thonemann UW, Gully T (2018) A note on the risk aversion of informed newsvendors. J. Oper. Res. Soc. 69(7):1135–1145.
  • Binz M, Schulz E (2023) Using cognitive psychology to understand GPT-3. Proc. Natl. Acad. Sci. USA 120(6):e2218523120.
  • Bolton GE, Katok E (2008) Learning by doing in the newsvendor problem: A laboratory investigation of the role of experience and feedback. Manufacturing Service Oper. Management 10(3):519–538.
  • Brand J, Israeli A, Ngwe D (2023) Using LLMs for market research. Preprint, submitted March 30, http://dx.doi.org/10.2139/ssrn.4395751.
  • Brookins P, DeBacker JM (2023) Playing games with GPT: What can we learn about a large language model from canonical strategic games? Preprint, submitted July 10, http://dx.doi.org/10.2139/ssrn.4493398.
  • Chen Y, Liu TX, Shan Y, Zhong S (2023) The emergence of economic rationality of GPT. Preprint, submitted May 22, https://arxiv.org/abs/2305.12763.
  • Dasgupta I, Lampinen AK, Chan SC, Creswell A, Kumaran D, McClelland JL, Hill F (2022) Language models show human-like content effects on reasoning tasks. Preprint, submitted July 14, https://arxiv.org/abs/2207.07051.
  • Davis AM (2018) Biases in individual decision-making. Donohue K, Katok E, Leider S, eds. The Handbook of Behavioral Operations (Wiley, Hoboken, NJ), 149–198.
  • Davis AM, Mankad S, Corbett CJ, Katok E (2024) OM Forum—The best of both worlds: Machine learning and behavioral science in operations management. Manufacturing Service Oper. Management 26(5):1605–1621.
  • Davis AM, Flicker B, Hyndman K, Katok E, Keppler S, Leider S, Long X, Tong JD (2023) A replication study of operations management experiments in management science. Management Sci. 69(9):4977–4991.
  • Dou Z (2023) Exploring GPT-3 model’s capability in passing the Sally-Anne test a preliminary study in two languages. Preprint, submitted February 9, https://doi.org/10.31219/osf.io/8r3ma.
  • Fennell E (2023) Action identification characteristics and priming effects in ChatGPT. Preprint, submitted May 12, https://doi.org/10.31234/osf.io/aqbvk.
  • Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, Altman DG (2016) Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. Eur. J. Epidemiology 31(4):337–350.
  • Hagendorff T, Fabi S, Kosinski M (2023) Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT. Nature Comput. Sci. 3(10):833–838.
  • Horton JJ (2023) Large language models as simulated economic agents: What can we learn from Homo silicus? Preprint, submitted January 18, https://arxiv.org/abs/2301.07543.
  • Jackson I, Ivanov D, Dolgui A, Namdar J (2024) Generative artificial intelligence in supply chain and operations management: A capability-based framework for analysis and implementation. Internat. J. Production Res. 62(17):6120–6145.
  • Kirshner SN (2024a) Artificial agents and operations management decision-making. Preprint, submitted March 13, http://dx.doi.org/10.2139/ssrn.4726933.
  • Kirshner SN (2024b) GPT and CLT: The impact of ChatGPT’s level of abstraction on consumer recommendations. J. Retailing Consumer Services 76:103580.
  • Kremer M, Debo L (2016) Inferring quality from wait time. Management Sci. 62(10):3023–3038.
  • Leng Y (2024) Can LLMs mimic human-like mental accounting and behavioral biases? Preprint, submitted February 13, http://dx.doi.org/10.2139/ssrn.4705130.
  • Leng Y, Yuan Y (2024) Do LLM agents exhibit social behavior? Preprint, submitted December 23, https://arxiv.org/abs/2312.15198.
  • Li P, Castelo N, Katona Z, Sarvary M (2024) Frontiers: Determining the validity of large language models for automated perceptual analysis. Marketing Sci. 43(2):254–266.
  • Long X, Nasiry J (2015) Prospect theory explains newsvendor behavior: The role of reference points. Management Sci. 61(12):3009–3012.
  • Ma D, Zhang T, Saunders M (2023) Is ChatGPT humanly irrational? Preprint, submitted September 22, https://doi.org/10.21203/rs.3.rs-3220513/v1.
  • Macmillan-Scott O, Musolesi M (2024) (Ir)rationality and cognitive biases in large language models. Preprint, submitted February 14, https://arxiv.org/abs/2402.09193.
  • Mei Q, Xie Y, Yuan W, Jackson MO (2024) A Turing test of whether AI chatbots are behaviorally similar to humans. Proc. Natl. Acad. Sci. USA 121(9):e2313925121.
  • Meng J (2024) AI emerges as the frontier in behavioral science. Proc. Natl. Acad. Sci. USA 121(10):e2401336121.
  • Noy S, Zhang W (2023) Experimental evidence on the productivity effects of generative artificial intelligence. Science 381(6654):187–192.
  • Ovchinnikov A, Moritz B, Quiroga BF (2015) How to compete against a behavioral newsvendor. Production Oper. Management 24(11):1783–1793.
  • Özer Ö, Zheng Y (2016) Markdown or everyday low price? The role of behavioral motives. Management Sci. 62(2):326–346.
  • Park PS, Schoenegger P, Zhu C (2024) Diminished diversity-of-thought in a standard large language model. Behav. Res. Methods 56(6):5754–5770.
  • Phelps S, Russell YI (2023) The Machine Psychology of Cooperation: Can GPT models operationalise prompts for altruism, cooperation, competitiveness and selfishness in economic games? Preprint, submitted May 13, https://arxiv.org/abs/2305.07970.
  • Ren Y, Croson R (2013) Overconfidence in newsvendor orders: An experimental study. Management Sci. 59(11):2502–2517.
  • Su J, Lang Y, Chen KY (2023) Can AI solve newsvendor problem without making biased decisions? A behavioral experimental study. Preprint, submitted September 14, http://dx.doi.org/10.2139/ssrn.4567157.
  • Suri G, Slater LR, Ziaee A, Nguyen M (2024) Do large language models show decision heuristics similar to humans? A case study using GPT-3.5. J. Experiment. Psych. General 153(4):1066–1075.
  • Terwiesch C (2023) Would ChatGPT3 get a Wharton MBA? A prediction based on its performance in the operations management course. Mack Institute for Innovation Management at the Wharton School, University of Pennsylvania, Philadelphia.
  • Wamba SF, Queiroz MM, Jabbour CJC, Shi CV (2023) Are both generative AI and ChatGPT game changers for 21st-century operations and supply chain excellence? Internat. J. Production Econom. 265:109015.
  • Wang P, Xiao Z, Chen H, Oswald FL (2024) Will the real Linda please stand up … to large language models? Examining the representativeness heuristic in LLMs. Preprint, submitted April 1, https://arxiv.org/abs/2404.01461.
  • Wasserstein RL, Lazar NA (2016) The ASA statement on p-values: Context, process, and purpose. Amer. Statistician 70(2):129–133.
  • Xie C, Chen C, Jia F, Ye Z, Lai S, Shu K, Gu J, et al. (2024) Can large language model agents simulate human trust behaviors? Preprint, submitted February 7, https://arxiv.org/abs/2402.04559.
  • Xu R, Sun Y, Ren M, Guo S, Pan R, Lin H, Sun L, Han X (2024) AI for social science and social science of AI: A survey. Inform. Processing Management 61(3):103665.