How Well Can AI Do Strategy? Empirical Benchmarking Using Strategy Simulations

Published Online: https://doi.org/10.1287/stsc.2025.0444

References

  • Adner R (2002) When are technologies disruptive? A demand-based view of the emergence of competition. Strategic Management J. 23:667–688.
  • Adner R, Zemsky P (2005) Disruptive technologies and the emergence of competition. RAND J. Econom. 36(2):229–254.
  • Adner R, Csaszar FA, Zemsky PB (2014) Positioning on a multiattribute landscape. Management Sci. 60(11):2794–2815.
  • Allen RT (2025) Leap of faith? How diffusion dynamics obfuscate the commercial potential of novel innovations. Preprint, submitted February 14, http://dx.doi.org/10.2139/ssrn.5084612.
  • Allen RT, Choudhury P (2022) Algorithm-augmented work and domain experience: The countervailing forces of ability and aversion. Organ. Sci. 33(1):149–169.
  • Allen RT, McDonald RM (2025) Methodological pluralism and innovation in data-driven organizations. Admin. Sci. Quart. 70(2):403–443.
  • Allen RT, Bremner R, McDonald RM (2026) Listen to your users? Self-selection in user community feedback and commercial success. Acad. Management J. Forthcoming.
  • Anthropic (2024) Introducing the next generation of Claude. https://www.anthropic.com/news/claude-3-family.
  • Arthur F, Hossein KR (2019) Deep learning in medical image analysis: A third eye for doctors. J. Stomatology Oral Maxillofacial Surgery 120(4):279–288.
  • Baer M, Dirks KT, Nickerson JA (2013) Microfoundations of strategic problem formulation. Strategic Management J. 34(2):197–214.
  • Benner MJ, Tushman ML (2003) Exploitation, exploration, and process management: The productivity dilemma revisited. Acad. Management Rev. 28(2):238–256.
  • Boussioux L, Lane JN, Zhang M, Jacimovic V, Lakhani KR (2024) The crowdless future? Generative AI and creative problem-solving. Organ. Sci. 35(5):1589–1607.
  • Bubeck S, Chandrasekaran V, Eldan R, Gehrke J, Horvitz E, Kamar E, Lee P, et al. (2023) Sparks of artificial general intelligence: Early experiments with GPT-4. Preprint, submitted March 22, https://arxiv.org/abs/2303.12712.
  • Chen EL, Katila R, McDonald R, Eisenhardt KM (2010) Life in the fast lane: Origins of competitive interaction in new vs. established markets. Strategic Management J. 31(13):1527–1547.
  • Chen M, Tworek J, Jun H, Yuan Q, Pinto HPDO, Kaplan J, Edwards H, et al. (2021) Evaluating large language models trained on code. Preprint, submitted July 7, https://arxiv.org/abs/2107.03374.
  • Choudhury P, Allen RT, Endres MG (2021) Machine learning for pattern discovery in management research. Strategic Management J. 42(1):30–57.
  • Choudhury P, Starr E, Agarwal R (2020) Machine learning and human capital complementarities: Experimental evidence on bias mitigation. Strategic Management J.
  • Christensen CM, Bower JL (1996) Customer power, strategic investment, and the failure of leading firms. Strategic Management J. 17:197–218.
  • Christensen CM, Shih WC (2019) Strategic Innovation Simulation: Back Bay Battery v3 (Harvard Business School Publishing, Cambridge, MA).
  • Christensen CM, McDonald R, Altman EJ, Palmer JE (2018) Disruptive innovation: An intellectual history and directions for future research. J. Management Stud. 55(7):1043–1078.
  • Chu LY, Li G, Wu A, Wu B (2025) Disruptive timing. Management Sci., ePub ahead of print October 30, https://doi.org/10.1287/mnsc.2023.01734.
  • ClaudePlaysPokemon (2025) Retrieved August 18, 2025, https://www.twitch.tv/claudeplayspokemon.
  • Clough DR, Wu A (2022) Artificial intelligence, data-driven learning, and the decentralized structure of platform ecosystems. Acad. Management Rev. 47:184–189.
  • Csaszar FA (2018) What makes a decision strategic? Strategy Sci. 3:606–619.
  • Csaszar FA (2025) Unbounding rationality: Why AI is a fundamental issue for strategy. Preprint, submitted September 8, https://doi.org/10.2139/ssrn.5454634.
  • Csaszar FA, Levinthal DA (2016) Mental representation and the discovery of new strategies. Strategic Management J. 37:2031–2049.
  • Csaszar FA, Ketkar H, Kim H (2024) Artificial intelligence and strategic decision-making: Evidence from entrepreneurs and investors. Strategy Sci. 9:322–345.
  • Dell’Acqua F, McFowland E, Mollick ER, Lifshitz-Assaf H, Kellogg K, Rajendran S, Lakhani KR (2023) Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality. Preprint, submitted September 18, https://doi.org/10.2139/ssrn.4573321.
  • Dell’Acqua F, Ayoubi C, Lifshitz-Assaf H, Sadun R, Mollick ER, Mollick L, et al. (2025) The cybernetic teammate: A field experiment on generative AI reshaping teamwork and expertise. NBER Working Paper No. 33641, National Bureau of Economic Research, Cambridge, MA.
  • Doshi AR, Bell JJ, Mirzayev E, Vanneste BS (2025) Generative artificial intelligence and evaluating strategic decisions. Strategic Management J. 46:583–610.
  • Eisenhardt KM (1989) Making fast strategic decisions in high-velocity environments. Acad. Management J. 32:543–576.
  • Eisenhardt KM, Bourgeois LJ III (1988) Politics of strategic decision making in high-velocity environments: Toward a midrange theory. Acad. Management J. 31:737–770.
  • Eisenhardt KM, Zbaracki MJ (1992) Strategic decision making. Strategic Management J. 13:17–37.
  • Eisenmann T (2025) Scaling Tech Ventures Simulation (Harvard Business School Publishing, Cambridge, MA).
  • Felin T, Holweg M (2024) Theory is all you need: AI, human cognition, and causal reasoning. Strategy Sci. 9:346–371.
  • Felin T, Zenger T (2017) The theory-based view: Economic actors as theorists. Strategy Sci. 2:258–271.
  • Felin T, Gambardella A, Zenger T (2024) Theory-based decisions: Foundations and introduction. Strategy Sci. 9:297–310.
  • Felin T, Foss NJ, Heimeriks KH, Madsen TL (2012) Microfoundations of routines and capabilities: Individuals, processes, and structure. J. Management Stud. 49:1351–1374.
  • Fleming L, Mingo S, Chen D (2007) Collaborative brokerage, generative creativity, and creative success. Admin. Sci. Quart. 52:443–475.
  • Gaessler F, Piezunka H (2023) Training with AI: Evidence from chess computers. Strategic Management J. 44:2724–2750.
  • Ghemawat P (1991) Commitment: The Dynamics of Strategy (Free Press, New York).
  • Google (2025) Gemini 2.5: Our most intelligent AI model. (March 25), https://blog.google/innovation-and-ai/models-and-research/google-deepmind/gemini-model-thinking-updates-march-2025/.
  • GPQA Leaderboard (2025) Retrieved December 17, 2025, https://llm-stats.com/benchmarks/gpqa.
  • Gupta AK, Smith KG, Shalley CE (2006) The interplay between exploration and exploitation. Acad. Management J. 49:693–706.
  • Hong L, Lamberson PJ, Page SE (2021) Hybrid predictive ensembles: Synergies between human and computational forecasts. J. Soc. Comput. 2:89–102.
  • Kleinberg J, Lakkaraju H, Leskovec J, Ludwig J, Mullainathan S (2018) Human decisions and machine predictions. Quart. J. Econom. 133:237–293.
  • Kolkman D, Lee GK, van Witteloostuijn A (2024) Data science and automation in the process of theorizing: Machine learning’s power of induction in the co-duction cycle. PLoS One 19(11):e0309318.
  • Kwa T, West B, Becker J, Deng A, Garcia K, Hasin M, et al. (2025) Measuring AI ability to complete long tasks. Preprint, submitted March 18, https://arxiv.org/abs/2503.14499.
  • Levinthal DA (1997) Adaptation on rugged landscapes. Management Sci. 43:934–950.
  • Levinthal DA (2017) Mendel in the c-suite: Design and the evolution of strategies. Strategy Sci. 2:282–287.
  • Li D (2017) Expertise versus bias in evaluation: Evidence from the NIH. Amer. Econom. J. Appl. Econom. 9:60–92.
  • March J (1991) Exploration and exploitation in organizational learning. Organ. Sci. 2:71–87.
  • Maslej N, Fattorini L, Perrault R, Gil Y, Parli V, Kariuki N, Oak S (2025) Artificial intelligence index report 2025. Accessed April 30, 2025, https://hai.stanford.edu/ai-index/2025-ai-index-report.
  • McDonald RM, Allen RT (2022) A spanner in the works: Category-spanning entrants and audience valuation of incumbents. Strategy Sci. 7:190–209.
  • McDonald RM, Eisenhardt KM (2020) Parallel play: Startups, nascent markets, and effective business-model design. Admin. Sci. Quart. 65(2):483–523.
  • Meincke L, Mollick E, Mollick L, Shapiro D (2025) Prompting science report 1: Prompt engineering is complicated and contingent. Preprint, submitted March 4, https://arxiv.org/abs/2503.04818.
  • Mintzberg H, Raisinghani D, Theoret A (1976) The structure of “unstructured” decision processes. Admin. Sci. Quart. 246–275.
  • Mollick E (2024) Reinventing the organization for GenAI and LLMs. MIT Sloan Management Rev., 1–4.
  • Morris S, Oldroyd J, Allen RT, Chng DHM, Han J (2023) From local modification to global innovation: How research units in emerging economies innovate for the world. J. Internat. Bus. Stud. 54:418–440.
  • Murray A, Rhymer J, Sirmon DG (2021) Humans and technology: Forms of conjoined agency in organizations. Acad. Management Rev. 46(3):552–571.
  • Newborn M (2012) Kasparov Versus Deep Blue: Computer Chess Comes of Age (Springer Science & Business Media, New York).
  • Nickerson JA, Zenger TR (2004) A knowledge-based theory of the firm—The problem-solving perspective. Organ. Sci. 15:617–632.
  • Open LLM Leaderboard (2025) Retrieved September 15, 2025, https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard.
  • OpenAI (2024) Introducing OpenAI o1. Accessed April 15, 2025, https://openai.com/o1/.
  • OpenAI (2025) Introducing OpenAI o3 and o4-mini. Accessed April 30, 2025, https://openai.com/index/introducing-o3-and-o4-mini/.
  • OpenAI Pioneers Program (2025) Retrieved September 15, 2025, https://openai.com/index/openai-pioneers-program/.
  • OpenAI, Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman FL, et al. (2023) GPT-4 technical report. Preprint, submitted March 15, https://arxiv.org/abs/2303.08774.
  • Overview Leaderboard | LMArena (n.d.) Retrieved December 17, 2025, https://lmarena.ai/leaderboard.
  • Peterson A, Wu A (2021) Entrepreneurial learning and strategic foresight. Strategic Management J. 42:2357–2388.
  • Randazzo S, Lifshitz H, Kellogg KC, Dell’Acqua F, Mollick E, Candelon F, Lakhani KR (2025) Cyborgs, centaurs and self-automators: The three modes of human-GenAI knowledge work and their implications for skilling and the future of expertise. Harvard Bus. Rev., https://www.hbs.edu/ris/Publication%20Files/26-036_e7d0e59a-904c-49f1-b610-56eb2bdfe6f9.pdf.
  • Rein D, Hou BL, Stickland AC, Petty J, Pang RY, Dirani J, Bowman SR (2024) GPQA: A graduate-level Google-proof Q&A benchmark. Proc. 1st Conf. Language Modeling.
  • Rivkin JW (2000) Imitation of complex strategies. Management Sci. 46(6):824–844.
  • Rivkin JW, Siggelkow N (2006) Organizing to strategize in the face of interactions: Preventing premature lock-in. Long Range Planning 39:591–614.
  • Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, et al. (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.
  • UC Berkeley SkyLab (2025) LMArena. https://arena.ai/about.
  • Vinyals O, Babuschkin I, Czarnecki WM, Mathieu M, Dudzik A, Chung J, Choi DH, et al. (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782):350–354.
  • Wang Y, Ma X, Zhang G, Ni Y, Chandra A, Guo S, Ren W, et al. (2024) MMLU-PRO: A more robust and challenging multi-task language understanding benchmark. Adv. Neural Inform. Processing Systems 37:95266–95290.
  • Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E, et al. (2022) Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inform. Processing Systems 35:24824–24837.
  • White C, Dooley S, Roberts M, Pal A, Feuer B, Jain S, Shwartz-Ziv R, et al. (2024) LiveBench: A challenging, contamination-free LLM benchmark. Preprint, submitted June 27, https://arxiv.org/abs/2406.19314.