Open Access

How Well Can AI Do Strategy? Empirical Benchmarking Using Strategy Simulations

Ryan T. Allen (Corresponding Author)
https://orcid.org/0000-0002-8227-8844
Management Department, Marriott School of Business, Brigham Young University, Provo, Utah 84602

Rory M. McDonald
https://orcid.org/0000-0003-4404-9212
Strategy, Ethics, and Entrepreneurship, Darden School of Business, University of Virginia, Charlottesville, Virginia 22903

Published Online: 11 Mar 2026. https://doi.org/10.1287/stsc.2025.0444

Volume 11, Issue 1

March 2026

Pages 1-179, ii

Article Information

Received: May 15, 2025
Accepted: December 15, 2025
Published Online: March 11, 2026

Cite as

Ryan T. Allen, Rory M. McDonald (2026) How Well Can AI Do Strategy? Empirical Benchmarking Using Strategy Simulations. Strategy Science 11(1):93-117.

https://doi.org/10.1287/stsc.2025.0444
