How Well Can AI Do Strategy? Empirical Benchmarking Using Strategy Simulations
Published Online: 11 Mar 2026
https://doi.org/10.1287/stsc.2025.0444
References
- (2002) When are technologies disruptive? A demand-based view of the emergence of competition. Strategic Management J. 23:667–688.
- (2005) Disruptive technologies and the emergence of competition. RAND J. Econom. 36(2):229–254.
- (2014) Positioning on a multiattribute landscape. Management Sci. 60(11):2794–2815.
- (2025) Leap of faith? How diffusion dynamics obfuscate the commercial potential of novel innovations. Preprint, submitted February 14, http://dx.doi.org/10.2139/ssrn.5084612.
- (2022) Algorithm-augmented work and domain experience: The countervailing forces of ability and aversion. Organ. Sci. 33(1):149–169.
- (2025) Methodological pluralism and innovation in data-driven organizations. Admin. Sci. Quart. 70(2):403–443.
- (2026) Listen to your users? Self-selection in user community feedback and commercial success. Acad. Management J. Forthcoming.
- Anthropic (2024) Introducing the next generation of Claude. https://www.anthropic.com/news/claude-3-family.
- (2019) Deep learning in medical image analysis: A third eye for doctors. J. Stomatology Oral Maxillofacial Surgery 120(4):279–288.
- (2013) Microfoundations of strategic problem formulation. Strategic Management J. 34(2):197–214.
- (2003) Exploitation, exploration, and process management: The productivity dilemma revisited. Acad. Management Rev. 28(2):238–256.
- (2024) The crowdless future? Generative AI and creative problem-solving. Organ. Sci. 35(5):1589–1607.
- , Lee P, (2023) Sparks of artificial general intelligence: Early experiments with GPT-4. Preprint, submitted March 22, https://arxiv.org/abs/2303.12712.
- (2010) Life in the fast lane: Origins of competitive interaction in new vs. established markets. Strategic Management J. 31(13):1527–1547.
- , Edwards H, (2021) Evaluating large language models trained on code. Preprint, submitted July 7, https://arxiv.org/abs/2107.03374.
- (2021) Machine learning for pattern discovery in management research. Strategic Management J. 42(1):30–57.
- (2020) Machine learning and human capital complementarities: Experimental evidence on bias mitigation. Strategic Management J.
- (1996) Customer power, strategic investment, and the failure of leading firms. Strategic Management J. 17:197–218.
- (2019) Strategic Innovation Simulation: Back Bay Battery v3 (Harvard Business School Publishing, Cambridge, MA).
- (2018) Disruptive innovation: An intellectual history and directions for future research. J. Management Stud. 55(7):1043–1078.
- (2025) Disruptive timing. Management Sci., ePub ahead of print October 30, https://doi.org/10.1287/mnsc.2023.01734.
- ClaudePlaysPokemon (2025) Retrieved August 18, 2025, https://www.twitch.tv/claudeplayspokemon.
- (2022) Artificial intelligence, data-driven learning, and the decentralized structure of platform ecosystems. Acad. Management Rev. 47:184–189.
- (2018) What makes a decision strategic? Strategy Sci. 3:606–619.
- (2025) Unbounding rationality: Why AI is a fundamental issue for strategy. Preprint, submitted September 8, https://doi.org/10.2139/ssrn.5454634.
- (2016) Mental representation and the discovery of new strategies. Strategic Management J. 37:2031–2049.
- (2024) Artificial intelligence and strategic decision-making: Evidence from entrepreneurs and investors. Strategy Sci. 9:322–345.
- (2023) Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality. Preprint, submitted September 18, https://doi.org/10.2139/ssrn.4573321.
- (2025) The cybernetic teammate: A field experiment on generative AI reshaping teamwork and expertise. NBER Working Paper No. 33641, National Bureau of Economic Research, Cambridge, MA.
- (2025) Generative artificial intelligence and evaluating strategic decisions. Strategic Management J. 46:583–610.
- (1989) Making fast strategic decisions in high-velocity environments. Acad. Management J. 32:543–576.
- (1988) Politics of strategic decision making in high-velocity environments: Toward a midrange theory. Acad. Management J. 31:737–770.
- (1992) Strategic decision making. Strategic Management J. 13:17–37.
- (2025) Scaling Tech Ventures Simulation (Harvard Business School Publishing, Cambridge, MA).
- (2024) Theory is all you need: AI, human cognition, and causal reasoning. Strategy Sci. 9:346–371.
- (2017) The theory-based view: Economic actors as theorists. Strategy Sci. 2:258–271.
- (2024) Theory-based decisions: Foundations and introduction. Strategy Sci. 9:297–310.
- (2012) Microfoundations of routines and capabilities: Individuals, processes, and structure. J. Management Stud. 49:1351–1374.
- (2007) Collaborative brokerage, generative creativity, and creative success. Admin. Sci. Quart. 52:443–475.
- (2023) Training with AI: Evidence from chess computers. Strategic Management J. 44:2724–2750.
- (1991) Commitment: The Dynamics of Strategy (Free Press, New York).
- Google (2025) Gemini 2.5: Our most intelligent AI model. (March 25), https://blog.google/innovation-and-ai/models-and-research/google-deepmind/gemini-model-thinking-updates-march-2025/.
- GPQA Leaderboard (2025) Retrieved December 17, 2025, https://llm-stats.com/benchmarks/gpqa.
- (2006) The interplay between exploration and exploitation. Acad. Management J. 49:693–706.
- (2021) Hybrid predictive ensembles: Synergies between human and computational forecasts. J. Soc. Comput. 2:89–102.
- (2018) Human decisions and machine predictions. Quart. J. Econom. 133:237–293.
- (2024) Data science and automation in the process of theorizing: Machine learning’s power of induction in the co-duction cycle. PLoS One 19(11):e0309318.
- (2025) Measuring AI ability to complete long tasks. Preprint, submitted March 18, https://arxiv.org/abs/2503.14499.
- (1997) Adaptation on rugged landscapes. Management Sci. 43:934–950.
- (2017) Mendel in the C-suite: Design and the evolution of strategies. Strategy Sci. 2:282–287.
- (2017) Expertise versus bias in evaluation: Evidence from the NIH. Amer. Econom. J. Appl. Econom. 9:60–92.
- (1991) Exploration and exploitation in organizational learning. Organ. Sci. 2:71–87.
- (2025) Artificial intelligence index report 2025. Accessed April 30, 2025, https://hai.stanford.edu/ai-index/2025-ai-index-report.
- (2022) A spanner in the works: Category-spanning entrants and audience valuation of incumbents. Strategy Sci. 7:190–209.
- (2020) Parallel play: Startups, nascent markets, and effective business-model design. Admin. Sci. Quart. 65(2):483–523.
- (2025) Prompting science report 1: Prompt engineering is complicated and contingent. Preprint, submitted March 4, https://arxiv.org/abs/2503.04818.
- (1976) The structure of “unstructured” decision processes. Admin. Sci. Quart. 246–275.
- (2024) Reinventing the organization for GenAI and LLMs. MIT Sloan Management Rev., 1–4.
- (2023) From local modification to global innovation: How research units in emerging economies innovate for the world. J. Internat. Bus. Stud. 54:418–440.
- (2021) Humans and technology: Forms of conjoined agency in organizations. Acad. Management Rev. 46(3):552–571.
- (2012) Kasparov Versus Deep Blue: Computer Chess Comes of Age (Springer Science & Business Media, New York).
- (2004) A knowledge-based theory of the firm—The problem-solving perspective. Organ. Sci. 15:617–632.
- Open LLM Leaderboard (2025) Retrieved September 15, 2025, https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard.
- OpenAI (2024) Introducing OpenAI o1. Accessed April 15, 2025, https://openai.com/o1/.
- OpenAI (2025) Introducing OpenAI o3 and o4-mini. Accessed April 30, 2025, https://openai.com/index/introducing-o3-and-o4-mini/.
- OpenAI Pioneers Program (2025) Retrieved September 15, 2025, https://openai.com/index/openai-pioneers-program/.
- (2023) GPT-4 technical report. Preprint, submitted March 15, https://arxiv.org/abs/2303.08774.
- Overview Leaderboard | LMArena (n.d.) Retrieved December 17, 2025, https://lmarena.ai/leaderboard.
- (2021) Entrepreneurial learning and strategic foresight. Strategic Management J. 42:2357–2388.
- (2025) Cyborgs, centaurs and self-automators: The three modes of human-GenAI knowledge work and their implications for skilling and the future of expertise. Harvard Bus. Rev., https://www.hbs.edu/ris/Publication%20Files/26-036_e7d0e59a-904c-49f1-b610-56eb2bdfe6f9.pdf.
- (2024) GPQA: A graduate-level Google-proof Q&A benchmark. Proc. 1st Conf. Language Modeling.
- (2000) Imitation of complex strategies. Management Sci. 46(6):824–844.
- (2006) Organizing to strategize in the face of interactions: Preventing premature lock-in. Long Range Planning 39:591–614.
- , Schrittwieser J, (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.
- UC Berkeley SkyLab (2025) LMArena. https://arena.ai/about.
- , Choi DH, (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782):350–354.
- , Ren W, (2024) MMLU-PRO: A more robust and challenging multi-task language understanding benchmark. Adv. Neural Inform. Processing Systems 37:95266–95290.
- (2022) Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inform. Processing Systems 35:24824–24837.
- (2024) LiveBench: A challenging, contamination-free LLM benchmark. Preprint, submitted June 27, https://arxiv.org/abs/2406.19314.

