Deep Reinforcement Learning for Equilibrium Computation in Multistage Auctions and Contests
Published Online:5 Dec 2025https://doi.org/10.1287/mnsc.2024.06771
References
- (2011) Noncomputable conditional distributions. Proc. IEEE 26th Annual Sympos. Logic Comput. Sci. (IEEE Computer Society, Washington, DC), 107–116.Google Scholar
- (2025) On the uniqueness of Bayesian coarse correlated equilibria in standard first-price and all-pay auctions. Azar Y, Panigrahi D, eds. Proc. Sympos. Discrete Algorithms (SIAM, Philadelphia), 2491–2537.Google Scholar
- (1996) Asymmetric all-pay auctions with incomplete information: The two-player case. Games Econom. Behav. 14(1):1–18.Crossref, Google Scholar
- (2009) Simultaneous vs. sequential price competition with incomplete information. Econom. Lett. 104(1):23–26.Google Scholar
- (1971) Essays in the Theory of Risk-Bearing (North-Holland, Amsterdam).Google Scholar
- (2006) Robust learning equilibrium. Dechter R, Richardson T, eds. Proc. 22nd Conf. Uncertainty Artificial Intelligence (AUAI Press, Corvallis, OR), 7–14.Google Scholar
- (2019) An infinite dimensional purification principle without saturation. J. Math. Anal. Appl. 472(2):1331–1345.Crossref, Google Scholar
- (1995) Commitment and observability in games. Games Econom. Behav. 8(2):271–280.Crossref, Google Scholar
- (1994) Effort levels in contests with two asymmetric players. South. Econom. J. 61(2):367–378.Google Scholar
- (2018) Multiplicative weights update in zero-sum games. Tardos É, Elkind E, Vohra R, eds. Proc. ACM Conf. Econom. Comput. (ACM, Ithaca, NY), 321–338.Google Scholar
- (2019) Dota 2 with Large Scale Deep Reinforcement Learning.Google Scholar
- (1738) Exposition of a new theory on the measurement of risk. Econometrica 22(1):23.Crossref, Google Scholar
- (2025) Computing Bayes–Nash equilibrium strategies in auction games via simultaneous online dual averaging. Oper. Res. 73(2):1102–1127. Link, Google Scholar
- (2023b) Learning equilibria in asymmetric auction games. INFORMS J. Comput. 35(3):523–542.Link, Google Scholar
- (2023) Learning equilibrium in bilateral bargaining games. Eur. J. Oper. Res. 311(2):660–678. Google Scholar
- (2021) Learning equilibria in symmetric auction games using artificial neural networks. Natural Machine Intelligence 3(8):687–695.Google Scholar
- (2023c) Beyond monotonicity: On the convergence of learning algorithms in standard auction games. Walsh T, Shah J, Kolter Z, eds. Proc. Thirty-Ninth AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 13649–13657.Google Scholar
- (1988) Reputation in repeated second-price auctions. J. Econom. Theory 46(1):97–119.Crossref, Google Scholar
- (2003) Approximating game-theoretic optimal strategies for full-scale poker. Proc. Internat. Joint Conf. Artificial Intelligence (Morgan Kaufmann, San Francisco), 661–667.Google Scholar
- (2004) Memory and perfect recall in extensive games. Games Econom. Behav. 47(2):237–256.Google Scholar
- (2020) Computing Bayes-Nash equilibria in combinatorial auctions with verification. J. Artificial Intelligence Res. 69:531–570.Crossref, Google Scholar
- (2018) Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359(6374):418–424.Crossref, Google Scholar
- (2019) Superhuman AI for multiplayer poker. Science (1979) 365(6456):885–890.Google Scholar
- (2019) Deep counterfactual regret minimization. Chaudhuri K, Salakhutdinov R, eds. Proc. 36th Internat. Conf. Machine Learn. (PMLR, New York), 793–802.Google Scholar
- (2000) Discrete time dynamic game model for price competition in an oligopoly. Ann. Oper. Res. 97(1):69–89.Google Scholar
- (2014) Simultaneous Bayesian auctions and computational complexity. Proc. 15th ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 895–910.Google Scholar
- (2007) The non-existence of equilibrium in sequential auctions when bids are revealed. J. Electronic Commerce Res. 8(2):141–157.Google Scholar
- (2018) Auctions with unique equilibria. Proc. 14th ACM Conf. Electronic Commerce (Association for Computing Machinery, New York), 181–196.Google Scholar
- (2020) Chaos, extremism and optimism: Volume analysis of learning in games. Adv. Neural Inform. Processing Systems 33:9039–9049.Google Scholar
- (1987) Signaling games and stable equilibria. Quart. J. Econom. 102(2):179–221.Crossref, Google Scholar
- (2018) Contest theory. Handbook of Game Theory and Industrial Organization, vol. 2 (Edward Elgar Publishing, Inc., Cheltenham, UK), 125–146. Google Scholar
- (2023) Differentiable economics for randomized affine maximizer auctions. Proc. Internat. Joint Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 2633–2641.Google Scholar
- (2020) Certifying strategyproof auction networks. Adv. Neural Inform. Processing Systems (NeurIPS), 2020 (Curran Associates Inc., Red Hook, NY), 4987–4998.Google Scholar
- (2009) The complexity of computing a Nash equilibrium. SIAM J. Comput. 39(1):195–259.Crossref, Google Scholar
- (2023) The complexity of markov equilibrium in stochastic games. Neu G, Rosasco L, eds. Proc. 36th Annual Conf. Learn. Theory (PMLR, New York), 4180–4234.Google Scholar
- (2010) On learning algorithms for Nash equilibria. Proc. Internat. Sympos. Algorithmic Game Theory (Springer, New York), 114–125.Google Scholar
- (1986) Von Stackelberg and Cournot duopoly: Choosing roles. RAND J. Econom. 17(2):251–260.Crossref, Google Scholar
- (2019) Optimal auctions through deep learning: Advances in dfferentiable economics. J. ACM 71(1):1–53.Google Scholar
- (2020) Unique equilibrium in contests with incomplete information. Econom. Theory 70(1):243–271.Crossref, Google Scholar
- (2018) Deep learning for revenue-optimal auctions with budgets. Proc. Internat. Conf. Autonomous Agents Multiagent Systems (International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC), 354–362.Google Scholar
- (2011) Numerical simulations of asymmetric first-price auctions. Games Econom. Behav. 73(2):479–495.Crossref, Google Scholar
- (2021) On the complexity of equilibrium computation in first-price auctions. Proc. 22nd ACM Conf. Econom. Comput. (ACM, New York), 454–476.Google Scholar
- (1998) The Theory of Learning in Games, vol. 2 (MIT Press, Cambridge, MA).Google Scholar
- (1987) First mover disadvantages with private information. Rev. Econom. Stud. 54(2):279–292.Crossref, Google Scholar
- (2012) New results on the verification of Nash refinements for extensive-form games. Proc. 11th Internat. Conf. Autonomous Agents Multiagent Systems (International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC), 813–820.Google Scholar
- (1998) Utility functions: From risk theory to finance. North Amer. Actuarial J. 2(3):74–91.Google Scholar
- (2023) Oracles & followers: Stackelberg equilibria in deep multi-agent reinforcement learning. Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, eds. Proc. 40th Internat. Conf. Machine Learn. (PMLR, New York), 11213–11236.Google Scholar
- (2006) A competitive Texas Hold’em poker player via automated abstraction and real-time equilibrium computation. Proc. Natl. Conf. Artificial Intelligence, 1007–1013.Google Scholar
- (2007) Lossless abstraction of imperfect information games. J. ACM. 54(5):25–es. Crossref, Google Scholar
- (1952) A further generalization of the Kakutani fixed theorem, with application to Nash equilibrium points. Proc. Amer. Math. Soc., vol. 3, 170–174.Google Scholar
- (2018) Deep learning for multi-facility location mechanism design. Proc. Internat. Joint Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 261–267.Google Scholar
- (2014) On the notion of perfect Bayesian equilibrium. TOP 22(1):128–143.Google Scholar
- (2010) The computational complexity of trembling hand perfection and other equilibrium refinements. Proc. 3rd Internat. Conf. Algorithmic Game Theory (Springer-Verlag, Berlin), 198–209.Google Scholar
- (2003) Uncoupled dynamics do not lead to Nash equilibrium. Amer. Econom. Rev. 93(5):1830–1836.Crossref, Google Scholar
- (1986) Multi-object auctions: Sequential vs. simultaneous sales. Management Sci. 32(12):1599–1610.Link, Google Scholar
- (1988) A model of sequential auctions. Econom. Lett. 26(3):227–233.Crossref, Google Scholar
- (1991) Approximation capabilities of multilayer feedforward networks. Neural Networks 4(2):251–257.Crossref, Google Scholar
- (2022) On the approximate purification of mixed strategies in games with infinite action sets. Econom. Theory Bull. 10(1):69–93.Crossref, Google Scholar
- (2011) Accelerating best response calculation in large extensive games. Proc. 22nd Internat. Joint Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 258–265.Google Scholar
- (1995) Behavior strategies, mixed strategies and perfect recall. Internat. J. Game Theory 24(2):127–145.Google Scholar
- (1999) A two stage sequential auction with multi-unit demands. J. Econom. Theory 86(1):77–99.Crossref, Google Scholar
- (2003) Why every economist should learn some auction theory. Hansen LP, Dewatripont M, Turnovsky SJ, eds. Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress (Cambridge University Press, Cambridge, UK), 25–55. Google Scholar
- (2023) Enabling first-order gradient-based learning for equilibrium computation in markets. Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, eds. Proc. Internat. Conf. Machine Learn., vol. 202 (PMLR, New York), 17327–17342.Google Scholar
- (2009) Strategy and Dynamics in Contests (Oxford University Press, Oxford, UK).Crossref, Google Scholar
- (1982) Sequential equilibria. Econometrica 50(4):863–894. Crossref, Google Scholar
- (2009) Auction Theory (Academic Press, New York).Google Scholar
- (2014) Extensive-form game abstraction with bounds. Proc. ACM Conf. Econom. Comput. (ACM, New York), 621–638.Google Scholar
- (2015) Discretization of continuous action spaces in extensive-form games. Proc. Internat. Conf. Autonomous Agents Multi-Agent Systems (International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC), 47–56.Google Scholar
- (2016) Imperfect-recall abstractions with bounds in games. Proc. ACM Conf. Econom. Comput. (ACM, New York), 459–476.Google Scholar
- (2018) A unified framework for extensive-form game abstraction with bounds. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Proc. Conf. Neural Inform. Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 613–624.Google Scholar
- (1989) Equilibria of the sealed-bid mechanism for bargaining with incomplete information. J. Econom. Theory 48(1):63–106.Crossref, Google Scholar
- (1985) Cournot oligopoly with information sharing. RAND J. Econom. 16(4):521.Crossref, Google Scholar
- (1994) Markov games as a framework for multi-agent reinforcement learning. Cohen WW, Hirsh H, eds. Proc. 11th Internat. Conf. Machine Learn. (Morgan Kaufmann Publishers, San Francisco), 157–163.Google Scholar
- (1999) The value of commitment with imperfect observability and private information. RAND J. Econom. 30(4):555–574.Crossref, Google Scholar
- (2019) Policy-gradient algorithms have no guarantees of convergence in linear quadratic games. Preprint, submitted July 8, https://arxiv.org/abs/1907.03712.Google Scholar
- (2020) Policy-gradient algorithms have no guarantees of convergence in linear quadratic games. Proc. 19th Internat. Conf. Autonomous Agents MultiAgent Systems (International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC), 860–868.Google Scholar
- (2003) Synergies and price trends in sequential auctions. Rev. Econom. Design 8(1):85–98.Google Scholar
- (2018) Cycles in adversarial regularized learning. Proc. 29th Annual ACM-SIAM Sympos. Discrete Algorithms (SIAM, Philadelphia), 2703–2717.Google Scholar
- (2011) Sequential auctions with informational externalities and aversion to price risk: Decreasing and increasing price sequences. Econom. J. 121(555):990–1016.Google Scholar
- (1982) A theory of auctions and competitive bidding. Econometrica 50(5):1089–1122.Crossref, Google Scholar
- (2000) A theory of auctions and competitive bidding II. Internat. Library Critical Writings Econom. 113:179–194.Google Scholar
- (2022) Nash, Conley, and computation: Impossibility and incompleteness in game dynamics. Preprint, submitted March 26, https://arxiv.org/abs/2203.14129.Google Scholar
- (2020) Monte Carlo gradient estimation in machine learning. J. Machine Learn. Res. 21(1):132:5183–132:5244.Google Scholar
- (2006) Contest architecture. J. Econom. Theory 126(1):70–96.Crossref, Google Scholar
- (1996) Potential games. Games Econom. Behav. 14(1):124–143.Crossref, Google Scholar
- (2020) Perfect conditional ϵ-equilibria of multi-stage games with infinite sets of signals and actions. Econometrica 88(2):495–531.Crossref, Google Scholar
- (1950) Equilibrium points in N-person games. Proc. Natl. Acad. Sci. USA 36(1):48–49.Crossref, Google Scholar
- (2006) Behavior in all-pay auctions with incomplete information. Games Econom. Behav. 55(1):189–206.Crossref, Google Scholar
- (2007) Allocating defensive resources with private information about vulnerability. Amer. Political Sci. Rev. 101(4):799–809.Crossref, Google Scholar
- (1964) Risk aversion in the small and in the large. Econometrica 32(1/2):122–136.Crossref, Google Scholar
- (1978) Competitive bidding for offshore petroleum leases. Bell J. Econom. 9(2):369–384.Google Scholar
- (1999) On the existence of pure and mixed strategy Nash equilibria in discontinuous games. Econometrica 67(5):1029–1056.Crossref, Google Scholar
- (2011) On the existence of monotone pure-strategy equilibria in Bayesian games. Econometrica 79(2):499–553.Crossref, Google Scholar
- (2023) Loss aversion in sequential auctions. Theoret. Econom. 18(2):561–596.Crossref, Google Scholar
- (2018) The prevalence of chaotic dynamics in games with many players. Sci. Rep. 8(1):1–13.Crossref, Google Scholar
- (2012) Lossy stochastic game abstraction with bounds. Proc. ACM Conf. Electronic Commerce (ACM, New York), 880–897.Google Scholar
- (2017) Proximal policy optimization algorithms. Preprint, submitted July 20, https://arxiv.org/abs/1707.06347.Google Scholar
- (2000) Abstraction methods for game theoretic poker. Marsland T, Frank I, eds. Proc. 2nd Internat. Conf. Comput. Games (Springer-Verlag, Berlin), 333–345.Google Scholar
- (2008) Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.Crossref, Google Scholar
- (2018) A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science (New York) 362(6419):1140–1144.Google Scholar
- (1995) Risk aversion in contests. Econom. J. 105(431):951–962.Google Scholar
- (1976) The logic of asymmetric contests. Animal Behav. 24(1):159–175.Google Scholar
- (1978) Job market signaling. Diamond P, Rothschild M, eds. Uncertainty in Economics (Academic Press, New York), 281–306.Crossref, Google Scholar
- (2018) Actor-critic policy optimization in partially observable multiagent environments. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Proc. Conf. Neural Inform. Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 3422–3435.Google Scholar
- (2014) Optimal prizes in dynamic elimination contests: Theory and experimental evidence. J. Econom. Behav. Organ. 102:43–58.Google Scholar
- (2018) Reinforcement Learning: An Introduction, Adaptive Computation and Machine Learning Series, 2nd ed. (MIT Press, Cambridge, MA).Google Scholar
- (1999) Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, vol. 12 (MIT Press, Cambridge, MA).Google Scholar
- (2020) Approximate exploitability: Learning a best response in large games. Preprint, submitted April 20, https://arxiv.org/abs/2004.09677.Google Scholar
- (2014) Sequential auctions and price anomalies. Econom. Ann. 59(200):7–42.Google Scholar
- (2016) Bayesian Nash equilibrium and variational inequalities. J. Math. Econom. 63:139–146.Crossref, Google Scholar
- (1961) Counterspeculation, auctions, and competitive sealed tenders. J. Finance 16(1):8–37.Crossref, Google Scholar
- (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782):350–354.Crossref, Google Scholar
- (2020) No-regret learning and mixed Nash equilibria: They do not mix. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Proc. Annual Conf. Neural Inform. Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 1380–1391.Google Scholar
- (2016) Contest Theory: Incentive Mechanisms and Ranking Methods (Cambridge University Press, Cambridge, UK).Google Scholar
- (2011) Market Structure and Equilibrium (Springer, Berlin).Google Scholar
- (2024) Gemnet: Menu-based, strategy-proof multi-bidder auctions through deep learning. Proc. 25th ACM Conf. Econom. Comput. (ACM, New York), 1100.Google Scholar
- (2017) A general, practicable definition of perfect Bayesian equilibrium. Working paper, University of California, San Diego. Google Scholar
- (2009) Abstraction pathologies in extensive games. Proc. Internat. Joint Conf. Autonomous Agents Multiagent Systems (International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC), 781–788.Google Scholar
- (1981) Multiple-object auctions. Technical report, Northwestern University, Evanston, IL.Google Scholar
- (1969) Competitive bidding with disparate information. Management Sci. 15(7):446–448.Link, Google Scholar
- (2005) Contests with multiple rounds. Games Econom. Behav. 51(1):213–227.Crossref, Google Scholar
- (2004) Strategic Learning and Its Limits (OUP Oxford).Crossref, Google Scholar
- (2022) The surprising effectiveness of PPO in cooperative multi-agent games. Adv. Neural Inform. Processing Systems 35:24611–24624.Google Scholar
- (2008) Simultaneous signaling in elimination contests. Technical report, Queen’s University, Department of Economics, Kingston, ON.Google Scholar
- (2023) Computing optimal equilibria and mechanisms via learning in zero-sum extensive-form games. Oh A, Neumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. Advances in Neural Information Processing Systems, vol. 36 (Curran Associates Inc., Red Hook, NY), 2659–2699.Google Scholar
- (2007) Regret minimization in games with incomplete information. Platt J, Koller D, Singer Y, Roweis S, eds. Adv. Neural Inform. Processing Systems, vol. 20 (Curran Associates Inc., Red Hook, NY). Google Scholar

