Deep Reinforcement Learning for Equilibrium Computation in Multistage Auctions and Contests

Fabian R. Pieroth
Corresponding Author
Fabian R. Pieroth
[email protected]
https://orcid.org/0000-0002-5712-1706
School of Computation, Information and Technology, Technical University of Munich
Search for more papers by this author
,
Nils Kohring
Nils Kohring
[email protected]
https://orcid.org/0000-0002-1952-167X
School of Computation, Information and Technology, Technical University of Munich
Search for more papers by this author
,
Martin Bichler
Martin Bichler
[email protected]
https://orcid.org/0000-0001-5491-2935
School of Computation, Information and Technology, Technical University of Munich
Search for more papers by this author

Fabian R. Pieroth

Corresponding Author

Fabian R. Pieroth

[email protected]

https://orcid.org/0000-0002-5712-1706

School of Computation, Information and Technology, Technical University of Munich

Search for more papers by this author

Nils Kohring

[email protected]

https://orcid.org/0000-0002-1952-167X

School of Computation, Information and Technology, Technical University of Munich

Search for more papers by this author

Martin Bichler

[email protected]

https://orcid.org/0000-0001-5491-2935

School of Computation, Information and Technology, Technical University of Munich

Search for more papers by this author

Published Online:5 Dec 2025https://doi.org/10.1287/mnsc.2024.06771

References

Ackerman NL, Freer CE, Roy DM (2011) Noncomputable conditional distributions. Proc. IEEE 26th Annual Sympos. Logic Comput. Sci. (IEEE Computer Society, Washington, DC), 107–116.Google Scholar
Ahunbay M, Bichler M (2025) On the uniqueness of Bayesian coarse correlated equilibria in standard first-price and all-pay auctions. Azar Y, Panigrahi D, eds. Proc. Sympos. Discrete Algorithms (SIAM, Philadelphia), 2491–2537.Google Scholar
Amann E, Leininger W (1996) Asymmetric all-pay auctions with incomplete information: The two-player case. Games Econom. Behav. 14(1):1–18.Crossref, Google Scholar
Arozamena L, Weinschelbaum F (2009) Simultaneous vs. sequential price competition with incomplete information. Econom. Lett. 104(1):23–26.Google Scholar
Arrow KJ (1971) Essays in the Theory of Risk-Bearing (North-Holland, Amsterdam).Google Scholar
Ashlagi I, Monderer D, Tennenholtz M (2006) Robust learning equilibrium. Dechter R, Richardson T, eds. Proc. 22nd Conf. Uncertainty Artificial Intelligence (AUAI Press, Corvallis, OR), 7–14.Google Scholar
Askoura Y (2019) An infinite dimensional purification principle without saturation. J. Math. Anal. Appl. 472(2):1331–1345.Crossref, Google Scholar
Bagwell K (1995) Commitment and observability in games. Games Econom. Behav. 8(2):271–280.Crossref, Google Scholar
Baik KH (1994) Effort levels in contests with two asymmetric players. South. Econom. J. 61(2):367–378.Google Scholar
Bailey JP, Piliouras G (2018) Multiplicative weights update in zero-sum games. Tardos É, Elkind E, Vohra R, eds. Proc. ACM Conf. Econom. Comput. (ACM, Ithaca, NY), 321–338.Google Scholar
Berner C, Brockman G, Chan B, Cheung V, Debiak P, Dennison C, Farhi D, et al. (2019) Dota 2 with Large Scale Deep Reinforcement Learning.Google Scholar
Bernoulli D (1738) Exposition of a new theory on the measurement of risk. Econometrica 22(1):23.Crossref, Google Scholar
Bichler M, Fichtl MAX, Oberlechner M (2025) Computing Bayes–Nash equilibrium strategies in auction games via simultaneous online dual averaging. Oper. Res. 73(2):1102–1127. Link, Google Scholar
Bichler M, Kohring N, Heidekrüger S (2023b) Learning equilibria in asymmetric auction games. INFORMS J. Comput. 35(3):523–542.Link, Google Scholar
Bichler M, Kohring N, Oberlechner M, Pieroth FR (2023) Learning equilibrium in bilateral bargaining games. Eur. J. Oper. Res. 311(2):660–678. Google Scholar
Bichler M, Fichtl M, Heidekrüger S, Kohring N, Sutterer P (2021) Learning equilibria in symmetric auction games using artificial neural networks. Natural Machine Intelligence 3(8):687–695.Google Scholar
Bichler M, Lunowa S, Pieroth F, Oberlechner M, Wohlmuth B (2023c) Beyond monotonicity: On the convergence of learning algorithms in standard auction games. Walsh T, Shah J, Kolter Z, eds. Proc. Thirty-Ninth AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 13649–13657.Google Scholar
Bikhchandani S (1988) Reputation in repeated second-price auctions. J. Econom. Theory 46(1):97–119.Crossref, Google Scholar
Billings D, Burch N, Davidson A, Holte R, Schaeffer J, Schauenberg T, Szafron D (2003) Approximating game-theoretic optimal strategies for full-scale poker. Proc. Internat. Joint Conf. Artificial Intelligence (Morgan Kaufmann, San Francisco), 661–667.Google Scholar
Bonanno G (2004) Memory and perfect recall in extensive games. Games Econom. Behav. 47(2):237–256.Google Scholar
Bosshard V, Bünz B, Lubin B, Seuken S (2020) Computing Bayes-Nash equilibria in combinatorial auctions with verification. J. Artificial Intelligence Res. 69:531–570.Crossref, Google Scholar
Brown N, Sandholm T (2018) Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359(6374):418–424.Crossref, Google Scholar
Brown N, Sandholm T (2019) Superhuman AI for multiplayer poker. Science (1979) 365(6456):885–890.Google Scholar
Brown N, Lerer A, Gross S, Sandholm T (2019) Deep counterfactual regret minimization. Chaudhuri K, Salakhutdinov R, eds. Proc. 36th Internat. Conf. Machine Learn. (PMLR, New York), 793–802.Google Scholar
Bylka S, Ambroszkiewicz S, Komar J (2000) Discrete time dynamic game model for price competition in an oligopoly. Ann. Oper. Res. 97(1):69–89.Google Scholar
Cai Y, Papadimitriou C (2014) Simultaneous Bayesian auctions and computational complexity. Proc. 15th ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 895–910.Google Scholar
Cai G, Wurman PR, Chao X (2007) The non-existence of equilibrium in sequential auctions when bids are revealed. J. Electronic Commerce Res. 8(2):141–157.Google Scholar
Chawla S, Hartline JD (2018) Auctions with unique equilibria. Proc. 14th ACM Conf. Electronic Commerce (Association for Computing Machinery, New York), 181–196.Google Scholar
Cheung YK, Piliouras G (2020) Chaos, extremism and optimism: Volume analysis of learning in games. Adv. Neural Inform. Processing Systems 33:9039–9049.Google Scholar
Cho IK, Kreps DM (1987) Signaling games and stable equilibria. Quart. J. Econom. 102(2):179–221.Crossref, Google Scholar
Corchón LC, Serena M (2018) Contest theory. Handbook of Game Theory and Industrial Organization, vol. 2 (Edward Elgar Publishing, Inc., Cheltenham, UK), 125–146. Google Scholar
Curry MJ, Sandholm T, Dickerson JP (2023) Differentiable economics for randomized affine maximizer auctions. Proc. Internat. Joint Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 2633–2641.Google Scholar
Curry M, Py C, Goldstein T, Dickerson J (2020) Certifying strategyproof auction networks. Adv. Neural Inform. Processing Systems (NeurIPS), 2020 (Curran Associates Inc., Red Hook, NY), 4987–4998.Google Scholar
Daskalakis C, Goldberg P, Papadimitriou C (2009) The complexity of computing a Nash equilibrium. SIAM J. Comput. 39(1):195–259.Crossref, Google Scholar
Daskalakis C, Golowich N, Zhang K (2023) The complexity of markov equilibrium in stochastic games. Neu G, Rosasco L, eds. Proc. 36th Annual Conf. Learn. Theory (PMLR, New York), 4180–4234.Google Scholar
Daskalakis C, Frongillo R, Papadimitriou CH, Pierrakos G, Valiant G (2010) On learning algorithms for Nash equilibria. Proc. Internat. Sympos. Algorithmic Game Theory (Springer, New York), 114–125.Google Scholar
Dowrick S (1986) Von Stackelberg and Cournot duopoly: Choosing roles. RAND J. Econom. 17(2):251–260.Crossref, Google Scholar
Duetting P, Feng Z, Narasimhan H, Parkes D, Ravindranath SS (2019) Optimal auctions through deep learning: Advances in dfferentiable economics. J. ACM 71(1):1–53.Google Scholar
Ewerhart C, Quartieri F (2020) Unique equilibrium in contests with incomplete information. Econom. Theory 70(1):243–271.Crossref, Google Scholar
Feng Z, Narasimhan H, Parkes DC (2018) Deep learning for revenue-optimal auctions with budgets. Proc. Internat. Conf. Autonomous Agents Multiagent Systems (International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC), 354–362.Google Scholar
Fibich G, Gavish N (2011) Numerical simulations of asymmetric first-price auctions. Games Econom. Behav. 73(2):479–495.Crossref, Google Scholar
Filos-Ratsikas A, Giannakopoulos Y, Hollender A, Lazos P, Poças D (2021) On the complexity of equilibrium computation in first-price auctions. Proc. 22nd ACM Conf. Econom. Comput. (ACM, New York), 454–476.Google Scholar
Fudenberg D, Levine DK (1998) The Theory of Learning in Games, vol. 2 (MIT Press, Cambridge, MA).Google Scholar
Gal-Or E (1987) First mover disadvantages with private information. Rev. Econom. Stud. 54(2):279–292.Crossref, Google Scholar
Gatti N, Panozzo F (2012) New results on the verification of Nash refinements for extensive-form games. Proc. 11th Internat. Conf. Autonomous Agents Multiagent Systems (International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC), 813–820.Google Scholar
Gerber HU, Pafum G (1998) Utility functions: From risk theory to finance. North Amer. Actuarial J. 2(3):74–91.Google Scholar
Gerstgrasser M, Parkes DC (2023) Oracles & followers: Stackelberg equilibria in deep multi-agent reinforcement learning. Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, eds. Proc. 40th Internat. Conf. Machine Learn. (PMLR, New York), 11213–11236.Google Scholar
Gilpin A, Sandholm T (2006) A competitive Texas Hold’em poker player via automated abstraction and real-time equilibrium computation. Proc. Natl. Conf. Artificial Intelligence, 1007–1013.Google Scholar
Gilpin A, Sandholm T (2007) Lossless abstraction of imperfect information games. J. ACM. 54(5):25–es. Crossref, Google Scholar
Glicksberg IL (1952) A further generalization of the Kakutani fixed theorem, with application to Nash equilibrium points. Proc. Amer. Math. Soc., vol. 3, 170–174.Google Scholar
Golowich N, Narasimhan H, Parkes DC (2018) Deep learning for multi-facility location mechanism design. Proc. Internat. Joint Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 261–267.Google Scholar
González-Díaz J, Meléndez-Jiménez MA (2014) On the notion of perfect Bayesian equilibrium. TOP 22(1):128–143.Google Scholar
Hansen KA, Miltersen PB, Sørensen TB (2010) The computational complexity of trembling hand perfection and other equilibrium refinements. Proc. 3rd Internat. Conf. Algorithmic Game Theory (Springer-Verlag, Berlin), 198–209.Google Scholar
Hart S, Mas-Colell A (2003) Uncoupled dynamics do not lead to Nash equilibrium. Amer. Econom. Rev. 93(5):1830–1836.Crossref, Google Scholar
Hausch DB (1986) Multi-object auctions: Sequential vs. simultaneous sales. Management Sci. 32(12):1599–1610.Link, Google Scholar
Hausch DB (1988) A model of sequential auctions. Econom. Lett. 26(3):227–233.Crossref, Google Scholar
Hornik K (1991) Approximation capabilities of multilayer feedforward networks. Neural Networks 4(2):251–257.Crossref, Google Scholar
Hosoya Y, Yu C (2022) On the approximate purification of mixed strategies in games with infinite action sets. Econom. Theory Bull. 10(1):69–93.Crossref, Google Scholar
Johanson M, Waugh K, Bowling M, Zinkevich M (2011) Accelerating best response calculation in large extensive games. Proc. 22nd Internat. Joint Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 258–265.Google Scholar
Kaneko M, Kline JJ (1995) Behavior strategies, mixed strategies and perfect recall. Internat. J. Game Theory 24(2):127–145.Google Scholar
Katzman B (1999) A two stage sequential auction with multi-unit demands. J. Econom. Theory 86(1):77–99.Crossref, Google Scholar
Klemperer P (2003) Why every economist should learn some auction theory. Hansen LP, Dewatripont M, Turnovsky SJ, eds. Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress (Cambridge University Press, Cambridge, UK), 25–55. Google Scholar
Kohring N, Pieroth FR, Bichler M (2023) Enabling first-order gradient-based learning for equilibrium computation in markets. Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, eds. Proc. Internat. Conf. Machine Learn., vol. 202 (PMLR, New York), 17327–17342.Google Scholar
Konrad KA (2009) Strategy and Dynamics in Contests (Oxford University Press, Oxford, UK).Crossref, Google Scholar
Kreps DM, Wilson R (1982) Sequential equilibria. Econometrica 50(4):863–894. Crossref, Google Scholar
Krishna V (2009) Auction Theory (Academic Press, New York).Google Scholar
Kroer C, Sandholm T (2014) Extensive-form game abstraction with bounds. Proc. ACM Conf. Econom. Comput. (ACM, New York), 621–638.Google Scholar
Kroer C, Sandholm T (2015) Discretization of continuous action spaces in extensive-form games. Proc. Internat. Conf. Autonomous Agents Multi-Agent Systems (International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC), 47–56.Google Scholar
Kroer C, Sandholm T (2016) Imperfect-recall abstractions with bounds in games. Proc. ACM Conf. Econom. Comput. (ACM, New York), 459–476.Google Scholar
Kroer C, Sandholm T (2018) A unified framework for extensive-form game abstraction with bounds. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Proc. Conf. Neural Inform. Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 613–624.Google Scholar
Leininger W, Linhart P, Radner R (1989) Equilibria of the sealed-bid mechanism for bargaining with incomplete information. J. Econom. Theory 48(1):63–106.Crossref, Google Scholar
Li L (1985) Cournot oligopoly with information sharing. RAND J. Econom. 16(4):521.Crossref, Google Scholar
Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning. Cohen WW, Hirsh H, eds. Proc. 11th Internat. Conf. Machine Learn. (Morgan Kaufmann Publishers, San Francisco), 157–163.Google Scholar
Maggi G (1999) The value of commitment with imperfect observability and private information. RAND J. Econom. 30(4):555–574.Crossref, Google Scholar
Mazumdar E, Ratliff LJ, Jordan MI, Sastry SS (2019) Policy-gradient algorithms have no guarantees of convergence in linear quadratic games. Preprint, submitted July 8, https://arxiv.org/abs/1907.03712.Google Scholar
Mazumdar E, Ratliff LJ, Jordan MI, Sastry SS (2020) Policy-gradient algorithms have no guarantees of convergence in linear quadratic games. Proc. 19th Internat. Conf. Autonomous Agents MultiAgent Systems (International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC), 860–868.Google Scholar
Menezes FM, Monteiro PK (2003) Synergies and price trends in sequential auctions. Rev. Econom. Design 8(1):85–98.Google Scholar
Mertikopoulos P, Papadimitriou C, Piliouras G (2018) Cycles in adversarial regularized learning. Proc. 29th Annual ACM-SIAM Sympos. Discrete Algorithms (SIAM, Philadelphia), 2703–2717.Google Scholar
Mezzetti C (2011) Sequential auctions with informational externalities and aversion to price risk: Decreasing and increasing price sequences. Econom. J. 121(555):990–1016.Google Scholar
Milgrom PR, Weber RJ (1982) A theory of auctions and competitive bidding. Econometrica 50(5):1089–1122.Crossref, Google Scholar
Milgrom P, Weber R (2000) A theory of auctions and competitive bidding II. Internat. Library Critical Writings Econom. 113:179–194.Google Scholar
Milionis J, Papadimitriou C, Piliouras G, Spendlove K (2022) Nash, Conley, and computation: Impossibility and incompleteness in game dynamics. Preprint, submitted March 26, https://arxiv.org/abs/2203.14129.Google Scholar
Mohamed S, Rosca M, Figurnov M, Mnih A (2020) Monte Carlo gradient estimation in machine learning. J. Machine Learn. Res. 21(1):132:5183–132:5244.Google Scholar
Moldovanu B, Sela A (2006) Contest architecture. J. Econom. Theory 126(1):70–96.Crossref, Google Scholar
Monderer D, Shapley LS (1996) Potential games. Games Econom. Behav. 14(1):124–143.Crossref, Google Scholar
Myerson RB, Reny PJ (2020) Perfect conditional ϵ-equilibria of multi-stage games with infinite sets of signals and actions. Econometrica 88(2):495–531.Crossref, Google Scholar
Nash JF (1950) Equilibrium points in N-person games. Proc. Natl. Acad. Sci. USA 36(1):48–49.Crossref, Google Scholar
Noussair C, Silver J (2006) Behavior in all-pay auctions with incomplete information. Games Econom. Behav. 55(1):189–206.Crossref, Google Scholar
Powell R (2007) Allocating defensive resources with private information about vulnerability. Amer. Political Sci. Rev. 101(4):799–809.Crossref, Google Scholar
Pratt JW (1964) Risk aversion in the small and in the large. Econometrica 32(1/2):122–136.Crossref, Google Scholar
Reece DK (1978) Competitive bidding for offshore petroleum leases. Bell J. Econom. 9(2):369–384.Google Scholar
Reny PJ (1999) On the existence of pure and mixed strategy Nash equilibria in discontinuous games. Econometrica 67(5):1029–1056.Crossref, Google Scholar
Reny PJ (2011) On the existence of monotone pure-strategy equilibria in Bayesian games. Econometrica 79(2):499–553.Crossref, Google Scholar
Rosato A (2023) Loss aversion in sequential auctions. Theoret. Econom. 18(2):561–596.Crossref, Google Scholar
Sanders JB, Farmer JD, Galla T (2018) The prevalence of chaotic dynamics in games with many players. Sci. Rep. 8(1):1–13.Crossref, Google Scholar
Sandholm T, Singh S (2012) Lossy stochastic game abstraction with bounds. Proc. ACM Conf. Electronic Commerce (ACM, New York), 880–897.Google Scholar
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. Preprint, submitted July 20, https://arxiv.org/abs/1707.06347.Google Scholar
Shi J, Littman M (2000) Abstraction methods for game theoretic poker. Marsland T, Frank I, eds. Proc. 2nd Internat. Conf. Comput. Games (Springer-Verlag, Berlin), 333–345.Google Scholar
Shoham Y, Leyton-Brown K (2008) Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, Schrittwieser J, et al. (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.Crossref, Google Scholar
Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, et al. (2018) A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science (New York) 362(6419):1140–1144.Google Scholar
Skaperdas S, Gan L (1995) Risk aversion in contests. Econom. J. 105(431):951–962.Google Scholar
Smith JM, Parker GA (1976) The logic of asymmetric contests. Animal Behav. 24(1):159–175.Google Scholar
Spence M (1978) Job market signaling. Diamond P, Rothschild M, eds. Uncertainty in Economics (Academic Press, New York), 281–306.Crossref, Google Scholar
Srinivasan S, Lanctot M, Zambaldi V, Pérolat J, Tuyls K, Munos R, Bowling M (2018) Actor-critic policy optimization in partially observable multiagent environments. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Proc. Conf. Neural Inform. Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 3422–3435.Google Scholar
Stracke R, Höchtl W, Kerschbamer R, Sunde U (2014) Optimal prizes in dynamic elimination contests: Theory and experimental evidence. J. Econom. Behav. Organ. 102:43–58.Google Scholar
Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction, Adaptive Computation and Machine Learning Series, 2nd ed. (MIT Press, Cambridge, MA).Google Scholar
Sutton RS, McAllester D, Singh S, Mansour Y (1999) Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, vol. 12 (MIT Press, Cambridge, MA).Google Scholar
Timbers F, Bard N, Lockhart E, Lanctot M, Schmid M, Burch N, Schrittwieser J, et al. (2020) Approximate exploitability: Learning a best response in large games. Preprint, submitted April 20, https://arxiv.org/abs/2004.09677.Google Scholar
Trifunović D (2014) Sequential auctions and price anomalies. Econom. Ann. 59(200):7–42.Google Scholar
Ui T (2016) Bayesian Nash equilibrium and variational inequalities. J. Math. Econom. 63:139–146.Crossref, Google Scholar
Vickrey W (1961) Counterspeculation, auctions, and competitive sealed tenders. J. Finance 16(1):8–37.Crossref, Google Scholar
Vinyals O, Babuschkin I, Czarnecki WM, Mathieu M, Dudzik A, Chung J, Choi DH, et al. (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782):350–354.Crossref, Google Scholar
Vlatakis-Gkaragkounis EV, Flokas L, Mertikopoulos P, Piliouras G (2020) No-regret learning and mixed Nash equilibria: They do not mix. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Proc. Annual Conf. Neural Inform. Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 1380–1391.Google Scholar
Vojnović M (2016) Contest Theory: Incentive Mechanisms and Ranking Methods (Cambridge University Press, Cambridge, UK).Google Scholar
Von Stackelberg H (2011) Market Structure and Equilibrium (Springer, Berlin).Google Scholar
Wang T, Jiang Y, Parkes DC (2024) Gemnet: Menu-based, strategy-proof multi-bidder auctions through deep learning. Proc. 25th ACM Conf. Econom. Comput. (ACM, New York), 1100.Google Scholar
Watson J (2017) A general, practicable definition of perfect Bayesian equilibrium. Working paper, University of California, San Diego. Google Scholar
Waugh K, Schnizlein D, Bowling MH, Szafron D (2009) Abstraction pathologies in extensive games. Proc. Internat. Joint Conf. Autonomous Agents Multiagent Systems (International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC), 781–788.Google Scholar
Weber RJ (1981) Multiple-object auctions. Technical report, Northwestern University, Evanston, IL.Google Scholar
Wilson RB (1969) Competitive bidding with disparate information. Management Sci. 15(7):446–448.Link, Google Scholar
Yildirim H (2005) Contests with multiple rounds. Games Econom. Behav. 51(1):213–227.Crossref, Google Scholar
Young HP (2004) Strategic Learning and Its Limits (OUP Oxford).Crossref, Google Scholar
Yu C, Velu A, Vinitsky E, Gao J, Wang Y, Bayen A, Wu Y (2022) The surprising effectiveness of PPO in cooperative multi-agent games. Adv. Neural Inform. Processing Systems 35:24611–24624.Google Scholar
Zhang J (2008) Simultaneous signaling in elimination contests. Technical report, Queen’s University, Department of Economics, Kingston, ON.Google Scholar
Zhang B, Farina G, Anagnostides I, Cacciamani F, McAleer S, Haupt A, Celli A, et al. (2023) Computing optimal equilibria and mechanisms via learning in zero-sum extensive-form games. Oh A, Neumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. Advances in Neural Information Processing Systems, vol. 36 (Curran Associates Inc., Red Hook, NY), 2659–2699.Google Scholar
Zinkevich M, Johanson M, Bowling M, Piccione C (2007) Regret minimization in games with incomplete information. Platt J, Koller D, Singer Y, Roweis S, eds. Adv. Neural Inform. Processing Systems, vol. 20 (Curran Associates Inc., Red Hook, NY). Google Scholar

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Received:July 29, 2024
Accepted:April 09, 2025
Published Online:December 05, 2025

Cite as

Fabian R. Pieroth, Nils Kohring, Martin Bichler (2025) Deep Reinforcement Learning for Equilibrium Computation in Multistage Auctions and Contests. Management Science 0(0).

https://doi.org/10.1287/mnsc.2024.06771

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Deep Reinforcement Learning for Equilibrium Computation in Multistage Auctions and Contests

References

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News