An Approximate Dynamic Programming Approach to Repeated Games with Vector Losses

Vijay Kamble
Corresponding Author
Vijay Kamble
[email protected]
https://orcid.org/0000-0002-9261-1612
Department of Information and Decision Sciences, University of Illinois, Chicago, Illinois 60607;
Search for more papers by this author
,
Patrick Loiseau
Patrick Loiseau
[email protected]
University Grenoble Alpes, INRIA, Centre National de la Recherche Scientifique, Institut Polytechnique de Grenoble, Laboratoire d’Informatique de Grenoble, 38058 Grenoble, France;Max-Planck Institute for Software Systems, D-66123 Saarbrücken, Germany;
Search for more papers by this author
,
Jean Walrand
Jean Walrand
[email protected]
Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720
Search for more papers by this author

Vijay Kamble

Corresponding Author

Vijay Kamble

[email protected]

https://orcid.org/0000-0002-9261-1612

Department of Information and Decision Sciences, University of Illinois, Chicago, Illinois 60607;

Search for more papers by this author

Patrick Loiseau

[email protected]

University Grenoble Alpes, INRIA, Centre National de la Recherche Scientifique, Institut Polytechnique de Grenoble, Laboratoire d’Informatique de Grenoble, 38058 Grenoble, France;Max-Planck Institute for Software Systems, D-66123 Saarbrücken, Germany;

Search for more papers by this author

Jean Walrand

[email protected]

Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720

Search for more papers by this author

Published Online:29 Aug 2022https://doi.org/10.1287/opre.2022.2334

References

Abernethy J, Warmuth MK, Yellin J (2008) Optimal strategies from random walks. Servedio R, Zhang T, eds. Proc. 21st Annual Conf. on Learn. Theory (Association for Computational Learning, Mountain View, CA), 437–446.Google Scholar
Abreu D, Pearce D, Stacchetti E (1986) Optimal cartel equilibria with imperfect monitoring. J. Econom. Theory 39(1):251–269.Crossref, Google Scholar
Abreu D, Pearce D, Stacchetti E (1990) Toward a theory of discounted repeated games with imperfect monitoring. Econometrica 58(5):1041–1063.Crossref, Google Scholar
Amato C, Bernstein DS, Zilberstein S (2006) Solving POMDPs using quadratically constrained linear programs. Weiss G, Stone P, eds. Proc. 5th Internat. Joint Conf. Autonomous Agents Multiagent Systems (Association for Computing Machinery, New York), 341–343.Google Scholar
Amato C, Bernstein DS, Zilberstein S (2010) Optimizing fixed-size stochastic controllers for pomdps and decentralized pomdps. Autonomic Agent Multi Agent Systems 21(3):293–320.Crossref, Google Scholar
Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.Crossref, Google Scholar
Aumann RJ, Maschler M, Stearns RE (1995) Repeated Games with Incomplete Information (MIT Press, Cambridge, MA).Google Scholar
Bartlett PL, Koolen WM, Malek A, Takimoto E, Warmuth MK (2015) Minimax fixed-design linear regression. Grünwald P, Hazan E, eds. Conf. Learn. Theory (Association for Computational Learning, Mountain View, CA), 437–446.Google Scholar
Bayraktar E, Ekren I, Zhang Y (2020) On the asymptotic optimality of the comb strategy for prediction with expert advice. Ann. Appl. Probability 30(6):2517–2546.Crossref, Google Scholar
Beal LDR, Hill DC, Martin RA, Hedengren JD (2018) Gekko optimization suite. Processes (Basel) 6(8):106.Crossref, Google Scholar
Bertsekas DP (2005) Dynamic Programming and Optimal Control, vol 1 (Athena Scientific).Google Scholar
Bertsekas DP (2012) Dynamic Programming and Optimal Control, vol 2 (Athena Scientific).Google Scholar
Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pacific J. Math. 6(1):1–8.Crossref, Google Scholar
Blum A, Mansour Y (2007) From external to internal regret. J. Machine Learn. Res. 8(Jun):1307–1324.Google Scholar
Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.Crossref, Google Scholar
Cesa-Bianchi N, Lugosi G (2003) Potential-based algorithms in on-line prediction and game theory. Machine Learn. 51(3):239–261.Crossref, Google Scholar
Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Cesa-Bianchi N, Freund Y, Haussler D, Helmbold DP, Schapire RE, Warmuth MK (1997) How to use expert advice. J. ACM 44(3):427–485.Crossref, Google Scholar
Chernov A, Zhdanov F (2010) Prediction with expert advice under discounted loss. Algorithmic Learning Theory (Springer, Berlin), 255–269.Crossref, Google Scholar
Cover TM (1966) Behavior of sequential predictors of binary sequences. Technical report, Stanford University, Stanford, CA.Google Scholar
Foster DP, Vohra RV (1997) Calibrated learning and correlated equilibrium. Games Econom. Behav. 21(1–2):40–55.Crossref, Google Scholar
Freund Y, Schapire RE (1999) Adaptive game playing using multiplicative weights. Games Econom. Behav. 29(1–2):79–103.Crossref, Google Scholar
Gravin N, Peres Y, Sivan B (2016) Toward optimal algorithms for prediction with expert advice. Kraughgamer R, ed. Proc. 27th Annual ACM-SIAM Sympos. on Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 528–547.Google Scholar
Hannan J (1957) Approximation to Bayes risk in repeated plays. Dresher M, Tucker AW, Wolfe P, eds. Contributions to the Theory of Games, vol. 3 (Princeton University Press, Princeton, NJ), 97–139.Google Scholar
Hazan E (2016) Introduction to online convex optimization. Foundations Trends Optim. 2(3-4):157–325.Crossref, Google Scholar
Henrikson J (1999) Completeness and total boundedness of the Hausdorff metric. MIT Undergraduate J. Math. 1:69–80.Google Scholar
Koolen WM (2013) The Pareto regret frontier. Burges CJ, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Advances in Neural Information Processing Systems (Curran Associates, Inc., Red Hook, NY), 863–871.Google Scholar
Koolen WM, Malek A, Bartlett PL (2014) Efficient minimax strategies for square loss games. Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, eds. Advances in Neural Information Processing Systems (Curran Associates, Inc., Red Hook, NY), 3230–3238.Google Scholar
Koolen WM, Malek A, Bartlett PL, Abbasi Y (2015) Minimax time series prediction. Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates, Inc., Red Hook, NY), 2557–2565.Google Scholar
Laraki R, Sorin S (2015) Advances in zero-sum dynamic games. Handbook of Game Theory with Economic Applications, vol. 4 (Elsevier, New York), 27–93.Google Scholar
Lehrer E (2003) Approachability in infinite dimensional spaces. Internat. J. Game Theory 31(2):253–268.Crossref, Google Scholar
Littlestone N, Warmuth MK (1994) The weighted majority algorithm. Inform. Comput. 108(2):212–261.Crossref, Google Scholar
Luo H, Schapire R (2014) Toward minimax online learning with unknown time horizon. Xing Ep, Jebara T, eds. Internat. Conf. Machine Learn. Proc. 31st Internat. Conf. on Machine Learn., 226–234.Google Scholar
Milman E (2006) Approachable sets of vector payoffs in stochastic games. Games Econom. Behav. 56(1):135–147.Crossref, Google Scholar
Munkres JR (2000) Topology: A First Course (Prentice Hall, Inc., Hoboken, NJ).Google Scholar
Perchet V (2011a) Approachability of convex sets in games with partial monitoring. J. Optim. Theory Appl. 149(3):665–677.Crossref, Google Scholar
Perchet V (2011b) Internal regret with partial monitoring: Calibration-based optimal algorithms. J. Machine Learn. Res. 12(Jun):1893–1921.Google Scholar
Perchet V (2014) Approachability, regret and calibration: Implications and equivalences. J. Dynamic Games 1(2):181–254.Crossref, Google Scholar
Perchet V, Quincampoix M (2014) On a unified framework for approachability with full or partial monitoring. Math. Oper. Res. 40(3):596–610.Link, Google Scholar
Puterman ML (2014) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, Hoboken, NJ).Google Scholar
Shapley LS (1953) Stochastic games. Proc. National Acad. Sci. USA 39(10):1095.Crossref, Google Scholar
Spinat X (2002) A necessary and sufficient condition for approachability. Math. Oper. Res. 27(1):31–44.Link, Google Scholar
Stoltz G, Lugosi G (2005) Internal regret in on-line portfolio selection. Machine Learn. 59(1–2):125–159.Crossref, Google Scholar
Vieille N (1992) Weak approachability. Math. Oper. Res. 17(4):781–791.Link, Google Scholar
Vovk VG (1990) Aggregating strategies. Fulk M, Case J, eds. Proc. 3rd Annual Workshop Comput. Learning Theory (COLT ’90) (Morgan Kaufmann Publishers Inc., San Francisco), 371–386.Google Scholar

Volume 72, Issue 1

January-February 2024

Pages iii-vi, 1-424, C2-C3

Article Information

Supplemental Material

Metrics

Information

Received:September 29, 2018
Accepted:June 03, 2022
Published Online:August 29, 2022

Cite as

Vijay Kamble, Patrick Loiseau, Jean Walrand (2022) An Approximate Dynamic Programming Approach to Repeated Games with Vector Losses. Operations Research 72(1):373-388.

https://doi.org/10.1287/opre.2022.2334

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

An Approximate Dynamic Programming Approach to Repeated Games with Vector Losses

References

Volume 72, Issue 1

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News