An Approximate Dynamic Programming Approach to Repeated Games with Vector Losses

Published Online:https://doi.org/10.1287/opre.2022.2334

References

  • Abernethy J, Warmuth MK, Yellin J (2008) Optimal strategies from random walks. Servedio R, Zhang T, eds. Proc. 21st Annual Conf. on Learn. Theory (Association for Computational Learning, Mountain View, CA), 437–446.Google Scholar
  • Abreu D, Pearce D, Stacchetti E (1986) Optimal cartel equilibria with imperfect monitoring. J. Econom. Theory 39(1):251–269.CrossrefGoogle Scholar
  • Abreu D, Pearce D, Stacchetti E (1990) Toward a theory of discounted repeated games with imperfect monitoring. Econometrica 58(5):1041–1063.CrossrefGoogle Scholar
  • Amato C, Bernstein DS, Zilberstein S (2006) Solving POMDPs using quadratically constrained linear programs. Weiss G, Stone P, eds. Proc. 5th Internat. Joint Conf. Autonomous Agents Multiagent Systems (Association for Computing Machinery, New York), 341–343.Google Scholar
  • Amato C, Bernstein DS, Zilberstein S (2010) Optimizing fixed-size stochastic controllers for pomdps and decentralized pomdps. Autonomic Agent Multi Agent Systems 21(3):293–320.CrossrefGoogle Scholar
  • Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.CrossrefGoogle Scholar
  • Aumann RJ, Maschler M, Stearns RE (1995) Repeated Games with Incomplete Information (MIT Press, Cambridge, MA).Google Scholar
  • Bartlett PL, Koolen WM, Malek A, Takimoto E, Warmuth MK (2015) Minimax fixed-design linear regression. Grünwald P, Hazan E, eds. Conf. Learn. Theory (Association for Computational Learning, Mountain View, CA), 437–446.Google Scholar
  • Bayraktar E, Ekren I, Zhang Y (2020) On the asymptotic optimality of the comb strategy for prediction with expert advice. Ann. Appl. Probability 30(6):2517–2546.CrossrefGoogle Scholar
  • Beal LDR, Hill DC, Martin RA, Hedengren JD (2018) Gekko optimization suite. Processes (Basel) 6(8):106.CrossrefGoogle Scholar
  • Bertsekas DP (2005) Dynamic Programming and Optimal Control, vol 1 (Athena Scientific).Google Scholar
  • Bertsekas DP (2012) Dynamic Programming and Optimal Control, vol 2 (Athena Scientific).Google Scholar
  • Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pacific J. Math. 6(1):1–8.CrossrefGoogle Scholar
  • Blum A, Mansour Y (2007) From external to internal regret. J. Machine Learn. Res. 8(Jun):1307–1324.Google Scholar
  • Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.CrossrefGoogle Scholar
  • Cesa-Bianchi N, Lugosi G (2003) Potential-based algorithms in on-line prediction and game theory. Machine Learn. 51(3):239–261.CrossrefGoogle Scholar
  • Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Cesa-Bianchi N, Freund Y, Haussler D, Helmbold DP, Schapire RE, Warmuth MK (1997) How to use expert advice. J. ACM 44(3):427–485.CrossrefGoogle Scholar
  • Chernov A, Zhdanov F (2010) Prediction with expert advice under discounted loss. Algorithmic Learning Theory (Springer, Berlin), 255–269.CrossrefGoogle Scholar
  • Cover TM (1966) Behavior of sequential predictors of binary sequences. Technical report, Stanford University, Stanford, CA.Google Scholar
  • Foster DP, Vohra RV (1997) Calibrated learning and correlated equilibrium. Games Econom. Behav. 21(1–2):40–55.CrossrefGoogle Scholar
  • Freund Y, Schapire RE (1999) Adaptive game playing using multiplicative weights. Games Econom. Behav. 29(1–2):79–103.CrossrefGoogle Scholar
  • Gravin N, Peres Y, Sivan B (2016) Toward optimal algorithms for prediction with expert advice. Kraughgamer R, ed. Proc. 27th Annual ACM-SIAM Sympos. on Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 528–547.Google Scholar
  • Hannan J (1957) Approximation to Bayes risk in repeated plays. Dresher M, Tucker AW, Wolfe P, eds. Contributions to the Theory of Games, vol. 3 (Princeton University Press, Princeton, NJ), 97–139.Google Scholar
  • Hazan E (2016) Introduction to online convex optimization. Foundations Trends Optim. 2(3-4):157–325.CrossrefGoogle Scholar
  • Henrikson J (1999) Completeness and total boundedness of the Hausdorff metric. MIT Undergraduate J. Math. 1:69–80.Google Scholar
  • Koolen WM (2013) The Pareto regret frontier. Burges CJ, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Advances in Neural Information Processing Systems (Curran Associates, Inc., Red Hook, NY), 863–871.Google Scholar
  • Koolen WM, Malek A, Bartlett PL (2014) Efficient minimax strategies for square loss games. Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, eds. Advances in Neural Information Processing Systems (Curran Associates, Inc., Red Hook, NY), 3230–3238.Google Scholar
  • Koolen WM, Malek A, Bartlett PL, Abbasi Y (2015) Minimax time series prediction. Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates, Inc., Red Hook, NY), 2557–2565.Google Scholar
  • Laraki R, Sorin S (2015) Advances in zero-sum dynamic games. Handbook of Game Theory with Economic Applications, vol. 4 (Elsevier, New York), 27–93.Google Scholar
  • Lehrer E (2003) Approachability in infinite dimensional spaces. Internat. J. Game Theory 31(2):253–268.CrossrefGoogle Scholar
  • Littlestone N, Warmuth MK (1994) The weighted majority algorithm. Inform. Comput. 108(2):212–261.CrossrefGoogle Scholar
  • Luo H, Schapire R (2014) Toward minimax online learning with unknown time horizon. Xing Ep, Jebara T, eds. Internat. Conf. Machine Learn. Proc. 31st Internat. Conf. on Machine Learn., 226–234.Google Scholar
  • Milman E (2006) Approachable sets of vector payoffs in stochastic games. Games Econom. Behav. 56(1):135–147.CrossrefGoogle Scholar
  • Munkres JR (2000) Topology: A First Course (Prentice Hall, Inc., Hoboken, NJ).Google Scholar
  • Perchet V (2011a) Approachability of convex sets in games with partial monitoring. J. Optim. Theory Appl. 149(3):665–677.CrossrefGoogle Scholar
  • Perchet V (2011b) Internal regret with partial monitoring: Calibration-based optimal algorithms. J. Machine Learn. Res. 12(Jun):1893–1921.Google Scholar
  • Perchet V (2014) Approachability, regret and calibration: Implications and equivalences. J. Dynamic Games 1(2):181–254.CrossrefGoogle Scholar
  • Perchet V, Quincampoix M (2014) On a unified framework for approachability with full or partial monitoring. Math. Oper. Res. 40(3):596–610.LinkGoogle Scholar
  • Puterman ML (2014) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, Hoboken, NJ).Google Scholar
  • Shapley LS (1953) Stochastic games. Proc. National Acad. Sci. USA 39(10):1095.CrossrefGoogle Scholar
  • Spinat X (2002) A necessary and sufficient condition for approachability. Math. Oper. Res. 27(1):31–44.LinkGoogle Scholar
  • Stoltz G, Lugosi G (2005) Internal regret in on-line portfolio selection. Machine Learn. 59(1–2):125–159.CrossrefGoogle Scholar
  • Vieille N (1992) Weak approachability. Math. Oper. Res. 17(4):781–791.LinkGoogle Scholar
  • Vovk VG (1990) Aggregating strategies. Fulk M, Case J, eds. Proc. 3rd Annual Workshop Comput. Learning Theory (COLT ’90) (Morgan Kaufmann Publishers Inc., San Francisco), 371–386.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.