An Approximate Dynamic Programming Approach to Repeated Games with Vector Losses
Published Online:29 Aug 2022https://doi.org/10.1287/opre.2022.2334
References
- (2008) Optimal strategies from random walks. Servedio R, Zhang T, eds. Proc. 21st Annual Conf. on Learn. Theory (Association for Computational Learning, Mountain View, CA), 437–446.Google Scholar
- (1986) Optimal cartel equilibria with imperfect monitoring. J. Econom. Theory 39(1):251–269.Crossref, Google Scholar
- (1990) Toward a theory of discounted repeated games with imperfect monitoring. Econometrica 58(5):1041–1063.Crossref, Google Scholar
- (2006) Solving POMDPs using quadratically constrained linear programs. Weiss G, Stone P, eds. Proc. 5th Internat. Joint Conf. Autonomous Agents Multiagent Systems (Association for Computing Machinery, New York), 341–343.Google Scholar
- (2010) Optimizing fixed-size stochastic controllers for pomdps and decentralized pomdps. Autonomic Agent Multi Agent Systems 21(3):293–320.Crossref, Google Scholar
- (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.Crossref, Google Scholar
- (1995) Repeated Games with Incomplete Information (MIT Press, Cambridge, MA).Google Scholar
- (2015) Minimax fixed-design linear regression. Grünwald P, Hazan E, eds. Conf. Learn. Theory (Association for Computational Learning, Mountain View, CA), 437–446.Google Scholar
- (2020) On the asymptotic optimality of the comb strategy for prediction with expert advice. Ann. Appl. Probability 30(6):2517–2546.Crossref, Google Scholar
- (2018) Gekko optimization suite. Processes (Basel) 6(8):106.Crossref, Google Scholar
- (2005) Dynamic Programming and Optimal Control, vol 1 (Athena Scientific).Google Scholar
- (2012) Dynamic Programming and Optimal Control, vol 2 (Athena Scientific).Google Scholar
- (1956) An analog of the minimax theorem for vector payoffs. Pacific J. Math. 6(1):1–8.Crossref, Google Scholar
- (2007) From external to internal regret. J. Machine Learn. Res. 8(Jun):1307–1324.Google Scholar
- (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.Crossref, Google Scholar
- (2003) Potential-based algorithms in on-line prediction and game theory. Machine Learn. 51(3):239–261.Crossref, Google Scholar
- (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (1997) How to use expert advice. J. ACM 44(3):427–485.Crossref, Google Scholar
- (2010) Prediction with expert advice under discounted loss. Algorithmic Learning Theory (Springer, Berlin), 255–269.Crossref, Google Scholar
- (1966) Behavior of sequential predictors of binary sequences. Technical report, Stanford University, Stanford, CA.Google Scholar
- (1997) Calibrated learning and correlated equilibrium. Games Econom. Behav. 21(1–2):40–55.Crossref, Google Scholar
- (1999) Adaptive game playing using multiplicative weights. Games Econom. Behav. 29(1–2):79–103.Crossref, Google Scholar
- (2016) Toward optimal algorithms for prediction with expert advice. Kraughgamer R, ed. Proc. 27th Annual ACM-SIAM Sympos. on Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 528–547.Google Scholar
- (1957) Approximation to Bayes risk in repeated plays. Dresher M, Tucker AW, Wolfe P, eds. Contributions to the Theory of Games, vol. 3 (Princeton University Press, Princeton, NJ), 97–139.Google Scholar
- (2016) Introduction to online convex optimization. Foundations Trends Optim. 2(3-4):157–325.Crossref, Google Scholar
- (1999) Completeness and total boundedness of the Hausdorff metric. MIT Undergraduate J. Math. 1:69–80.Google Scholar
- (2013) The Pareto regret frontier. Burges CJ, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Advances in Neural Information Processing Systems (Curran Associates, Inc., Red Hook, NY), 863–871.Google Scholar
- (2014) Efficient minimax strategies for square loss games. Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, eds. Advances in Neural Information Processing Systems (Curran Associates, Inc., Red Hook, NY), 3230–3238.Google Scholar
- (2015) Minimax time series prediction. Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, eds. Advances in Neural Information Processing Systems (Curran Associates, Inc., Red Hook, NY), 2557–2565.Google Scholar
- (2015) Advances in zero-sum dynamic games. Handbook of Game Theory with Economic Applications, vol. 4 (Elsevier, New York), 27–93.Google Scholar
- (2003) Approachability in infinite dimensional spaces. Internat. J. Game Theory 31(2):253–268.Crossref, Google Scholar
- (1994) The weighted majority algorithm. Inform. Comput. 108(2):212–261.Crossref, Google Scholar
- (2014) Toward minimax online learning with unknown time horizon. Xing Ep, Jebara T, eds. Internat. Conf. Machine Learn. Proc. 31st Internat. Conf. on Machine Learn., 226–234.Google Scholar
- (2006) Approachable sets of vector payoffs in stochastic games. Games Econom. Behav. 56(1):135–147.Crossref, Google Scholar
- (2000) Topology: A First Course (Prentice Hall, Inc., Hoboken, NJ).Google Scholar
- (2011a) Approachability of convex sets in games with partial monitoring. J. Optim. Theory Appl. 149(3):665–677.Crossref, Google Scholar
- (2011b) Internal regret with partial monitoring: Calibration-based optimal algorithms. J. Machine Learn. Res. 12(Jun):1893–1921.Google Scholar
- (2014) Approachability, regret and calibration: Implications and equivalences. J. Dynamic Games 1(2):181–254.Crossref, Google Scholar
- (2014) On a unified framework for approachability with full or partial monitoring. Math. Oper. Res. 40(3):596–610.Link, Google Scholar
- (2014) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, Hoboken, NJ).Google Scholar
- (1953) Stochastic games. Proc. National Acad. Sci. USA 39(10):1095.Crossref, Google Scholar
- (2002) A necessary and sufficient condition for approachability. Math. Oper. Res. 27(1):31–44.Link, Google Scholar
- (2005) Internal regret in on-line portfolio selection. Machine Learn. 59(1–2):125–159.Crossref, Google Scholar
- (1992) Weak approachability. Math. Oper. Res. 17(4):781–791.Link, Google Scholar
- (1990) Aggregating strategies. Fulk M, Case J, eds. Proc. 3rd Annual Workshop Comput. Learning Theory (COLT ’90) (Morgan Kaufmann Publishers Inc., San Francisco), 371–386.Google Scholar

