An Approximate Dynamic Programming Approach to Repeated Games with Vector Losses
Abstract
We describe an approximate dynamic programming (ADP) approach to compute approximations of the optimal strategies and of the minimal losses that can be guaranteed in discounted repeated games with vector-valued losses. Among other applications, such vector-valued games prominently arise in the analysis of worst-case regret in repeated decision making in unknown environments, also known as the adversarial online learning framework. At the core of our approach is a characterization of the lower Pareto frontier of the set of expected losses that a player can guarantee in these games as the unique fixed point of a set-valued dynamic programming operator. When applied to the problem of worst-case regret minimization with discounted losses, our approach yields algorithms that achieve markedly improved performance bounds compared with off-the-shelf online learning algorithms like Hedge. These results thus suggest the significant potential of ADP-based approaches in adversarial online learning.
Funding: This work has been partially supported by the Multidisciplinary Institute in Artificial Intelligence (MIAI) at Grenoble Alpes (ANR-19-P3IA-0003), by the French National Research Agency (ANR) [Grant ANR-20-CE23-0007], by the U.S. Airforce Office of Scientific Research (AFOSR) [Grant MURI FA9550-10-1-0573], by the France-Berkeley Fund, and by the Alexander von Humboldt Foundation.
Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2334.

