Beyond O(T) Regret: Decoupling Learning and Decision Making in Online Linear Programming

Published Online:https://doi.org/10.1287/opre.2024.1575

Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on developing efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than O(T), which is suboptimal compared with the O(logT) bound guaranteed by the state-of-the-art linear programming (LP)-based online algorithms. This paper establishes a general framework that improves on the O(T) result when the LP dual problem exhibits certain error bound conditions. For the first time, we show that first-order learning algorithms achieve o(T) regret in the continuous support setting and O(logT) regret in the finite support setting beyond the nondegeneracy assumption. Our results significantly improve the state-of-the-art regret results and provide new insights for sequential decision making.

Funding: This research was supported by the National Natural Science Foundation of China [Grants 72225009, 72394360, and 72394365].

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2024.1575.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.