Online Stochastic Optimization with Wasserstein-Based Nonstationarity

Jiashuo Jiang
Jiashuo Jiang
[email protected]
https://orcid.org/0000-0001-5230-4231
Department of Industrial Engineering and Decision Analytics, Hong Kong University of Science and Technology, Hong Kong
Search for more papers by this author
,
Xiaocheng Li
Corresponding Author
Xiaocheng Li
[email protected]
https://orcid.org/0000-0001-6155-9068
Department of Analytics, Marketing, and Operations, Imperial College Business School, Imperial College London, London SW7 2AZ, United Kingdom
Search for more papers by this author
,
Jiawei Zhang
Jiawei Zhang
[email protected]
Department of Technology, Operations and Statistics, NYU Stern School of Business, New York, New York 10012
Search for more papers by this author

Department of Industrial Engineering and Decision Analytics, Hong Kong University of Science and Technology, Hong Kong

Corresponding Author

Xiaocheng Li

Department of Analytics, Marketing, and Operations, Imperial College Business School, Imperial College London, London SW7 2AZ, United Kingdom

Search for more papers by this author

Jiawei Zhang

[email protected]

Department of Technology, Operations and Statistics, NYU Stern School of Business, New York, New York 10012

Search for more papers by this author

Published Online:3 Mar 2025https://doi.org/10.1287/mnsc.2020.03850

References

Adelman D (2007) Dynamic bid prices in revenue management. Oper. Res. 55(4):647–661.Link, Google Scholar
Agrawal S, Devanur NR (2014a) Bandits with concave rewards and convex knapsacks. Proc. 15th ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 989–1006.Google Scholar
Agrawal S, Devanur NR (2014b) Fast algorithms for online stochastic convex programming. Proc. 26th Ann. ACM-SIAM Sympos. Discrete Algorithms (SIAM, Philadelphia), 1405–1424.Google Scholar
Agrawal S, Wang Z, Ye Y (2014) A dynamic near-optimal algorithm for online linear programming. Oper. Res. 62(4):876–890.Link, Google Scholar
Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (JMLR.org), 214–223.Google Scholar
Arlotto A, Gurvich I (2019) Uniformly bounded regret in the multisecretary problem. Stochastic Systems 9(3):231–260.Link, Google Scholar
Arlotto A, Xie X (2020) Logarithmic regret in the dynamic and stochastic knapsack problem with equal rewards. Stochastic Systems 10(2):170–191.Link, Google Scholar
Asadpour A, Wang X, Zhang J (2020) Online resource allocation with limited flexibility. Management Sci. 66(2):642–666.Link, Google Scholar
Badanidiyuru A, Kleinberg R, Slivkins A (2013) Bandits with knapsacks. Proc. IEEE 54th Ann. Sympos. Foundations Comput. Sci. (IEEE, Piscataway, NJ), 207–216.Google Scholar
Balseiro S, Lu H, Mirrokni V (2020) The best of many worlds: Dual mirror descent for online allocation problems. Oper. Res. 71(1):iii–v, 1–395, C2–C3.Google Scholar
Banerjee S, Freund D (2020) Uniform loss algorithms for online stochastic decision-making with applications to bin packing. SIGMETRICS '20: ACM SIGMETRICS / Internat. Conf. Measurement Modeling Comput. Systems (Association for Computing Machinery, New York), 1–2.Google Scholar
Banerjee S, Freund D (2024) Good prophets know when the end is near. Management Sci., ePub ahead of print September 23, https://doi.org/10.1287/mnsc.2023.04307.Google Scholar
Bemporad A, Morari M (1999) Robust model predictive control: A survey. Robustness in Identification and Control (Springer, London), 207–226.Crossref, Google Scholar
Besbes O, Zeevi A (2012) Blind network revenue management. Oper. Res. 60(6):1537–1550.Link, Google Scholar
Besbes O, Gur Y, Zeevi A (2014) Stochastic multi-armed-bandit problem with non-stationary rewards. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates, Inc., Red Hook, NY), 199–207.Google Scholar
Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.Link, Google Scholar
Blanchet J, Kang Y, Murthy K (2019) Robust Wasserstein profile inference and applications to machine learning. J. Appl. Probab. 56(3):830–857.Crossref, Google Scholar
Buchbinder N, Naor J (2009) Online primal-dual algorithms for covering and packing. Math. Oper. Res. 34(2):270–286.Link, Google Scholar
Bumpensanti P, Wang H (2020) A re-solving heuristic with uniformly bounded loss for network revenue management. Management Sci. 66(7):2993–3009.Link, Google Scholar
Cheung WC, Simchi-Levi D, Zhu R (2019) Non-stationary reinforcement learning: The blessing of (more) optimism. Preprint, submitted June 17, https://dx.doi.org/10.2139/ssrn.3397818.Google Scholar
Cooper WL (2002) Asymptotic behavior of an allocation policy for revenue management. Oper. Res. 50(4):720–727.Link, Google Scholar
Correa J, Foncea P, Pizarro D, Verdugo V (2019) From pricing to prophets, and back!. Oper. Res. Lett. 47(1):25–29.Crossref, Google Scholar
Devanur NR, Jain K, Sivan B, Wilkens CA (2019) Near optimal online algorithms and fast approximation algorithms for resource allocation problems. J. ACM 66(1):7.Crossref, Google Scholar
Esfahani PM, Kuhn D (2018) Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Programming 171(1–2):115–166.Crossref, Google Scholar
Ferguson TS (1989) Who solved the secretary problem? Statist. Sci. 4(3):282–289.Crossref, Google Scholar
Galichon A (2018) Optimal Transport Methods in Economics (Princeton University Press, Princeton, NJ).Google Scholar
Gallego G, Topaloglu H (2019) Revenue Management and Pricing Analytics, vol. 209 (Springer, New York).Crossref, Google Scholar
Gallego G, Van Ryzin G (1994) Optimal dynamic pricing of inventories with stochastic demand over finite horizons. Management Sci. 40(8):999–1020.Link, Google Scholar
Garivier A, Moulines E (2008) On upper-confidence bound policies for non-stationary bandit problems. Preprint, submitted May 22, https://arxiv.org/abs/0805.3415.Google Scholar
Golrezaei N, Nazerzadeh H, Rusmevichientong P (2014) Real-time optimization of personalized assortments. Management Sci. 60(6):1532–1551.Link, Google Scholar
Gupta A, Molinaro M (2014) How experts can solve LPS online. Proc. Eur. Sympos. Algorithms (Springer, Berlin, Heidelberg), 517–529.Google Scholar
Hall E, Willett R (2013) Dynamical models and tracking regret in online convex programming. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 579–587.Google Scholar
Hazan E (2016) Introduction to online convex optimization. Frontiers Optim. 2(3–4):157–325.Google Scholar
Immorlica N, Abinav Sankararaman K, Schapire R, Aleksandrs S (2019) Adversarial bandits with knapsacks. Proc. IEEE 60th Ann. Sympos, Foundations Comput. Sci. (IEEE, Piscataway, NJ), 202–219.Google Scholar
Jadbabaie A, Rakhlin A, Shahrampour S, Sridharan K (2015) Online optimization: Competing with dynamic comparators. Artificial Intelligence and Statistics (PMLR, New York), 398–406.Google Scholar
Jasin S (2015) Performance of an LP-based control for revenue management with unknown demand parameters. Oper. Res. 63(4):909–915.Link, Google Scholar
Jasin S, Kumar S (2012) A re-solving heuristic with bounded revenue loss for network revenue management with customer choice. Math. Oper. Res. 37(2):313–345.Link, Google Scholar
Jasin S, Kumar S (2013) Analysis of deterministic LP-based booking limit and bid price controls for revenue management. Oper. Res. 61(6):1312–1320.Link, Google Scholar
Jenatton R, Huang J, Archambeau C (2016) Adaptive algorithms for online convex optimization with long-term constraints. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 402–411.Google Scholar
Jiang J, Ma W, Zhang J (2022) Degeneracy is ok: Logarithmic regret for network revenue management with indiscrete distributions. Preprint, submitted October 14, https://arxiv.org/abs/2210.07996.Google Scholar
Jiang J, Wang S, Zhang J (2023) Achieving high individual service levels without safety stock? Optimal rationing policy of pooled resources. Oper. Res. 71(1):358–377.Link, Google Scholar
Jiang J, Zhang J (2020) Online resource allocation with stochastic resource consumption. Preprint, submitted December 14, https://arxiv.org/abs/2012.07933.Google Scholar
Kunnumkal S, Talluri K (2016) On a piecewise-linear approximation for network revenue management. Math. Oper. Res. 41(1):72–91.Link, Google Scholar
Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Lecarpentier E, Rachelson E (2019) Non-stationary Markov decision processes, a worst-case approach using model-based reinforcement learning. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY), 7216–7225.Google Scholar
Li X, Sun C, Ye Y (2020) Simple and fast algorithm for binary integer and online linear programming. Preprint, submitted March 5, https://arxiv.org/abs/2003.02513.Google Scholar
Lu H, Balseiro S, Mirrokni V (2020) Dual mirror descent for online allocation problems. Preprint, submitted November 18, https://arxiv.org/abs/2002.10421.Google Scholar
Ma Y, Rusmevichientong P, Sumida M, Topaloglu H (2020) An approximation algorithm for network revenue management under nonstationary arrivals. Oper. Res. 68(3):834–855.Link, Google Scholar
Mehta A, Saberi A, Vazirani U, Vazirani V (2005) Adwords and generalized on-line matching. Proc. 46th Ann. IEEE Sympos. Foundations Comput. Sci. (IEEE, Piscataway, NJ), 264–273.Google Scholar
Miao S, Wang Y, Zhang J (2021) A general framework for resource constrained revenue management with demand learning and large action space. Preprint, submitted May 10, https://dx.doi.org/10.2139/ssrn.3841273.Google Scholar
Molinaro M, Ravi R (2013) The geometry of online packing linear programs. Math. Oper. Res. 39(1):46–59.Link, Google Scholar
Neely MJ, Yu H (2017) Online convex optimization with time-varying constraints. Preprint, submitted February 15, https://arxiv.org/abs/1702.04783.Google Scholar
Rangi A, Franceschetti M, Tran-Thanh L (2018) Unifying the stochastic and the adversarial bandits with knapsack. Preprint, submitted October 23, https://arxiv.org/abs/1811.12253.Google Scholar
Rawlings JB (2000) Tutorial overview of model predictive control. IEEE Control Systems Magazine 20(3):38–52.Crossref, Google Scholar
Reiman MI, Wang Q (2008) An asymptotically optimal policy for a quantity-based network revenue management problem. Math. Oper. Res. 33(2):257–282.Link, Google Scholar
Sun R, Wang X, Zhou Z (2020) Near-optimal primal-dual algorithms for quantity-based network revenue management. Preprint, submitted November 12, https://arxiv.org/abs/2011.06327.Google Scholar
Talluri K, Van Ryzin G (1998) An analysis of bid-price controls for network revenue management. Management Sci. 44(11-part-1):1577–1593.Link, Google Scholar
Talluri K, Van Ryzin G (2004) Revenue management under a general discrete choice model of consumer behavior. Management Sci. 50(1):15–33.Link, Google Scholar
Talluri KT, Van Ryzin GJ (2006) The Theory and Practice of Revenue Management, vol. 68 (Springer Science & Business Media, New York).Google Scholar
Vanderbei RJ (1998) Linear programming: foundations and extensions. J. Oper. Res. Soc. 49(1):94.Google Scholar
Vera A, Banerjee S (2021) The Bayesian prophet: A low-regret framework for online decision making. Management Sci. 67(3):1329–1992, iii–iv.Link, Google Scholar
Vera A, Banerjee S, Gurvich I (2019) Online allocation and pricing: Constant regret via bellman inequalities. Preprint, submitted June 14, https://arxiv.org/abs/1906.06361.Google Scholar
Villani C (2008) Optimal Transport: Old and New, vol. 338 (Springer Science & Business Media, Berlin, Heidelberg).Google Scholar
Wagener N, Cheng C-A, Sacks J, Boots B (2019) An online learning approach to model predictive control. Preprint, submitted February 24, https://arxiv.org/abs/1902.08967.Google Scholar
Yi X, Li X, Yang T, Xie L, Chai T, Johansson K (2021) Regret and cumulative constraint violation analysis for online convex optimization with long term constraints. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 11998–12008.Google Scholar
Yuan J, Lamperski A (2018) Online convex optimization for cumulative constraints. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 6137–6146.Google Scholar
Zhang D, Adelman D (2009) An approximate dynamic programming approach to network revenue management with customer choice. Transportation Sci. 43(3):381–394.Link, Google Scholar

Volume 71, Issue 11

November 2025

Pages 8995-9868, iv-vi

Article Information

Supplemental Material

Metrics

Information

Received:December 23, 2020
Accepted:February 12, 2024
Published Online:March 03, 2025

Cite as

Jiashuo Jiang, Xiaocheng Li, Jiawei Zhang (2025) Online Stochastic Optimization with Wasserstein-Based Nonstationarity. Management Science 71(11):9104-9122.

https://doi.org/10.1287/mnsc.2020.03850

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Online Stochastic Optimization with Wasserstein-Based Nonstationarity

References

Volume 71, Issue 11

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News