Last-Iterate Convergence in No-Regret Learning: Games with Reference Effects Under Logit Demand

Mengzi Amy Guo
[email protected]
https://orcid.org/0000-0001-8869-4673
Department of Industrial Engineering and Operations Research, University of California, Berkeley, Berkeley, California 94720

Donghao Ying
[email protected]
https://orcid.org/0009-0001-7329-5917
Department of Industrial Engineering and Operations Research, University of California, Berkeley, Berkeley, California 94720

Javad Lavaei
[email protected]
https://orcid.org/0000-0003-4294-1338
Department of Industrial Engineering and Operations Research, University of California, Berkeley, Berkeley, California 94720

Zuo-Jun Max Shen (Corresponding Author)
[email protected]
https://orcid.org/0000-0003-4538-8312
Department of Industrial Engineering and Operations Research, University of California, Berkeley, Berkeley, California 94720; and Faculty of Engineering and Faculty of Business and Economics, University of Hong Kong, Hong Kong, China

Published Online: 21 May 2025
https://doi.org/10.1287/mnsc.2023.03464

Abstract

This work examines the behavior of the online projected gradient ascent (OPGA) algorithm and its variant in repeated oligopoly price competition under reference effects. In particular, we consider multiple firms engaged in a multiperiod price competition, where consecutive periods are linked by the reference price update and each firm has access only to its own first-order feedback. Consumers assess their willingness to pay by comparing the current price against the memory-based reference price, and their choices follow the multinomial logit (MNL) model. We use the notion of stationary Nash equilibrium (SNE), defined as the fixed point of the equilibrium pricing policy, to simultaneously capture the long-run equilibrium and stability. We first study loss-neutral reference effects and show that if the firms employ the OPGA algorithm, adjusting prices using the first-order derivatives of their log-revenues, the price and reference price paths attain last-iterate convergence to the unique SNE, thereby guaranteeing no-regret learning and market stability. Moreover, with appropriate step-sizes, we prove that this algorithm exhibits a convergence rate of $\tilde{O}(1 / t^{2})$ in terms of the squared distance and achieves a constant dynamic regret. Despite the simplicity of the algorithm, its convergence analysis is challenging because the model lacks typical properties, such as strong monotonicity and variational stability, that are ordinarily used in the convergence analysis of online games. The inherently asymmetric nature of reference effects motivates exploration beyond loss-neutrality. When loss-averse reference effects are introduced, we propose a variant of the original algorithm, named conservative-OPGA (C-OPGA), to handle the nonsmooth revenue functions and show that the price and reference price achieve last-iterate convergence to the set of SNEs at the rate of $O(1 / \sqrt{t})$. Finally, we demonstrate the practicality and robustness of OPGA and C-OPGA by theoretically showing that these algorithms can also adapt to firm-differentiated step-sizes and inexact gradients.
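
To make the loss-neutral setting described above more concrete, the following is a minimal simulation sketch, not the authors' implementation. Everything not stated in the abstract is an assumption made for illustration: the linear utility `a - b*p + c*(r - p)`, an outside option with utility 0, the exponential-smoothing reference update with memory parameter `alpha`, the price bounds `p_min` and `p_max`, and the step-size schedule `1/(t+1)`. All function and parameter names are hypothetical.

```python
import numpy as np

def mnl_demand(p, r, a, b, c):
    """MNL market shares; the outside option has utility 0 (assumption)."""
    u = a - b * p + c * (r - p)  # assumed loss-neutral utility form
    w = np.exp(u)
    return w / (1.0 + w.sum())

def log_revenue_grad(p, r, a, b, c):
    """Each firm's derivative of log(p_i * d_i) with respect to its own price.

    Under the assumed utility, d log(d_i)/d p_i = -(b_i + c_i) * (1 - d_i).
    """
    d = mnl_demand(p, r, a, b, c)
    return 1.0 / p - (b + c) * (1.0 - d)

def simulate_opga(p0, r0, a, b, c, alpha=0.8, p_min=0.1, p_max=5.0, T=5000):
    """Projected gradient ascent on log-revenue with a reference price update."""
    p = np.clip(np.asarray(p0, dtype=float), p_min, p_max)
    r = np.asarray(r0, dtype=float).copy()
    for t in range(T):
        eta = 1.0 / (t + 1)  # diminishing step-size (assumed schedule)
        grad = log_revenue_grad(p, r, a, b, c)
        p_next = np.clip(p + eta * grad, p_min, p_max)  # projection onto the price interval
        r = alpha * r + (1.0 - alpha) * p  # memory-based reference update (assumed form)
        p = p_next
    return p, r

if __name__ == "__main__":
    a = np.array([2.0, 2.0])
    b = np.array([1.0, 1.0])
    c = np.array([0.5, 0.5])
    p, r = simulate_opga([1.0, 3.0], [2.0, 2.0], a, b, c)
    print("final prices:", p)
    print("final reference prices:", r)
```

In this toy setup each firm uses only its own price, its own market share, and its own sensitivity parameters, which mirrors the own first-order feedback mentioned in the abstract. At a stationary point the price and reference price coincide, so the gap between `p` and `r` gives a quick diagnostic of whether the paths have settled.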

This paper was accepted by Chung Piaw Teo, optimization and decision analytics.

Funding: J. Lavaei acknowledges the support from the U.S. Army Research Laboratory and the U.S. Army Research Office under Grant W911NF2010219, Office of Naval Research under Grant N000142412673, AFOSR, NSF, and the UC Noyce Initiative.

Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2023.03464.

Volume 72, Issue 2

February 2026

Pages 783-1726, iv-vi

Information

Received: October 27, 2023
Accepted: February 14, 2025
Published Online: May 21, 2025

Cite as

Mengzi Amy Guo, Donghao Ying, Javad Lavaei, Zuo-Jun Max Shen (2025) Last-Iterate Convergence in No-Regret Learning: Games with Reference Effects Under Logit Demand. Management Science 72(2):1007-1024.

https://doi.org/10.1287/mnsc.2023.03464

Acknowledgments

The authors thank Editor Chung Piaw Teo, the Associate Editor, and the three anonymous reviewers for their constructive feedback. The authors also thank the anonymous reviewers of NeurIPS 2023 for their valuable comments. The authors note that the algorithm and its convergence for the case of loss-neutral reference effects in duopoly competition (i.e., Theorems 1 and 2) were proposed in their conference proceedings paper (Guo et al. 2023). In this work, however, the authors not only improve the convergence rate and regret in the loss-neutral scenario but also extend all results to oligopoly competition with potentially asymmetric reference effects.
