Phase Transitions in Bandits with Switching Constraints

David Simchi-Levi
https://orcid.org/0000-0002-4650-1519
Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139; Department of Civil and Environmental Engineering and Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Yunzong Xu
Corresponding Author
https://orcid.org/0000-0002-1682-419X
Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139; Microsoft Research, New York, New York 10012; Department of Industrial and Enterprise Systems Engineering, University of Illinois, Urbana-Champaign, Illinois 61801

Published Online: 9 Aug 2023
https://doi.org/10.1287/mnsc.2023.4755

References

Agrawal R, Hegde M, Teneketzis D (1988) Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost. IEEE Trans. Automatic Control 33(10):899–906.
Agrawal R, Hegde M, Teneketzis D (1990) Multi-armed bandit problems with multiple plays and switching cost. Stochastics Stochastic Rep. 29(4):437–459.
Agrawal S, Avadhanula V, Goyal V, Zeevi A (2019) MNL-bandit: A dynamic learning approach to assortment selection. Oper. Res. 67(5):1453–1485.
Altschuler JM, Talwar K (2021) Online learning over a finite action set with limited switching. Math. Oper. Res. 46(1):179–203.
Asawa M, Teneketzis D (1996) Multi-armed bandits with switching penalties. IEEE Trans. Automatic Control 41(3):328–348.
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.
Banks JS, Sundaram RK (1994) Switching costs and the Gittins index. Econometrica 62(3):687–694.
Bayati M, Lelarge M, Montanari A (2015) Universality in polytope phase transitions and message passing algorithms. Ann. Appl. Probab. 25(2):753–822.
Bergemann D, Välimäki J (2001) Stationary multi-choice bandit problems. J. Econom. Dynamic Control 25(10):1585–1594.
Brezzi M, Lai TL (2002) Optimal learning and experimentation in bandit problems. J. Econom. Dynamic Control 27(1):87–108.
Cesa-Bianchi N, Dekel O, Shamir O (2013) Online learning with switching costs and other adaptive adversaries. Adv. Neural Inform. Processing Systems 1:1160–1168.
Chen B, Chao X (2019) Parametric demand learning with limited price explorations in a backlog stochastic inventory system. IISE Trans. 51(6):605–613.
Chen B, Chao X, Wang Y (2020) Data-based dynamic pricing and inventory control with censored demand and limited price changes. Oper. Res. 68(5):1445–1456.
Cheung WC, Simchi-Levi D, Wang H (2017) Dynamic pricing and demand learning with limited price experimentation. Oper. Res. 65(6):1722–1731.
Dekel O, Ding J, Koren T, Peres Y (2014) Bandits with switching costs: T^{2/3} regret. Proc. 46th Annual ACM Sympos. Theory Comput., 459–467.
den Boer AV (2015) Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys Oper. Res. Management Sci. 20(1):1–18.
Domb C (2000) Phase Transitions and Critical Phenomena, vol. 1 (Elsevier, Amsterdam).
Dong K, Li Y, Zhang Q, Zhou Y (2020) Multinomial logit bandit with low switching cost. Internat. Conf. Machine Learn. (PMLR), 2607–2615.
Gao Z, Han Y, Ren Z, Zhou Z (2019) Batched multi-armed bandits problem. Adv. Neural Inform. Processing Systems 33:503–513.
Guha S, Munagala K (2009) Multi-armed bandits with metric switching costs. Internat. Colloquium Automata Languages Programming (Springer, Berlin), 496–507.
Guha S, Munagala K (2013) Approximation algorithms for Bayesian multi-armed bandit problems. Preprint, submitted June 14, https://arxiv.org/abs/1306.3525.
Herbster M, Warmuth MK (1998) Tracking the best expert. Machine Learn. 32(2):151–178.
Hu Y, Kallus N, Mao X (2020) Smooth contextual bandits: Bridging the parametric and non-differentiable regret regimes. Conf. Learn. Theory (PMLR), 2007–2010.
Jia S, Li A, Ravi R (2021) Markdown pricing under unknown demand. Preprint, submitted June 8, https://dx.doi.org/10.2139/ssrn.3861379.
Jun T (2004) A survey on the bandit problem with switching costs. Economist 152(4):513–541.
Jun KS, Orabona F, Wright S, Willett R (2017) Online learning for changing environments using coin betting. Electronic J. Statist. 11(2):5282–5310.
Kerr D (2015) Detest Uber’s surge pricing? Some drivers don’t like it either. Accessed July 11, 2023, https://www.cnet.com/tech/tech-industry/detest-ubers-surge-pricing-some-drivers-dont-like-it-either/.
Koren T, Livni R, Mansour Y (2017) Bandits with movement costs and adaptive pricing. Conf. Learn. Theory (PMLR), 1242–1268.
Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.
Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).
Perchet V, Rigollet P, Chassang S, Snowberg E (2016) Batched bandit problems. Ann. Statist. 44(2):660–681.
Portnoy S (1984) Asymptotic behavior of m-estimators of p regression parameters when p^2/n is large. I. Consistency. Ann. Statist. 12(4):1298–1309.
Portnoy S (1988) Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity. Ann. Statist. 16(1):356–366.
Scheiber N (2017) How Uber uses psychological tricks to push its drivers’ buttons. The New York Times Online (April 2), https://www.nytimes.com/interactive/2017/04/02/technology/uber-drivers-psychological-tricks.html.
Simchi-Levi D, Xu Y (2019) Phase transitions and cyclic phenomena in bandits with switching constraints. Adv. Neural Inform. Processing Systems 32:7521–7530.
Simchi-Levi D, Kaminsky P, Simchi-Levi E, Shankar R (2008) Designing and Managing the Supply Chain: Concepts, Strategies and Case Studies (Tata McGraw-Hill Education, New York).
Slivkins A (2019) Introduction to multi-armed bandits. Preprint, submitted April 15, https://arxiv.org/abs/1904.07272.
Tsybakov AB (2008) Introduction to Nonparametric Estimation (Springer Science & Business Media, Berlin).
Wainwright MJ (2009) Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (LASSO). IEEE Trans. Inform. Theory 55(5):2183–2202.

Volume 69, Issue 12

December 2023

Pages 7151-7882, iii-iv

Information

Received: September 09, 2019
Accepted: July 17, 2022
Published Online: August 09, 2023

Cite as

David Simchi-Levi, Yunzong Xu (2023) Phase Transitions in Bandits with Switching Constraints. Management Science 69(12):7182-7201.

https://doi.org/10.1287/mnsc.2023.4755

Acknowledgments

The authors thank the review team for constructive comments and suggestions, which helped to significantly improve both the content and exposition of this paper. The authors also thank the Massachusetts Institute of Technology (MIT)-IBM partnership in artificial intelligence and the MIT Data Science Laboratory for support. A preliminary version of this paper appeared in the 33rd Conference on Neural Information Processing Systems (Simchi-Levi and Xu 2019), and the current paper is a significantly enhanced version of it.
