Phase Transitions in Bandits with Switching Constraints

Published Online:https://doi.org/10.1287/mnsc.2023.4755

References

  • Agrawal R, Hedge M, Teneketzis D (1988) Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost. IEEE Trans. Automatic Control 33(10):899–906.CrossrefGoogle Scholar
  • Agrawal R, Hegde M, Teneketzis D (1990) Multi-armed bandit problems with multiple plays and switching cost. Stochastics Stochastic Rep. 29(4):437–459.CrossrefGoogle Scholar
  • Agrawal S, Avadhanula V, Goyal V, Zeevi A (2019) MNL-bandit: A dynamic learning approach to assortment selection. Oper. Res. 67(5):1453–1485.LinkGoogle Scholar
  • Altschuler JM, Talwar K (2021) Online learning over a finite action set with limited switching. Math. Oper. Res. 46(1):179–203.LinkGoogle Scholar
  • Asawa M, Teneketzis D (1996) Multi-armed bandits with switching penalties. IEEE Trans. Automatic Control 41(3):328–348.CrossrefGoogle Scholar
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.CrossrefGoogle Scholar
  • Banks JS, Sundaram RK (1994) Switching costs and the Gittins index. Econometrica 62(3):687–694.CrossrefGoogle Scholar
  • Bayati M, Lelarge M, Montanari A (2015) Universality in polytope phase transitions and message passing algorithms. Ann. Appl. Probab. 25(2):753–822.CrossrefGoogle Scholar
  • Bergemann D, Välimäki J (2001) Stationary multi-choice bandit problems. J. Econom. Dynamic Control 25(10):1585–1594.CrossrefGoogle Scholar
  • Brezzi M, Lai TL (2002) Optimal learning and experimentation in bandit problems. J. Econom. Dynamic Control 27(1):87–108.CrossrefGoogle Scholar
  • Cesa-Bianchi N, Dekel O, Shamir O (2013) Online learning with switching costs and other adaptive adversaries. Adv. Neural Inform. Processing Systems 1:1160–1168.Google Scholar
  • Chen B, Chao X (2019) Parametric demand learning with limited price explorations in a backlog stochastic inventory system. IISE Trans. 51(6):605–613.CrossrefGoogle Scholar
  • Chen B, Chao X, Wang Y (2020) Data-based dynamic pricing and inventory control with censored demand and limited price changes. Oper. Res. 68(5):1445–1456.LinkGoogle Scholar
  • Cheung WC, Simchi-Levi D, Wang H (2017) Dynamic pricing and demand learning with limited price experimentation. Oper. Res. 65(6):1722–1731.LinkGoogle Scholar
  • Dekel O, Ding J, Koren T, Peres Y (2014) Bandits with switching costs: T 2/3 regret. Proc. 46th Annual ACM Sympos. Theory Comput., 459–467.Google Scholar
  • den Boer AV (2015) Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys Oper. Res. Management Sci. 20(1):1–18.CrossrefGoogle Scholar
  • Domb C (2000) Phase Transitions and Critical Phenomena, vol. 1 (Elsevier, Amsterdam).Google Scholar
  • Dong K, Li Y, Zhang Q, Zhou Y (2020) Multinomial logit bandit with low switching cost. Internat. Conf. Machine Learn. (PMLR), 2607–2615.Google Scholar
  • Gao Z, Han Y, Ren Z, Zhou Z (2019) Batched multi-armed bandits problem. Adv. Neural Inform. Processing Systems 33:503–513.Google Scholar
  • Guha S, Munagala K (2009) Multi-armed bandits with metric switching costs. Internat. Colloquium Automata Languages Programming (Springer, Berlin), 496–507.CrossrefGoogle Scholar
  • Guha S, Munagala K (2013) Approximation algorithms for Bayesian multi-armed bandit problems. Preprint, submitted June 14, https://arxiv.org/abs/1306.3525.Google Scholar
  • Herbster M, Warmuth MK (1998) Tracking the best expert. Machine Learn. 32(2):151–178.CrossrefGoogle Scholar
  • Hu Y, Kallus N, Mao X (2020) Smooth contextual bandits: Bridging the parametric and non-differentiable regret regimes. Conf. Learn. Theory (PMLR), 2007–2010.Google Scholar
  • Jia S, Li A, Ravi R (2021) Markdown pricing under unknown demand. Preprint, submitted June 8, https://dx.doi.org/10.2139/ssrn.3861379.Google Scholar
  • Jun T (2004) A survey on the bandit problem with switching costs. Economist 152(4):513–541.CrossrefGoogle Scholar
  • Jun KS, Orabona F, Wright S, Willett R (2017) Online learning for changing environments using coin betting. Electronic J. Statist. 11(2):5282–5310.CrossrefGoogle Scholar
  • Kerr D (2015) Detest Uber’s surge pricing? Some drivers don’t like it either. Accessed July 11, 2023, https://www.cnet.com/tech/tech-industry/detest-ubers-surge-pricing-some-drivers-dont-like-it-either/.Google Scholar
  • Koren T, Livni R, Mansour Y (2017) Bandits with movement costs and adaptive pricing. Conf. Learn. Theory (PMLR), 1242–1268.Google Scholar
  • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.CrossrefGoogle Scholar
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Perchet V, Rigollet P, Chassang S, Snowberg E (2016) Batched bandit problems. Ann. Statist. 44(2):660–681.CrossrefGoogle Scholar
  • Portnoy S (1984) Asymptotic behavior of m-estimators of p regression parameters when p2/n is large. I. Consistency. Ann. Statist. 12(4):1298–1309.CrossrefGoogle Scholar
  • Portnoy S (1988) Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity. Ann. Statist. 16(1):356–366.CrossrefGoogle Scholar
  • Scheiber N (2017) How Uber uses psychological tricks to push its drivers’ buttons. The New York Times Online (April 2), https://www.nytimes.com/interactive/2017/04/02/technology/uber-drivers-psychological-tricks.html.Google Scholar
  • Simchi-Levi D, Xu Y (2019) Phase transitions and cyclic phenomena in bandits with switching constraints. Adv. Neural Inform. Processing Systems 32:7521–7530.Google Scholar
  • Simchi-Levi D, Kaminsky P, Simchi-Levi E, Shankar R (2008) Designing and Managing the Supply Chain: Concepts, Strategies and Case Studies (Tata McGraw-Hill Education, New York).Google Scholar
  • Slivkins A (2019) Introduction to multi-armed bandits. Preprint, submitted April 15, https://arxiv.org/abs/1904.07272.Google Scholar
  • Tsybakov AB (2008) Introduction to Nonparametric Estimation (Springer Science & Business Media, Berlin).Google Scholar
  • Wainwright MJ (2009) Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (LASSO). IEEE Trans. Inform. Theory 55(5):2183–2202.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.