Nonstationary Bandits with Habituation and Recovery Dynamics

Published Online:https://doi.org/10.1287/opre.2019.1918

References

  • Afèche P, Ata B (2013) Bayesian dynamic pricing in queueing systems with unknown delay cost characteristics. Manufacturing Service Oper. Management 15(2):292–304.LinkGoogle Scholar
  • Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn., vol. 28 (JMLR), III-1220–III-1228.Google Scholar
  • Agrawal S, Zizhuo Y, Ye Y (2014) A dynamic near-optimal algorithm for online linear programming. Oper. Res. 62(4):876–890.LinkGoogle Scholar
  • Anantharam V, Varaiya P, Walrand J (1987) Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays—Part II: Markovian rewards. IEEE Trans. Automatic Control 32(11):977–982.CrossrefGoogle Scholar
  • Aswani A (2019) Statistics with set-valued functions: Applications to inverse approximate optimization. Math. Programming 174(1–2):225–251.Google Scholar
  • Aswani A, Shen Z-J, Siddiq A (2018) Inverse optimization with noisy data. Oper. Res. 66(3):870–892.Google Scholar
  • Aswani A, Kaminsky P, Mintz Y, Flowers E, Fukuoka Y (2019) Behavioral modeling in weight loss interventions. Eur. J. Oper. Res. 272(3):1058–1072.Google Scholar
  • Auer P, Cesa-Bianchi N, Fischer P (2002a) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Google Scholar
  • Auer P, Cesa-Bianchi N, Freund Y, Schapire R (2002b) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.CrossrefGoogle Scholar
  • Ayer T, Alagoz O, Stout NK (2012) A POMDP approach to personalize mammography screening decisions. Oper. Res. 60(5):1019–1034.LinkGoogle Scholar
  • Ayer T, Alagoz O, Stout N, Burnside E (2015) Heterogeneity in women’s adherence and its role on optimal breast cancer screening policies. Management Sci. 62(5):1339–1362.Google Scholar
  • Ban G-Y (2019) The data-driven (s, S) policy: Why you can have confidence in censored demand data. Oper. Res. Forthcoming.Google Scholar
  • Ban G, Rudin C (2019) The big data newsvendor: Practical insights from machine learning. Oper. Res. 67(1):90–108.LinkGoogle Scholar
  • Bartlett P, Mendelson S (2002) Rademacher and Gaussian complexities: Risk bounds and structural results. J. Machine Learn. Res. 3(November):463–482.Google Scholar
  • Bastani H, Bayati M (2015) Online decision-making with high-dimensional covariates. Working paper, University of Pennsylvania, Philadelphia.Google Scholar
  • Bertsekas DP, Bertsekas DP, Bertsekas DP, Bertsekas DP (2005) Dynamic Programming and Optimal Control, vol. 1 (Athena Scientific, Belmont, MA).Google Scholar
  • Bertsimas D, Kallus N (2020) From predictive to prescriptive analytics. Management Sci. 66(3):1025–1044.Google Scholar
  • Bertsimas D, Mersereau A (2007) A learning approach for interactive marketing to a customer segment. Oper. Res. 55(6):1120–1135.LinkGoogle Scholar
  • Bertsimas D, Niño-Mora J (2000) Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Oper. Res. 48(1):80–90.LinkGoogle Scholar
  • Besbes O, Gur Y, Zeevi A (2014) Optimal exploration-exploitation in a multi-armed-bandit problem with non-stationary rewards. Working paper, Columbia University, New York. Preprint, submitted May 13, https://arxiv.org/abs/1405.3316.Google Scholar
  • Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.LinkGoogle Scholar
  • Bickel P, Doksum K (2006) Mathematical Statistics: Basic Ideas and Selected Topics, vol. 1, 2nd ed. (Pearson Prentice Hall, Upper Saddle River, NJ).Google Scholar
  • Bouneffouf D, Féraud R (2016) Multi-armed bandit problem with known trend. Neurocomputing 205(September):16–21.CrossrefGoogle Scholar
  • Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.LinkGoogle Scholar
  • Çelik M, Ergun Ö, Keskinocak P (2015) The post-disaster debris clearance problem under incomplete information. Oper. Res. 63(1):65–85.LinkGoogle Scholar
  • Chang H, Fu M, Hu J, Marcus S (2005) An adaptive sampling algorithm for solving markov decision processes. Oper. Res. 53(1):126–139.LinkGoogle Scholar
  • Chen Q, Ayer T, Chhatwal J (2018) Optimal M-switch surveillance policies for liver cancer in a hepatitis C–infected population. Oper. Res. 66(3):673–696.LinkGoogle Scholar
  • Chen W, Wang Y, Yuan Y (2013) Combinatorial multi-armed bandit: General framework, results and applications. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn. (PMLR), 151–159.Google Scholar
  • Daniely A, Gonen A, Shalev-Shwartz S (2015) Strongly adaptive online learning. Bach F, Blei D, eds. Proc. 32nd Internat. Conf. Machine Learn. (PMLR), 1405–1411.Google Scholar
  • Deng T, Shen ZJM, Shanthikumar JG (2014) Statistical learning of service-dependent demand in a multiperiod newsvendor setting. Oper. Res. 62(5):1064–1076.LinkGoogle Scholar
  • Deo S, Jiang T, Iravani S, Smilowitz K, Samuelson S (2013) Improving health outcomes through better capacity allocation in a community-based chronic care model. Oper. Res. 61(6):1277–1294.LinkGoogle Scholar
  • Diabetes Prevention Program Research Group (2002) Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. New England J. Medicine 346(6):393–403.CrossrefGoogle Scholar
  • Diabetes Prevention Program Research Group (2009) 10-year follow-up of diabetes incidence and weight loss in the diabetes prevention program outcomes study. Lancet 374(9720):1677–1686.CrossrefGoogle Scholar
  • Eagle JN (1984) The optimal search for a moving target when the search path is constrained. Oper. Res. 32(5):1107–1115.LinkGoogle Scholar
  • Filippi S, Cappe O, Garivier A, Szepesvári C (2010) Parametric bandits: The generalized linear case. Lafferty JD, Williams CKI, Shawe-Taylor J, Zemel RS, Culotta A, eds. Advances in Neural Information Processing Systems, vol. 23 (Curran Associates, Red Hook, NY), 586–594.Google Scholar
  • Frazier P, Wang J (2016) Bayesian optimization for materials design. Lookman T, Alexander FJ, Rajan K, eds. Information Science for Materials Discovery and Design (Springer, Cham, Switzerland), 45–75.CrossrefGoogle Scholar
  • Friedenreich C, Neilson H, Lynch B (2010) State of the epidemiological evidence on physical activity and cancer prevention. Eur. J. Cancer 46(14):2593–2604.CrossrefGoogle Scholar
  • Fujiwara M (1916) Über die obere schranke des absoluten betrages der wurzeln einer algebraischen gleichung. Tohoku Math. J. First Ser. 10(1916):167–171.Google Scholar
  • Fukuoka Y, Vittinghoff E, Hooper J (2018) A weight loss intervention using a commercial mobile application in Latino Americans-Adelgaza Trial. Translational Behavioral Medicine 8(5):714–723.CrossrefGoogle Scholar
  • Fukuoka Y, Gay C, Joiner K, Vittinghoff E (2015) A novel diabetes prevention intervention using a mobile app: A randomized controlled trial with overweight adults at risk. Amer. J. Preventive Medicine 49(2):223–237.CrossrefGoogle Scholar
  • Fukuoka Y, Haskell W, Lin F, Vittinghoff E (2019) Short- and Long-term effects of a mobile phone app in conjunction with brief in-person counseling on physical activity among physically inactive women: The mPED Randomized Clinical Trial. JAMA Network Open 2(5):e194281.CrossrefGoogle Scholar
  • Garivier A, Cappé O (2011) The KL-UCB algorithm for bounded stochastic bandits and beyond. Proc. 25th Conf. Learn. Theory (JMLR), 359–376.Google Scholar
  • Garivier A, Moulines E (2008) On upper-confidence bound policies for non-stationary bandit problems. Working paper, Télécom Paris, Paris. Preprint, submitted May 22, https://arxiv.org/abs/0805.3415.Google Scholar
  • Ghose A, Yang S (2009) An empirical analysis of search engine advertising: Sponsored search in electronic markets. Management Sci. 55(10):1605–1622.LinkGoogle Scholar
  • Gittins J (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. Ser. B (Methodological) 41(2):148–177.Google Scholar
  • Goldfarb A, Tucker C (2011) Online display advertising: Targeting and obtrusiveness. Marketing Sci. 30(3):389–404.LinkGoogle Scholar
  • Goldfarb A, Tucker C (2014) Standardization and the effectiveness of online advertising. Management Sci. 61(11):2707–2719.LinkGoogle Scholar
  • Guha S, Munagala K, Shi P (2010) Approximation algorithms for restless bandit problems. J. ACM 58(1):Article 3.CrossrefGoogle Scholar
  • Gupta D, Wang L (2008) Revenue management for a primary-care clinic in the presence of patient choice. Oper. Res. 56(3):576–592.LinkGoogle Scholar
  • Gurobi Optimization (2016) Gurobi optimizer reference manual, Version 7.0. Accessed June 1, 2017, http://www.gurobi.com.Google Scholar
  • Hartland C, Gelly S, Baskiotis N, Teytaud O, Sebag M (2006) Multi-armed bandit, dynamic environments and meta-bandits. Working paper, Université Paris-Sud, Orsay, France.Google Scholar
  • Hero A, Usman M, Sauve A, Fessler J (1997) Recursive algorithms for computing the Cramer-Rao bound. IEEE Trans. Signal Processing 45(3):803–807.CrossrefGoogle Scholar
  • Johari R, Pekelis L, Walsh D (2015) Always valid inference: Bringing sequential analysis to A/B testing. Working paper, Stanford University, Stanford, CA. Preprint, submitted December 15, https://arxiv.org/abs/1512.04922.Google Scholar
  • Kim M, Lim A (2015) Robust multiarmed bandit problems. Management Sci. 62(1):264–285.Google Scholar
  • Kleinberg R, Slivkins A, Upfal E (2008) Multi-armed bandits in metric spaces. Proc. 40th Annual ACM Sympos. Theory Comput. (ACM, New York), 681–690.Google Scholar
  • Kontorovich A (2014) Concentration in unbounded metric spaces and algorithmic stability. Xing EP, Jebara T, eds. Proc. 31st Internat. Conf. Machine Learn. (ICML), 28–36.Google Scholar
  • Koolen W, Malek A, Bartlett P (2014) Efficient minimax strategies for square loss games. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates, Red Hook, NY), 3230–3238.Google Scholar
  • Koolen W, Malek A, Bartlett P, Abbasi Y (2015) Minimax time series prediction. Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 28 (Curran Associates, Red Hook, NY), 2557–2565.Google Scholar
  • Kucukyazici B, Verter V, Mayo N (2011) An analytical framework for designing community-based care for chronic diseases. Production Oper. Management 20(3):474–488.CrossrefGoogle Scholar
  • Laffont J-J, Martimort D (2002) The Theory of Incentives: The Principal-Agent Model (Princeton University Press, Princeton, NJ).CrossrefGoogle Scholar
  • Lai T, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.CrossrefGoogle Scholar
  • Leff H, Dada M, Graves S (1986) An LP planning model for a mental health community support system. Management Sci. 32(2):139–155.LinkGoogle Scholar
  • Levine N, Crammer K, Mannor S (2017) Rotting bandits. Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Proc. Neural Inform. Processing Systems 2017 (Neural Information Processing Systems, San Diego).Google Scholar
  • Lewis B, Napolitano M, Buman M, Williams D, Nigg C (2017) Future directions in physical activity intervention research: Expanding our focus to sedentary behaviors, technology, and dissemination. J. Behavioral Medicine 40(1):112–126.CrossrefGoogle Scholar
  • Liu K, Zhao Q (2010a) Distributed learning in multi-armed bandit with multiple players. IEEE Trans. Signal Processing 58(11):5667–5681.CrossrefGoogle Scholar
  • Liu K, Zhao Q (2010b) Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Trans. Inform. Theory 56(11):5547–5567.CrossrefGoogle Scholar
  • Mason J, Denton B, Smith S, Shah N (2018) Using electronic health records to monitor and improve adherence to medication. IISE Trans. Healthcare Systems Engrg. 7(4):194–214.Google Scholar
  • McCullagh P (1984) Generalized linear models. Eur. J. Oper. Res. 16(3):285–292.CrossrefGoogle Scholar
  • Mintz Y, Aswani A, Kaminsky P, Flowers E, Fukuoka Y (2017) Behavioral analytics for myopic agents. Working paper, University of California, Berkeley, Berkeley. Preprint, submitted February 17, https://arxiv.org/abs/1702.05496.Google Scholar
  • Monahan GE (1982) A survey of partially observable Markov decision processes: Theory, models, and algorithms. Management Sci. 28(1):1–16.LinkGoogle Scholar
  • Papadimitriou C, Tsitsiklis J (1999) The complexity of optimal queuing network control. Math. Oper. Res. 24(2):293–305.LinkGoogle Scholar
  • Portnoy F, Marchionini G (2010) Modeling the effect of habituation on banner blindness as a function of repetition and search type: Gap analysis for future work. CHI’10 Extended Abstracts Human Factors Comput. Systems (ACM, New York), 4297–4302.CrossrefGoogle Scholar
  • PwC Health Research Institute (2014) Health wearables: Early days. Report, PricewaterhouseCoopers, New York.Google Scholar
  • Qi A, Ahn HS, Sinha A (2017) Capacity investment with demand learning. Oper. Res. 65(1):145–164.LinkGoogle Scholar
  • Qu X, Keener R (2011) Theoretical Statistics: Topics for a Core Course (Springer, New York).Google Scholar
  • Radner R (1985) Repeated principal-agent games with discounting. Econometrica 53(5):1173–1198.CrossrefGoogle Scholar
  • Richter W (2017) Online ad revenues are surging, but 2 companies are getting most of the spoils. Bus. Insider (April 27), http://www.businessinsider.com/online-ads-revenues-going-to-google-and-facebook-the-most-2017-4.Google Scholar
  • Rockafellar RT, Wets RJB (2009) Variational Analysis, Grundlehren der mathematischen Wissenschaften, vol. 317 (Springer Science & Business Media, New York).Google Scholar
  • Rosenbaum L (2016) Should you really take 10,000 steps a day? Fitbit (blog), Accessed March 1 2019, https://blog.fitbit.com/should-you-really-take-10000-steps-a-day/.Google Scholar
  • Russo D, Roy BV (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.LinkGoogle Scholar
  • Russo D, Roy BV (2016) An information-theoretic analysis of Thompson sampling. J. Machine Learn. Res. 17(68):1–30.Google Scholar
  • Ryzhov I, Powell W (2011) Information collection on a graph. Oper. Res. 59(1):188–201.LinkGoogle Scholar
  • Ryzhov I, Powell W, Frazier P (2012) The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1):180–195.LinkGoogle Scholar
  • Sattelmair J, Pertman J, Ding E, Kohl H, Haskell W, Lee I (2011) Dose response between physical activity and risk of coronary heart disease. Circulation 124(7):789–795.Google Scholar
  • Savelsbergh M, Smilowitz K (2016) Stratified patient appointment scheduling for mobile community-based chronic disease management programs. IIE Trans. Healthcare Systems Engrg. 6(2):65–78.CrossrefGoogle Scholar
  • Schell G, Marrero W, Lavieri M, Sussman J, Hayward R (2016) Data-driven Markov decision process approximations for personalized hypertension treatment planning. MDM Policy Practice 1(1):2381468316674214.CrossrefGoogle Scholar
  • Shani G, Pineau J, Kaplow R (2013) A survey of point-based POMDP solvers. Autonomous Agents Multi-Agent Systems 27(1):1–51.CrossrefGoogle Scholar
  • Shapiro A (1993) Asymptotic behavior of optimal solutions in stochastic programming. Math. Oper. Res. 18(4):829–845.LinkGoogle Scholar
  • Smith T, Simmons R (2012) Point-based POMDP algorithms: Improved analysis and implementation. Working paper, Carnegie Mellon University, Pittsburgh. Preprint, submitted July 4, https://arxiv.org/abs/1207.1412.Google Scholar
  • Stackelberg H (1952) The Theory of Market Economy (Oxford University Press, Oxford, UK).Google Scholar
  • Thompson W (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.CrossrefGoogle Scholar
  • U.S. Department of Health and Human Services (2018) Physical activity guidelines for Americans, 2nd edition. Report, U.S. Department of Health and Human Services, Washington, DC.Google Scholar
  • Wainwright J (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Wang WY, Gupta D (2011) Adaptive appointment systems with patient preferences. Manufacturing Service Oper. Management 13(3):373–389.LinkGoogle Scholar
  • Wang J, Lee CG (2015) Multistate Bayesian control chart over a finite horizon. Oper. Res. 63(4):949–964.LinkGoogle Scholar
  • Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25(A):287–298.Google Scholar
  • Xie J, Frazier P (2013) Sequential Bayes-optimal policies for multiple comparisons with a known standard. Oper. Res. 61(5):1174–1189.LinkGoogle Scholar
  • Xie J, Frazier P, Chick S (2016) Bayesian optimization via simulation with pairwise sampling and correlated prior beliefs. Oper. Res. 64(2):542–559.LinkGoogle Scholar
  • Yi-jun L, Li-gang C, Wen-guo A (2010) Exploitation vs. exploration: Choosing keywords for search-based advertising services. Internat. Conf. Management Sci. Engrg. 17th Annual Conf. Proc. (IEEE, Piscataway, NJ), 3–8.Google Scholar
  • Zhang L, Yang T, Jin R, Zhou ZH (2017) Strongly adaptive regret implies optimally dynamic regret. Working paper, Nanjing University, Nanjing, China. Preprint, submitted January 26, https://arxiv.org/abs/1701.07570.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.