Nonstationary Bandits with Habituation and Recovery Dynamics

Yonatan Mintz
Corresponding Author
Yonatan Mintz
[email protected]
https://orcid.org/0000-0002-0670-1794
School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332;
Search for more papers by this author
,
Anil Aswani
Corresponding Author
Anil Aswani
[email protected]
Department of Industrial Engineering and Operations Research, University of California, Berkeley, Berkeley, California 94720;
Search for more papers by this author
,
Philip Kaminsky
Corresponding Author
Philip Kaminsky
[email protected]
https://orcid.org/0000-0002-3079-0299
Department of Industrial Engineering and Operations Research, University of California, Berkeley, Berkeley, California 94720;
Search for more papers by this author
,
Elena Flowers
Corresponding Author
Elena Flowers
[email protected]
Department of Physiological Nursing, School of Nursing, University of California, San Francisco, San Francisco, California 94143;
Search for more papers by this author
,
Yoshimi Fukuoka
Corresponding Author
Yoshimi Fukuoka
[email protected]
Department of Physiological Nursing & Institute for Health & Aging, School of Nursing, University of California, San Francisco, San Francisco, California 94143
Search for more papers by this author

Yonatan Mintz

Corresponding Author

Yonatan Mintz

[email protected]

https://orcid.org/0000-0002-0670-1794

School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332;

Search for more papers by this author

Anil Aswani

Corresponding Author

Anil Aswani

[email protected]

Department of Industrial Engineering and Operations Research, University of California, Berkeley, Berkeley, California 94720;

Search for more papers by this author

Philip Kaminsky

Corresponding Author

Philip Kaminsky

[email protected]

https://orcid.org/0000-0002-3079-0299

Department of Industrial Engineering and Operations Research, University of California, Berkeley, Berkeley, California 94720;

Search for more papers by this author

Elena Flowers

Corresponding Author

Elena Flowers

[email protected]

Department of Physiological Nursing, School of Nursing, University of California, San Francisco, San Francisco, California 94143;

Search for more papers by this author

Yoshimi Fukuoka

Corresponding Author

Yoshimi Fukuoka

[email protected]

Department of Physiological Nursing & Institute for Health & Aging, School of Nursing, University of California, San Francisco, San Francisco, California 94143

Search for more papers by this author

Published Online:9 Jul 2020https://doi.org/10.1287/opre.2019.1918

References

Afèche P, Ata B (2013) Bayesian dynamic pricing in queueing systems with unknown delay cost characteristics. Manufacturing Service Oper. Management 15(2):292–304.Link, Google Scholar
Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn., vol. 28 (JMLR), III-1220–III-1228.Google Scholar
Agrawal S, Zizhuo Y, Ye Y (2014) A dynamic near-optimal algorithm for online linear programming. Oper. Res. 62(4):876–890.Link, Google Scholar
Anantharam V, Varaiya P, Walrand J (1987) Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays—Part II: Markovian rewards. IEEE Trans. Automatic Control 32(11):977–982.Crossref, Google Scholar
Aswani A (2019) Statistics with set-valued functions: Applications to inverse approximate optimization. Math. Programming 174(1–2):225–251.Google Scholar
Aswani A, Shen Z-J, Siddiq A (2018) Inverse optimization with noisy data. Oper. Res. 66(3):870–892.Google Scholar
Aswani A, Kaminsky P, Mintz Y, Flowers E, Fukuoka Y (2019) Behavioral modeling in weight loss interventions. Eur. J. Oper. Res. 272(3):1058–1072.Google Scholar
Auer P, Cesa-Bianchi N, Fischer P (2002a) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Google Scholar
Auer P, Cesa-Bianchi N, Freund Y, Schapire R (2002b) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.Crossref, Google Scholar
Ayer T, Alagoz O, Stout NK (2012) A POMDP approach to personalize mammography screening decisions. Oper. Res. 60(5):1019–1034.Link, Google Scholar
Ayer T, Alagoz O, Stout N, Burnside E (2015) Heterogeneity in women’s adherence and its role on optimal breast cancer screening policies. Management Sci. 62(5):1339–1362.Google Scholar
Ban G-Y (2019) The data-driven (s, S) policy: Why you can have confidence in censored demand data. Oper. Res. Forthcoming.Google Scholar
Ban G, Rudin C (2019) The big data newsvendor: Practical insights from machine learning. Oper. Res. 67(1):90–108.Link, Google Scholar
Bartlett P, Mendelson S (2002) Rademacher and Gaussian complexities: Risk bounds and structural results. J. Machine Learn. Res. 3(November):463–482.Google Scholar
Bastani H, Bayati M (2015) Online decision-making with high-dimensional covariates. Working paper, University of Pennsylvania, Philadelphia.Google Scholar
Bertsekas DP, Bertsekas DP, Bertsekas DP, Bertsekas DP (2005) Dynamic Programming and Optimal Control, vol. 1 (Athena Scientific, Belmont, MA).Google Scholar
Bertsimas D, Kallus N (2020) From predictive to prescriptive analytics. Management Sci. 66(3):1025–1044.Google Scholar
Bertsimas D, Mersereau A (2007) A learning approach for interactive marketing to a customer segment. Oper. Res. 55(6):1120–1135.Link, Google Scholar
Bertsimas D, Niño-Mora J (2000) Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Oper. Res. 48(1):80–90.Link, Google Scholar
Besbes O, Gur Y, Zeevi A (2014) Optimal exploration-exploitation in a multi-armed-bandit problem with non-stationary rewards. Working paper, Columbia University, New York. Preprint, submitted May 13, https://arxiv.org/abs/1405.3316.Google Scholar
Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.Link, Google Scholar
Bickel P, Doksum K (2006) Mathematical Statistics: Basic Ideas and Selected Topics, vol. 1, 2nd ed. (Pearson Prentice Hall, Upper Saddle River, NJ).Google Scholar
Bouneffouf D, Féraud R (2016) Multi-armed bandit problem with known trend. Neurocomputing 205(September):16–21.Crossref, Google Scholar
Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.Link, Google Scholar
Çelik M, Ergun Ö, Keskinocak P (2015) The post-disaster debris clearance problem under incomplete information. Oper. Res. 63(1):65–85.Link, Google Scholar
Chang H, Fu M, Hu J, Marcus S (2005) An adaptive sampling algorithm for solving markov decision processes. Oper. Res. 53(1):126–139.Link, Google Scholar
Chen Q, Ayer T, Chhatwal J (2018) Optimal M-switch surveillance policies for liver cancer in a hepatitis C–infected population. Oper. Res. 66(3):673–696.Link, Google Scholar
Chen W, Wang Y, Yuan Y (2013) Combinatorial multi-armed bandit: General framework, results and applications. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn. (PMLR), 151–159.Google Scholar
Daniely A, Gonen A, Shalev-Shwartz S (2015) Strongly adaptive online learning. Bach F, Blei D, eds. Proc. 32nd Internat. Conf. Machine Learn. (PMLR), 1405–1411.Google Scholar
Deng T, Shen ZJM, Shanthikumar JG (2014) Statistical learning of service-dependent demand in a multiperiod newsvendor setting. Oper. Res. 62(5):1064–1076.Link, Google Scholar
Deo S, Jiang T, Iravani S, Smilowitz K, Samuelson S (2013) Improving health outcomes through better capacity allocation in a community-based chronic care model. Oper. Res. 61(6):1277–1294.Link, Google Scholar
Diabetes Prevention Program Research Group (2002) Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. New England J. Medicine 346(6):393–403.Crossref, Google Scholar
Diabetes Prevention Program Research Group (2009) 10-year follow-up of diabetes incidence and weight loss in the diabetes prevention program outcomes study. Lancet 374(9720):1677–1686.Crossref, Google Scholar
Eagle JN (1984) The optimal search for a moving target when the search path is constrained. Oper. Res. 32(5):1107–1115.Link, Google Scholar
Filippi S, Cappe O, Garivier A, Szepesvári C (2010) Parametric bandits: The generalized linear case. Lafferty JD, Williams CKI, Shawe-Taylor J, Zemel RS, Culotta A, eds. Advances in Neural Information Processing Systems, vol. 23 (Curran Associates, Red Hook, NY), 586–594.Google Scholar
Frazier P, Wang J (2016) Bayesian optimization for materials design. Lookman T, Alexander FJ, Rajan K, eds. Information Science for Materials Discovery and Design (Springer, Cham, Switzerland), 45–75.Crossref, Google Scholar
Friedenreich C, Neilson H, Lynch B (2010) State of the epidemiological evidence on physical activity and cancer prevention. Eur. J. Cancer 46(14):2593–2604.Crossref, Google Scholar
Fujiwara M (1916) Über die obere schranke des absoluten betrages der wurzeln einer algebraischen gleichung. Tohoku Math. J. First Ser. 10(1916):167–171.Google Scholar
Fukuoka Y, Vittinghoff E, Hooper J (2018) A weight loss intervention using a commercial mobile application in Latino Americans-Adelgaza Trial. Translational Behavioral Medicine 8(5):714–723.Crossref, Google Scholar
Fukuoka Y, Gay C, Joiner K, Vittinghoff E (2015) A novel diabetes prevention intervention using a mobile app: A randomized controlled trial with overweight adults at risk. Amer. J. Preventive Medicine 49(2):223–237.Crossref, Google Scholar
Fukuoka Y, Haskell W, Lin F, Vittinghoff E (2019) Short- and Long-term effects of a mobile phone app in conjunction with brief in-person counseling on physical activity among physically inactive women: The mPED Randomized Clinical Trial. JAMA Network Open 2(5):e194281.Crossref, Google Scholar
Garivier A, Cappé O (2011) The KL-UCB algorithm for bounded stochastic bandits and beyond. Proc. 25th Conf. Learn. Theory (JMLR), 359–376.Google Scholar
Garivier A, Moulines E (2008) On upper-confidence bound policies for non-stationary bandit problems. Working paper, Télécom Paris, Paris. Preprint, submitted May 22, https://arxiv.org/abs/0805.3415.Google Scholar
Ghose A, Yang S (2009) An empirical analysis of search engine advertising: Sponsored search in electronic markets. Management Sci. 55(10):1605–1622.Link, Google Scholar
Gittins J (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. Ser. B (Methodological) 41(2):148–177.Google Scholar
Goldfarb A, Tucker C (2011) Online display advertising: Targeting and obtrusiveness. Marketing Sci. 30(3):389–404.Link, Google Scholar
Goldfarb A, Tucker C (2014) Standardization and the effectiveness of online advertising. Management Sci. 61(11):2707–2719.Link, Google Scholar
Guha S, Munagala K, Shi P (2010) Approximation algorithms for restless bandit problems. J. ACM 58(1):Article 3.Crossref, Google Scholar
Gupta D, Wang L (2008) Revenue management for a primary-care clinic in the presence of patient choice. Oper. Res. 56(3):576–592.Link, Google Scholar
Gurobi Optimization (2016) Gurobi optimizer reference manual, Version 7.0. Accessed June 1, 2017, http://www.gurobi.com.Google Scholar
Hartland C, Gelly S, Baskiotis N, Teytaud O, Sebag M (2006) Multi-armed bandit, dynamic environments and meta-bandits. Working paper, Université Paris-Sud, Orsay, France.Google Scholar
Hero A, Usman M, Sauve A, Fessler J (1997) Recursive algorithms for computing the Cramer-Rao bound. IEEE Trans. Signal Processing 45(3):803–807.Crossref, Google Scholar
Johari R, Pekelis L, Walsh D (2015) Always valid inference: Bringing sequential analysis to A/B testing. Working paper, Stanford University, Stanford, CA. Preprint, submitted December 15, https://arxiv.org/abs/1512.04922.Google Scholar
Kim M, Lim A (2015) Robust multiarmed bandit problems. Management Sci. 62(1):264–285.Google Scholar
Kleinberg R, Slivkins A, Upfal E (2008) Multi-armed bandits in metric spaces. Proc. 40th Annual ACM Sympos. Theory Comput. (ACM, New York), 681–690.Google Scholar
Kontorovich A (2014) Concentration in unbounded metric spaces and algorithmic stability. Xing EP, Jebara T, eds. Proc. 31st Internat. Conf. Machine Learn. (ICML), 28–36.Google Scholar
Koolen W, Malek A, Bartlett P (2014) Efficient minimax strategies for square loss games. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates, Red Hook, NY), 3230–3238.Google Scholar
Koolen W, Malek A, Bartlett P, Abbasi Y (2015) Minimax time series prediction. Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 28 (Curran Associates, Red Hook, NY), 2557–2565.Google Scholar
Kucukyazici B, Verter V, Mayo N (2011) An analytical framework for designing community-based care for chronic diseases. Production Oper. Management 20(3):474–488.Crossref, Google Scholar
Laffont J-J, Martimort D (2002) The Theory of Incentives: The Principal-Agent Model (Princeton University Press, Princeton, NJ).Crossref, Google Scholar
Lai T, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
Leff H, Dada M, Graves S (1986) An LP planning model for a mental health community support system. Management Sci. 32(2):139–155.Link, Google Scholar
Levine N, Crammer K, Mannor S (2017) Rotting bandits. Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Proc. Neural Inform. Processing Systems 2017 (Neural Information Processing Systems, San Diego).Google Scholar
Lewis B, Napolitano M, Buman M, Williams D, Nigg C (2017) Future directions in physical activity intervention research: Expanding our focus to sedentary behaviors, technology, and dissemination. J. Behavioral Medicine 40(1):112–126.Crossref, Google Scholar
Liu K, Zhao Q (2010a) Distributed learning in multi-armed bandit with multiple players. IEEE Trans. Signal Processing 58(11):5667–5681.Crossref, Google Scholar
Liu K, Zhao Q (2010b) Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Trans. Inform. Theory 56(11):5547–5567.Crossref, Google Scholar
Mason J, Denton B, Smith S, Shah N (2018) Using electronic health records to monitor and improve adherence to medication. IISE Trans. Healthcare Systems Engrg. 7(4):194–214.Google Scholar
McCullagh P (1984) Generalized linear models. Eur. J. Oper. Res. 16(3):285–292.Crossref, Google Scholar
Mintz Y, Aswani A, Kaminsky P, Flowers E, Fukuoka Y (2017) Behavioral analytics for myopic agents. Working paper, University of California, Berkeley, Berkeley. Preprint, submitted February 17, https://arxiv.org/abs/1702.05496.Google Scholar
Monahan GE (1982) A survey of partially observable Markov decision processes: Theory, models, and algorithms. Management Sci. 28(1):1–16.Link, Google Scholar
Papadimitriou C, Tsitsiklis J (1999) The complexity of optimal queuing network control. Math. Oper. Res. 24(2):293–305.Link, Google Scholar
Portnoy F, Marchionini G (2010) Modeling the effect of habituation on banner blindness as a function of repetition and search type: Gap analysis for future work. CHI’10 Extended Abstracts Human Factors Comput. Systems (ACM, New York), 4297–4302.Crossref, Google Scholar
PwC Health Research Institute (2014) Health wearables: Early days. Report, PricewaterhouseCoopers, New York.Google Scholar
Qi A, Ahn HS, Sinha A (2017) Capacity investment with demand learning. Oper. Res. 65(1):145–164.Link, Google Scholar
Qu X, Keener R (2011) Theoretical Statistics: Topics for a Core Course (Springer, New York).Google Scholar
Radner R (1985) Repeated principal-agent games with discounting. Econometrica 53(5):1173–1198.Crossref, Google Scholar
Richter W (2017) Online ad revenues are surging, but 2 companies are getting most of the spoils. Bus. Insider (April 27), http://www.businessinsider.com/online-ads-revenues-going-to-google-and-facebook-the-most-2017-4.Google Scholar
Rockafellar RT, Wets RJB (2009) Variational Analysis, Grundlehren der mathematischen Wissenschaften, vol. 317 (Springer Science & Business Media, New York).Google Scholar
Rosenbaum L (2016) Should you really take 10,000 steps a day? Fitbit (blog), Accessed March 1 2019, https://blog.fitbit.com/should-you-really-take-10000-steps-a-day/.Google Scholar
Russo D, Roy BV (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.Link, Google Scholar
Russo D, Roy BV (2016) An information-theoretic analysis of Thompson sampling. J. Machine Learn. Res. 17(68):1–30.Google Scholar
Ryzhov I, Powell W (2011) Information collection on a graph. Oper. Res. 59(1):188–201.Link, Google Scholar
Ryzhov I, Powell W, Frazier P (2012) The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1):180–195.Link, Google Scholar
Sattelmair J, Pertman J, Ding E, Kohl H, Haskell W, Lee I (2011) Dose response between physical activity and risk of coronary heart disease. Circulation 124(7):789–795.Google Scholar
Savelsbergh M, Smilowitz K (2016) Stratified patient appointment scheduling for mobile community-based chronic disease management programs. IIE Trans. Healthcare Systems Engrg. 6(2):65–78.Crossref, Google Scholar
Schell G, Marrero W, Lavieri M, Sussman J, Hayward R (2016) Data-driven Markov decision process approximations for personalized hypertension treatment planning. MDM Policy Practice 1(1):2381468316674214.Crossref, Google Scholar
Shani G, Pineau J, Kaplow R (2013) A survey of point-based POMDP solvers. Autonomous Agents Multi-Agent Systems 27(1):1–51.Crossref, Google Scholar
Shapiro A (1993) Asymptotic behavior of optimal solutions in stochastic programming. Math. Oper. Res. 18(4):829–845.Link, Google Scholar
Smith T, Simmons R (2012) Point-based POMDP algorithms: Improved analysis and implementation. Working paper, Carnegie Mellon University, Pittsburgh. Preprint, submitted July 4, https://arxiv.org/abs/1207.1412.Google Scholar
Stackelberg H (1952) The Theory of Market Economy (Oxford University Press, Oxford, UK).Google Scholar
Thompson W (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.Crossref, Google Scholar
U.S. Department of Health and Human Services (2018) Physical activity guidelines for Americans, 2nd edition. Report, U.S. Department of Health and Human Services, Washington, DC.Google Scholar
Wainwright J (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Wang WY, Gupta D (2011) Adaptive appointment systems with patient preferences. Manufacturing Service Oper. Management 13(3):373–389.Link, Google Scholar
Wang J, Lee CG (2015) Multistate Bayesian control chart over a finite horizon. Oper. Res. 63(4):949–964.Link, Google Scholar
Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25(A):287–298.Google Scholar
Xie J, Frazier P (2013) Sequential Bayes-optimal policies for multiple comparisons with a known standard. Oper. Res. 61(5):1174–1189.Link, Google Scholar
Xie J, Frazier P, Chick S (2016) Bayesian optimization via simulation with pairwise sampling and correlated prior beliefs. Oper. Res. 64(2):542–559.Link, Google Scholar
Yi-jun L, Li-gang C, Wen-guo A (2010) Exploitation vs. exploration: Choosing keywords for search-based advertising services. Internat. Conf. Management Sci. Engrg. 17th Annual Conf. Proc. (IEEE, Piscataway, NJ), 3–8.Google Scholar
Zhang L, Yang T, Jin R, Zhou ZH (2017) Strongly adaptive regret implies optimally dynamic regret. Working paper, Nanjing University, Nanjing, China. Preprint, submitted January 26, https://arxiv.org/abs/1701.07570.Google Scholar

Volume 68, Issue 5

September-October 2020

Pages iii-vi, 1285-1624, C2-C3

Article Information

Metrics

Information

Received:December 20, 2017
Accepted:June 28, 2019
Published Online:July 09, 2020

Cite as

Yonatan Mintz, Anil Aswani, Philip Kaminsky, Elena Flowers, Yoshimi Fukuoka (2020) Nonstationary Bandits with Habituation and Recovery Dynamics. Operations Research 68(5):1493-1516.

https://doi.org/10.1287/opre.2019.1918

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Nonstationary Bandits with Habituation and Recovery Dynamics

References

Volume 68, Issue 5

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News