Nonstationary Bandits with Habituation and Recovery Dynamics
Published Online:9 Jul 2020https://doi.org/10.1287/opre.2019.1918
References
- (2013) Bayesian dynamic pricing in queueing systems with unknown delay cost characteristics. Manufacturing Service Oper. Management 15(2):292–304.Link, Google Scholar
- (2013) Thompson sampling for contextual bandits with linear payoffs. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn., vol. 28 (JMLR), III-1220–III-1228.Google Scholar
- (2014) A dynamic near-optimal algorithm for online linear programming. Oper. Res. 62(4):876–890.Link, Google Scholar
- (1987) Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays—Part II: Markovian rewards. IEEE Trans. Automatic Control 32(11):977–982.Crossref, Google Scholar
- (2019) Statistics with set-valued functions: Applications to inverse approximate optimization. Math. Programming 174(1–2):225–251.Google Scholar
- (2018) Inverse optimization with noisy data. Oper. Res. 66(3):870–892.Google Scholar
- (2019) Behavioral modeling in weight loss interventions. Eur. J. Oper. Res. 272(3):1058–1072.Google Scholar
- (2002a) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Google Scholar
- (2002b) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.Crossref, Google Scholar
- (2012) A POMDP approach to personalize mammography screening decisions. Oper. Res. 60(5):1019–1034.Link, Google Scholar
- (2015) Heterogeneity in women’s adherence and its role on optimal breast cancer screening policies. Management Sci. 62(5):1339–1362.Google Scholar
- (2019) The data-driven (s, S) policy: Why you can have confidence in censored demand data. Oper. Res. Forthcoming.Google Scholar
- (2019) The big data newsvendor: Practical insights from machine learning. Oper. Res. 67(1):90–108.Link, Google Scholar
- (2002) Rademacher and Gaussian complexities: Risk bounds and structural results. J. Machine Learn. Res. 3(November):463–482.Google Scholar
- (2015) Online decision-making with high-dimensional covariates. Working paper, University of Pennsylvania, Philadelphia.Google Scholar
- (2005) Dynamic Programming and Optimal Control, vol. 1 (Athena Scientific, Belmont, MA).Google Scholar
- (2020) From predictive to prescriptive analytics. Management Sci. 66(3):1025–1044.Google Scholar
- (2007) A learning approach for interactive marketing to a customer segment. Oper. Res. 55(6):1120–1135.Link, Google Scholar
- (2000) Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Oper. Res. 48(1):80–90.Link, Google Scholar
- (2014) Optimal exploration-exploitation in a multi-armed-bandit problem with non-stationary rewards. Working paper, Columbia University, New York. Preprint, submitted May 13, https://arxiv.org/abs/1405.3316.Google Scholar
- (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.Link, Google Scholar
- (2006) Mathematical Statistics: Basic Ideas and Selected Topics, vol. 1, 2nd ed. (Pearson Prentice Hall, Upper Saddle River, NJ).Google Scholar
- (2016) Multi-armed bandit problem with known trend. Neurocomputing 205(September):16–21.Crossref, Google Scholar
- (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.Link, Google Scholar
- (2015) The post-disaster debris clearance problem under incomplete information. Oper. Res. 63(1):65–85.Link, Google Scholar
- (2005) An adaptive sampling algorithm for solving markov decision processes. Oper. Res. 53(1):126–139.Link, Google Scholar
- (2018) Optimal M-switch surveillance policies for liver cancer in a hepatitis C–infected population. Oper. Res. 66(3):673–696.Link, Google Scholar
- (2013) Combinatorial multi-armed bandit: General framework, results and applications. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn. (PMLR), 151–159.Google Scholar
- (2015) Strongly adaptive online learning. Bach F, Blei D, eds. Proc. 32nd Internat. Conf. Machine Learn. (PMLR), 1405–1411.Google Scholar
- (2014) Statistical learning of service-dependent demand in a multiperiod newsvendor setting. Oper. Res. 62(5):1064–1076.Link, Google Scholar
- (2013) Improving health outcomes through better capacity allocation in a community-based chronic care model. Oper. Res. 61(6):1277–1294.Link, Google Scholar
- Diabetes Prevention Program Research Group (2002) Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. New England J. Medicine 346(6):393–403.Crossref, Google Scholar
- Diabetes Prevention Program Research Group (2009) 10-year follow-up of diabetes incidence and weight loss in the diabetes prevention program outcomes study. Lancet 374(9720):1677–1686.Crossref, Google Scholar
- (1984) The optimal search for a moving target when the search path is constrained. Oper. Res. 32(5):1107–1115.Link, Google Scholar
- (2010) Parametric bandits: The generalized linear case. Lafferty JD, Williams CKI, Shawe-Taylor J, Zemel RS, Culotta A, eds. Advances in Neural Information Processing Systems, vol. 23 (Curran Associates, Red Hook, NY), 586–594.Google Scholar
- (2016) Bayesian optimization for materials design. Lookman T, Alexander FJ, Rajan K, eds. Information Science for Materials Discovery and Design (Springer, Cham, Switzerland), 45–75.Crossref, Google Scholar
- (2010) State of the epidemiological evidence on physical activity and cancer prevention. Eur. J. Cancer 46(14):2593–2604.Crossref, Google Scholar
- (1916) Über die obere schranke des absoluten betrages der wurzeln einer algebraischen gleichung. Tohoku Math. J. First Ser. 10(1916):167–171.Google Scholar
- (2018) A weight loss intervention using a commercial mobile application in Latino Americans-Adelgaza Trial. Translational Behavioral Medicine 8(5):714–723.Crossref, Google Scholar
- (2015) A novel diabetes prevention intervention using a mobile app: A randomized controlled trial with overweight adults at risk. Amer. J. Preventive Medicine 49(2):223–237.Crossref, Google Scholar
- (2019) Short- and Long-term effects of a mobile phone app in conjunction with brief in-person counseling on physical activity among physically inactive women: The mPED Randomized Clinical Trial. JAMA Network Open 2(5):e194281.Crossref, Google Scholar
- (2011) The KL-UCB algorithm for bounded stochastic bandits and beyond. Proc. 25th Conf. Learn. Theory (JMLR), 359–376.Google Scholar
- (2008) On upper-confidence bound policies for non-stationary bandit problems. Working paper, Télécom Paris, Paris. Preprint, submitted May 22, https://arxiv.org/abs/0805.3415.Google Scholar
- (2009) An empirical analysis of search engine advertising: Sponsored search in electronic markets. Management Sci. 55(10):1605–1622.Link, Google Scholar
- (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. Ser. B (Methodological) 41(2):148–177.Google Scholar
- (2011) Online display advertising: Targeting and obtrusiveness. Marketing Sci. 30(3):389–404.Link, Google Scholar
- (2014) Standardization and the effectiveness of online advertising. Management Sci. 61(11):2707–2719.Link, Google Scholar
- (2010) Approximation algorithms for restless bandit problems. J. ACM 58(1):Article 3.Crossref, Google Scholar
- (2008) Revenue management for a primary-care clinic in the presence of patient choice. Oper. Res. 56(3):576–592.Link, Google Scholar
- Gurobi Optimization (2016) Gurobi optimizer reference manual, Version 7.0. Accessed June 1, 2017, http://www.gurobi.com.Google Scholar
- (2006) Multi-armed bandit, dynamic environments and meta-bandits. Working paper, Université Paris-Sud, Orsay, France.Google Scholar
- (1997) Recursive algorithms for computing the Cramer-Rao bound. IEEE Trans. Signal Processing 45(3):803–807.Crossref, Google Scholar
- (2015) Always valid inference: Bringing sequential analysis to A/B testing. Working paper, Stanford University, Stanford, CA. Preprint, submitted December 15, https://arxiv.org/abs/1512.04922.Google Scholar
- (2015) Robust multiarmed bandit problems. Management Sci. 62(1):264–285.Google Scholar
- (2008) Multi-armed bandits in metric spaces. Proc. 40th Annual ACM Sympos. Theory Comput. (ACM, New York), 681–690.Google Scholar
- (2014) Concentration in unbounded metric spaces and algorithmic stability. Xing EP, Jebara T, eds. Proc. 31st Internat. Conf. Machine Learn. (ICML), 28–36.Google Scholar
- (2014) Efficient minimax strategies for square loss games. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates, Red Hook, NY), 3230–3238.Google Scholar
- (2015) Minimax time series prediction. Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 28 (Curran Associates, Red Hook, NY), 2557–2565.Google Scholar
- (2011) An analytical framework for designing community-based care for chronic diseases. Production Oper. Management 20(3):474–488.Crossref, Google Scholar
- (2002) The Theory of Incentives: The Principal-Agent Model (Princeton University Press, Princeton, NJ).Crossref, Google Scholar
- (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
- (1986) An LP planning model for a mental health community support system. Management Sci. 32(2):139–155.Link, Google Scholar
- (2017) Rotting bandits. Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Proc. Neural Inform. Processing Systems 2017 (Neural Information Processing Systems, San Diego).Google Scholar
- (2017) Future directions in physical activity intervention research: Expanding our focus to sedentary behaviors, technology, and dissemination. J. Behavioral Medicine 40(1):112–126.Crossref, Google Scholar
- (2010a) Distributed learning in multi-armed bandit with multiple players. IEEE Trans. Signal Processing 58(11):5667–5681.Crossref, Google Scholar
- (2010b) Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Trans. Inform. Theory 56(11):5547–5567.Crossref, Google Scholar
- (2018) Using electronic health records to monitor and improve adherence to medication. IISE Trans. Healthcare Systems Engrg. 7(4):194–214.Google Scholar
- (1984) Generalized linear models. Eur. J. Oper. Res. 16(3):285–292.Crossref, Google Scholar
- (2017) Behavioral analytics for myopic agents. Working paper, University of California, Berkeley, Berkeley. Preprint, submitted February 17, https://arxiv.org/abs/1702.05496.Google Scholar
- (1982) A survey of partially observable Markov decision processes: Theory, models, and algorithms. Management Sci. 28(1):1–16.Link, Google Scholar
- (1999) The complexity of optimal queuing network control. Math. Oper. Res. 24(2):293–305.Link, Google Scholar
- (2010) Modeling the effect of habituation on banner blindness as a function of repetition and search type: Gap analysis for future work. CHI’10 Extended Abstracts Human Factors Comput. Systems (ACM, New York), 4297–4302.Crossref, Google Scholar
- PwC Health Research Institute (2014) Health wearables: Early days. Report, PricewaterhouseCoopers, New York.Google Scholar
- (2017) Capacity investment with demand learning. Oper. Res. 65(1):145–164.Link, Google Scholar
- (2011) Theoretical Statistics: Topics for a Core Course (Springer, New York).Google Scholar
- (1985) Repeated principal-agent games with discounting. Econometrica 53(5):1173–1198.Crossref, Google Scholar
- (2017) Online ad revenues are surging, but 2 companies are getting most of the spoils. Bus. Insider (April 27), http://www.businessinsider.com/online-ads-revenues-going-to-google-and-facebook-the-most-2017-4.Google Scholar
- (2009) Variational Analysis, Grundlehren der mathematischen Wissenschaften, vol. 317 (Springer Science & Business Media, New York).Google Scholar
- (2016) Should you really take 10,000 steps a day? Fitbit (blog), Accessed March 1 2019, https://blog.fitbit.com/should-you-really-take-10000-steps-a-day/.Google Scholar
- (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.Link, Google Scholar
- (2016) An information-theoretic analysis of Thompson sampling. J. Machine Learn. Res. 17(68):1–30.Google Scholar
- (2011) Information collection on a graph. Oper. Res. 59(1):188–201.Link, Google Scholar
- (2012) The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1):180–195.Link, Google Scholar
- (2011) Dose response between physical activity and risk of coronary heart disease. Circulation 124(7):789–795.Google Scholar
- (2016) Stratified patient appointment scheduling for mobile community-based chronic disease management programs. IIE Trans. Healthcare Systems Engrg. 6(2):65–78.Crossref, Google Scholar
- (2016) Data-driven Markov decision process approximations for personalized hypertension treatment planning. MDM Policy Practice 1(1):2381468316674214.Crossref, Google Scholar
- (2013) A survey of point-based POMDP solvers. Autonomous Agents Multi-Agent Systems 27(1):1–51.Crossref, Google Scholar
- (1993) Asymptotic behavior of optimal solutions in stochastic programming. Math. Oper. Res. 18(4):829–845.Link, Google Scholar
- (2012) Point-based POMDP algorithms: Improved analysis and implementation. Working paper, Carnegie Mellon University, Pittsburgh. Preprint, submitted July 4, https://arxiv.org/abs/1207.1412.Google Scholar
- (1952) The Theory of Market Economy (Oxford University Press, Oxford, UK).Google Scholar
- (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.Crossref, Google Scholar
- U.S. Department of Health and Human Services (2018) Physical activity guidelines for Americans, 2nd edition. Report, U.S. Department of Health and Human Services, Washington, DC.Google Scholar
- (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (2011) Adaptive appointment systems with patient preferences. Manufacturing Service Oper. Management 13(3):373–389.Link, Google Scholar
- (2015) Multistate Bayesian control chart over a finite horizon. Oper. Res. 63(4):949–964.Link, Google Scholar
- (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25(A):287–298.Google Scholar
- (2013) Sequential Bayes-optimal policies for multiple comparisons with a known standard. Oper. Res. 61(5):1174–1189.Link, Google Scholar
- (2016) Bayesian optimization via simulation with pairwise sampling and correlated prior beliefs. Oper. Res. 64(2):542–559.Link, Google Scholar
- (2010) Exploitation vs. exploration: Choosing keywords for search-based advertising services. Internat. Conf. Management Sci. Engrg. 17th Annual Conf. Proc. (IEEE, Piscataway, NJ), 3–8.Google Scholar
- (2017) Strongly adaptive regret implies optimally dynamic regret. Working paper, Nanjing University, Nanjing, China. Preprint, submitted January 26, https://arxiv.org/abs/1701.07570.Google Scholar

