Ambiguous Dynamic Treatment Regimes: A Reinforcement Learning Approach

Published Online: https://doi.org/10.1287/mnsc.2022.00883

References

  • ADA (2012) Standards of medical care in diabetes. Diabetes Care 35:S11–S63.
  • Ahn D, Choi S, Gale D, Kariv S (2014) Estimating ambiguity aversion in a portfolio choice experiment. Quant. Econom. 5(2):195–223.
  • Angrist JD, Imbens GW, Rubin DB (1996) Identification of causal effects using instrumental variables. J. Amer. Statist. Assoc. 91:434–471.
  • Arrow KJ (1951) Alternative approaches to the theory of choice in risk-taking situations. Econometrica 19(4):404–437.
  • Arrow KJ, Hurwicz L (1977) An optimality criterion for decision making under ignorance. Arrow KJ, Hurwicz L, eds. Studies in Resource Allocation Processes (Cambridge University Press, Cambridge, UK), 461–472.
  • Athey S, Wager S (2021) Policy learning with observational data. Econometrica 89(1):133–161.
  • Bang H, Robins JM (2005) Doubly robust estimation in missing data and causal inference models. Biometrics 61(4):962–972.
  • Bennett A, Kallus N (2021) Proximal reinforcement learning: Efficient off-policy evaluation in partially observed Markov decision processes. Preprint, submitted October 28, https://doi.org/10.48550/arXiv.2110.15332.
  • Bennett A, Kallus N, Li L, Mousavi A (2021) Off-policy evaluation in infinite-horizon reinforcement learning with latent confounders. Proc. 24th Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 1999–2007.
  • Bhidé AV (2000) The Origin and Evolution of New Business (Oxford University Press, Oxford, UK).
  • Boloori A, Saghafian S, Chakkera HA, Cook CB (2015) Characterization of remitting and relapsing hyperglycemia in post-renal-transplant recipients. PLoS One 10(11):1–16.
  • Boloori A, Saghafian S, Chakkera HA, Cook CB (2020) Data-driven management of post-transplant medications: An ambiguous partially observable Markov decision process approach. Manufacturing Service Oper. Management 22(5):1066–1087.
  • Box G (1979) Robustness in the strategy of scientific model building. Launer R, Wilkinson G, eds. Robustness in Statistics (Academic Press, New York), 201–236.
  • Bren A, Saghafian S (2019) Data-driven percentile optimization for multi-class queueing systems with model ambiguity: Theory and application. INFORMS J. Optim. 1(4):267–287.
  • Butler EL, Laber EB, Davis SM, Kosorok MR (2018) Incorporating patient preferences into estimation of optimal individualized treatment rules. Biometrics 74(1):18–26.
  • Chakkera HA, Weil EJ, Castro J, Heilman RL, Reddy KS, Mazur MJ, Hamawi K, et al. (2009) Hyperglycemia during the immediate period after kidney transplantation. Clinical J. Amer. Soc. Nephrology 4:853–859.
  • Chakraborty B, Murphy SA (2014) Dynamic treatment regimes. Annual Rev. Statist. Appl. 1(1):447–464.
  • Dedecker J, Louhichi S (2002) Maximal inequalities and empirical central limit theorems. Mikosch T, Sørensen M, eds. Empirical Process Techniques for Dependent Data (Birkhäuser, Boston), 137–159.
  • Frank RG, Zeckhauser RJ (2007) Custom-made vs. ready-to-wear treatments: Behavioral propensities in physicians’ choices. J. Health Econom. 26(6):1101–1127.
  • Ghirardato P, Maccheroni F, Marinacci M (2004) Differentiating ambiguity and ambiguity attitude. J. Econom. Theory 118:133–173.
  • Ghisdal L, Van Laecke S, Abramowicz MJ, Vanholder R, Abramowicz D (2012) New-onset diabetes after renal transplantation risk assessment and management. Diabetes Care 35(1):181–188.
  • Hansen LP (1982) Large sample properties of generalized method of moments estimators. Econometrica 50(4):1029–1054.
  • Heath C, Tversky A (1991) Preference and belief: Ambiguity and competence in choice under uncertainty. J. Risk Uncertainty 4(1):5–28.
  • Hu Y, Wager S (2021) Off-policy evaluation in partially observed Markov decision processes. Preprint, submitted October 24, https://arxiv.org/abs/2110.12343.
  • Hurwicz L (1951a) Optimality criteria for decision making under ignorance. Cowles Commission Discussion Paper: Statistics No. 370, Cowles Commission.
  • Hurwicz L (1951b) Some specification problems and applications to econometric models. Econometrica 19:343–344.
  • Jiang N, Li L (2016) Doubly robust off-policy value evaluation for reinforcement learning. Proc. 33rd Internat. Conf. Machine Learn. (JMLR: W&CP), 652–661.
  • Kallus N, Uehara M (2020) Double reinforcement learning for efficient off-policy evaluation in Markov decision processes. J. Machine Learn. Res. 21:1–63.
  • Kallus N, Zhou A (2020) Confounding-robust policy evaluation in infinite-horizon reinforcement learning. Preprint, submitted February 11, https://arxiv.org/abs/2002.04518.
  • Kallus N, Zhou A (2021) Minimax-optimal policy learning under unobserved confounding. Management Sci. 67(5):2870–2890.
  • Kosorok MR (2008) Introduction to Empirical Processes and Semiparametric Inference (Springer, New York).
  • Kosorok MR, Laber EB (2019) Precision medicine. Annual Rev. Statist. Appl. 6:263–286.
  • Laber EB, Lizotte DJ, Ferguson B (2014) Set-valued dynamic treatment regimes for competing outcomes. Biometrics 70(1):53–61.
  • Leqi L, Kennedy EH (2021) Median optimal treatment regimes. Preprint, submitted March 2, https://arxiv.org/abs/2103.01802.
  • Linn KA, Laber EB, Stefanski LA (2015) Estimation of dynamic treatment regimes for complex outcomes: Balancing benefits and risks. Kosorok MR, Moodie EEM, eds. Adaptive Treatment Strategies in Practice: Planning Trials and Analyzing Data for Personalized Medicine (SIAM, Philadelphia), 249–262.
  • Linn KA, Laber EB, Stefanski LA (2017) Interactive Q-learning for quantiles. J. Amer. Statist. Assoc. 112(518):638–649.
  • Lizotte DJ, Laber EB (2016) Multi-objective Markov decision processes for data-driven decision support. J. Machine Learn. Res. 17(1):7378–7405.
  • Lizotte DJ, Bowling M, Murphy SA (2012) Linear fitted-Q iteration with multiple reward functions. J. Machine Learn. Res. 13(1):3253–3295.
  • Luckett DJ, Laber EB, Kahkoska AR, Maahs DM, Mayer-Davis E, Kosorok MR (2020) Estimating dynamic treatment regimes in mobile health using V-learning. J. Amer. Statist. Assoc. 115(530):692–706.
  • Magnani A, Boyd SP (2009) Convex piecewise-linear fitting. Optim. Engrg. 10:1–17.
  • Manski CF (2007) Identification for Prediction and Decision (Harvard University Press, Cambridge, MA).
  • Manski CF (2021) Econometrics for decision making: Building foundations sketched by Haavelmo and Wald. Econometrica 89(6):2827–2853.
  • Marinacci M (2002) Probabilistic sophistication and multiple priors. Econometrica 70(2):755–764.
  • Munshi VN, Saghafian S, Cook CB, Aradhyula S, Chakkera HA (2021) Use of imputation and decision modeling to improve diagnosis and management of patients at risk for new-onset diabetes after transplantation. Ann. Transplantation 26:1–9.
  • Munshi VN, Saghafian S, Cook CB, Werner KT, Chakkera HA (2020a) Comparison of post-transplantation diabetes mellitus incidence and risk factors between kidney and liver transplantation patients. PLoS One 15(1):1–12.
  • Munshi VN, Saghafian S, Cook CB, Steidley D, Hardaway B, Chakkera HA (2020b) Incidence, risk factors, and trends for post-heart transplantation diabetes mellitus. Amer. J. Cardiology 125(3):436–440.
  • Murphy SA (2003) Optimal dynamic treatment regimes. J. Roy. Statist. Soc. Ser. B Statist. Methodology 65(2):331–355.
  • Murphy SA (2005) An experimental design for the development of adaptive treatment strategies. Statist. Medicine 24(10):1455–1481.
  • Murphy SA, van der Laan MJ, Robins JM, CPPRG (2001) Marginal mean models for dynamic regimes. J. Amer. Statist. Assoc. 96(456):1410–1423.
  • Murphy SA, Deng Y, Laber EB, Maei HR, Sutton RS, Witkiewitz K (2016) A batch, off-policy, actor-critic algorithm for optimizing the average reward. Preprint, submitted July 18, https://arxiv.org/abs/1607.05047.
  • Namkoong H, Keramati R, Yadlowsky S, Brunskill E (2020) Off-policy policy evaluation for sequential decisions under unobserved confounding. Preprint, submitted March 12, https://arxiv.org/abs/2003.05623.
  • Nie X, Brunskill E, Wager S (2021) Learning when-to-treat policies. J. Amer. Statist. Assoc. 116(533):392–409.
  • Pearl J (2009) Causality: Models, Reasoning, and Inference (Cambridge University Press, Cambridge, UK).
  • Pearl J, Robins J (1995) Probabilistic evaluation of sequential plans from causal models with hidden variables. Besnard P, Hanks S, eds. Uncertainty in Artificial Intelligence 11 (Morgan Kaufmann, San Francisco), 444–453.
  • Precup D, Sutton RS, Singh S (2000) Eligibility traces for off-policy policy evaluation. Proc. 17th Internat. Conf. Machine Learn., 759–766.
  • Robins J (1986) A new approach to causal inference in mortality studies with a sustained exposure period—Application to control of the healthy worker survivor effect. Math. Model. 7(9–12):1393–1512.
  • Robins J (1997) Causal inference from complex longitudinal data. Berkane M, ed. Latent Variable Modeling and Applications to Causality (Springer, New York), 69–117.
  • Robins J (2004) Optimal structural nested models for optimal sequential decisions. Proc. Second Seattle Sympos. Biostatistics (Springer, New York), 189–326.
  • Robins J, Hernán MA, Brumback B (2000) Marginal structural models and causal inference in epidemiology. Epidemiology 11(5):550–560.
  • Rosenbaum PR (2002) Observational Studies (Springer, New York).
  • Rosenbaum PR (2010) Design of Observational Studies (Springer, New York).
  • Rubin DB (1986) Comment: Which ifs have causal answers. J. Amer. Statist. Assoc. 81:961–962.
  • Saghafian S (2018) Ambiguous partially observable Markov decision processes: Structural results and applications. J. Econom. Theory 178:1–35.
  • Saghafian S, Murphy SA (2021) Innovative healthcare delivery: The scientific and regulatory challenges in designing mHealth interventions. NAM Perspectives. Commentary. Report, National Academy of Medicine, Washington, DC.
  • Saghafian S, Rasouli M (2019) Robust partially observable Markov decision processes. Working paper, Harvard University, Cambridge, MA.
  • Saghafian S, Tomlin BT (2016) The newsvendor under demand ambiguity: Combining data with moment and tail information. Oper. Res. 64(1):167–185.
  • Saghafian S, Tomlin B, Biller S (2022) The Internet of things and information fusion: Who talks to who? Manufacturing Service Oper. Management 24(1):333–351.
  • Savage L (1951) The theory of statistical decision. J. Amer. Statist. Assoc. 46:55–67.
  • Smallwood R, Sondik EJ (1973) The optimal control of partially observable Markov processes over a finite horizon. Oper. Res. 21(5):1071–1088.
  • Stoye J (2011) Statistical decisions under ambiguity. Theory Decision 70(2):129–148.
  • Tennenholtz G, Shalit U, Mannor S (2020) Off-policy evaluation in partially observable environments. Proc. Conf. AAAI Artificial Intelligence 34:10276–10283.
  • Thomas PS, Brunskill E (2016) Data-efficient off-policy policy evaluation for reinforcement learning. Proc. 33rd Internat. Conf. Machine Learn., 2139–2148.
  • Tsiatis AA, Davidian M, Holloway ST, Laber EB, Kosorok MR (2019) Dynamic Treatment Regimes: Statistical Methods for Precision Medicine (Chapman and Hall/CRC, Boca Raton, FL).
  • Wald A (1939) Contribution to the theory of statistical estimation and testing hypotheses. Ann. Math. Statist. 10:299–326.
  • Wald A (1945) Statistical decision functions which minimize the maximum risk. Ann. Math. 46:265–280.
  • Wald A (1950) Statistical Decision Functions (Wiley, New York).
  • Wang L, Zhou Y, Song R, Sherwood B (2018) Quantile-optimal treatment regimes. J. Amer. Statist. Assoc. 113(523):1243–1254.
  • Watson J, Holmes C (2016) Approximate models and robust decisions. Statist. Sci. 31:465–489.
  • Whelton PK, Carey RM, Aronow WS, Casey DE Jr, Collins KJ, Himmelfarb CD, DePalma SM, et al. (2017) 2017 ACC/AHA/AAPA/ABC/ACPM/AGS/APhA/ASH/ASPC/NMA/PCNA guideline for the prevention, detection, evaluation, and management of high blood pressure in adults: A report of the American College of Cardiology/American Heart Association Task Force on clinical practice guidelines. J. Amer. College Cardiology 71(19):e127–e248.
  • Xu Z, Laber E, Staicu AM, Severus E (2020) Latent-state models for precision medicine. Preprint, submitted May 26, https://arxiv.org/abs/2005.13001.
  • Zhang J, Bareinboim E (2019) Near-optimal reinforcement learning in dynamic treatment regimes. Adv. Neural Inform. Processing Systems, vol. 32 (NeurIPS).
  • Zhang Y, Laber EB, Davidian M, Tsiatis AA (2018) Interpretable dynamic treatment regimes. J. Amer. Statist. Assoc. 113(524):1541–1549.
  • Zhao YQ, Zeng D, Laber EB, Kosorok MR (2015) New statistical learning methods for estimating optimal dynamic treatment regimes. J. Amer. Statist. Assoc. 110(510):583–598.