Ambiguous Dynamic Treatment Regimes: A Reinforcement Learning Approach