Uncertainty Quantification and Exploration for Reinforcement Learning
Published online: 2 Mar 2023. https://doi.org/10.1287/opre.2023.2436
References
- (2017) Constrained policy optimization. Proc. 34th Internat. Conf. Machine Learn. (JMLR), 70:22–31.
- (1999) Constrained Markov Decision Processes, vol. 7 (CRC Press, Boca Raton, FL).
- (2020) Learning robust control policies for end-to-end autonomous driving from data-driven simulation. IEEE Robot. Autom. Lett. 5(2):1143–1150.
- (2010) Best arm identification in multi-armed bandits. 23rd Annual Conf. Learn. Theory (COLT 2010), 41–53.
- (2017) Minimax regret bounds for reinforcement learning. Internat. Conf. Machine Learn. (PMLR), 263–272.
- (2012) Tutorial: Input uncertainty in output analysis. Laroque C, Himmelspach J, Pasupathy R, Rose O, Uhrmacher A, eds. Proc. 2012 Winter Simulation Conf. (IEEE, Piscataway, NJ), 1–12.
- (2013) Quantifying input uncertainty via simulation confidence intervals. INFORMS J. Comput. 26(1):74–87.
- (2001) Resampling methods for input modeling. Peters BA, Smith JS, Medeiros DJ, Rohrer MW, eds. Proc. 2001 Winter Simulation Conf. (IEEE, Piscataway, NJ), 1:372–378.
- (2006) Assessing solution quality in stochastic programs. Math. Programming 108(2–3):495–514.
- (2017) A distributional perspective on reinforcement learning. Internat. Conf. Machine Learn. (PMLR), 449–458.
- (2016) Budget allocation using weakly coupled, constrained Markov decision processes. Proc. Thirty-Second Conf. Uncertainty Artificial Intelligence (AUAI Press), 52–61.
- (2006) Efficient dynamic simulation allocation in ordinal optimization. IEEE Trans. Automat. Control 51(12):2005–2009.
- (2011) Stochastic Simulation Optimization: An Optimal Computing Budget Allocation, vol. 1 (World Scientific, Singapore).
- (2020) Explicit mean-square error bounds for Monte-Carlo and linear stochastic approximation. Internat. Conf. Artificial Intelligence Statist. (PMLR), 4173–4183.
- (2013) An optimal sample allocation strategy for partition-based random search. IEEE Trans. Autom. Sci. Engrg. 11(1):177–186.
- (1997) Sensitivity of computer simulation experiments to errors in input data. J. Statist. Comput. Simul. 57(1–4):219–241.
- (2004) Calculation of confidence intervals for simulation output. ACM Trans. Model. Comput. Simul. 14(4):344–362.
- (2001) Input distribution selection for simulation experiments: Accounting for input uncertainty. Oper. Res. 49(5):744–758.
- (2017) Risk-constrained reinforcement learning with percentile risk criteria. J. Machine Learn. Res. 18(1):6070–6120.
- (2020) A survey of algorithms for black-box safety validation. Preprint, submitted May 6, https://doi.org/10.48550/arXiv.2005.02979.
- (2017) Zap Q-learning. Proc. 31st Internat. Conf. Neural Inform. Processing Systems, 2232–2241.
- (2016) Three asymptotic regimes for ranking and selection with general sample distributions. Proc. 2016 Winter Simulation Conf. (IEEE, Piscataway, NJ), 277–288.
- (2012) Splitting randomized stationary policies in total-reward Markov decision processes. Math. Oper. Res. 37(1):129–153.
- (2016) Efficient feasibility determination with multiple performance measure constraints. IEEE Trans. Automat. Control 62(1):113–122.
- (2017) Robust ranking and selection with optimal computing budget allocation. Automatica J. IFAC 81:30–36.
- (2004) A large deviations perspective on ordinal optimization. Proc. 36th Winter Simulation Conf., 577–585.
- (1995) Stable function approximation in dynamic programming. Machine Learn. Proc. (Elsevier), 261–268.
- (2013) Stochastic Decomposition: A Statistical Method for Large Scale Stochastic Linear Programming, vol. 8 (Springer Science & Business Media).
- (2010) Near-optimal regret bounds for reinforcement learning. J. Machine Learn. Res. 11(Apr):1563–1600.
- (2012) Efficient computing budget allocation for simulation-based policy improvement. IEEE Trans. Autom. Sci. Engrg. 9(2):342–352.
- (2018) Is Q-learning provably efficient? Adv. Neural Inf. Process. Syst. 31:4863–4873.
- (2003) On the sample complexity of reinforcement learning. PhD thesis, University College London.
- (2018) QT-Opt: Scalable deep reinforcement learning for vision-based robotic manipulation. Preprint, submitted June 27, https://doi.org/10.48550/arxiv.1806.10293.
- (2016) On the complexity of best-arm identification in multi-armed bandit models. J. Machine Learn. Res. 17(1):1–42.
- (1998) Finite-sample convergence rates for Q-learning and indirect algorithms. Proc. Conf. Adv. Neural Inform. Processing Systems II, 996–1002.
- (2007) Recent advances in ranking and selection. Proc. 2007 Winter Simulation Conf. (IEEE, Piscataway, NJ), 162–172.
- (2020) Deep reinforcement learning for autonomous driving: A survey. Preprint, submitted February 2, https://doi.org/10.48550/arxiv.2002.00444.
- (2014) Dynamic treatment regimes: Technical challenges and applications. Electron. J. Stat. 8(1):1225.
- (2016) Advanced tutorial: Input uncertainty and robust analysis in stochastic simulation. Proc. 2016 Winter Simulation Conf. (WSC) (IEEE, Piscataway, NJ), 178–192.
- (2012) Approximate simulation budget allocation for selecting the best design in the presence of stochastic constraints. IEEE Trans. Automat. Control 57(11):2940–2945.
- (1999) Monte Carlo bounding techniques for determining solution quality in stochastic programs. Oper. Res. Lett. 24(1–2):47–56.
- (2004) Bias and variance in value function estimation. Proc. 21st Internat. Conf. Machine Learn. (ACM), 72.
- (2007) Bias and variance approximation in value function estimates. Management Sci. 53(2):308–322.
- (2008) Finite-time bounds for fitted value iteration. J. Machine Learn. Res. 9(May):815–857.
- (2013) (More) efficient reinforcement learning via posterior sampling. Adv. Neural Inf. Process. Syst. 26:3003–3011.
- (2017) Gradient-based myopic allocation policy: An efficient sampling procedure in a low-confidence scenario. IEEE Trans. Automat. Control 63(9):3091–3097.
- (2020) Efficient sampling allocation procedures for optimal quantile selection. INFORMS J. Comput. 33(1):230–245.
- (2018) Ranking and selection as stochastic control. IEEE Trans. Automat. Control 63(8):2359–2373.
- (2017) Pure exploration in episodic fixed-horizon Markov decision processes. AAMAS, 1703–1704.
- (2016) Simple Bayesian algorithms for best arm identification. Conf. Learn. Theory, 1417–1418.
- (2018) The local time method for targeting and selection. Oper. Res. 66(5):1406–1422.
- (2012) The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1):180–195.
- (2017) The role of simulation in development and testing of autonomous vehicles. Driving Simulation Conf., Stuttgart.
- (2009) Approximation Theorems of Mathematical Statistics, vol. 162 (John Wiley & Sons, Hoboken, NJ).
- (2018) Q-learning with nearest neighbors. Adv. Neural Inf. Process. Syst. 31:3111–3121.
- (2014) Lectures on Stochastic Programming: Modeling and Theory (SIAM, Philadelphia).
- (2021) Ranking and selection with covariates for personalized decision making. INFORMS J. Comput. 33(4):1259–1684.
- (2016) Tractable sampling strategies for quantile-based ordinal optimization. 2016 Winter Simulation Conf. (WSC) (IEEE, Piscataway, NJ), 847–858.
- (2015) Quickly assessing contributions to input uncertainty. IIE Trans. 47(9):893–909.
- (2014) Advanced tutorial: Input uncertainty quantification. Tolk A, Diallo S, Ryzhov I, Yilmaz L, Buckley S, Miller J, eds. Proc. 2014 Winter Simulation Conf. (IEEE, Piscataway, NJ), 162–176.
- (2008) An analysis of model-based interval estimation for Markov decision processes. J. Comput. System Sci. 74(8):1309–1331.
- (2014) A Bayesian framework for quantifying uncertainty in stochastic simulation. Oper. Res. 62(6):1439–1452.
- (2020) Reinforcement learning in feature space: Matrix bandit, kernels, and regret bound. Internat. Conf. Machine Learn. (PMLR), 10746–10756.
- (2017) An efficient budget allocation approach for quantifying the impact of input uncertainty in stochastic simulation. ACM Trans. Model. Comput. Simul. 27(4):25.
- (2020) Risk quantification in stochastic simulation under input uncertainty. ACM Trans. Model. Comput. Simul. 30(1):1–24.
- (2004) Accounting for input-model and input-parameter uncertainties in simulation. IIE Trans. 36(11):1135–1151.

