Mining Optimal Policies: A Pattern Recognition Approach to Model Analysis
Published Online: 21 May 2020 https://doi.org/10.1287/ijoo.2019.0026

