Mining Optimal Policies: A Pattern Recognition Approach to Model Analysis

Published Online: https://doi.org/10.1287/ijoo.2019.0026

References

  • Abbeel P, Ng AY (2004) Apprenticeship learning via inverse reinforcement learning. Proc. 21st Internat. Conf. Machine Learn. (ACM, New York), 1.
  • Anantharam V, Varaiya P, Walrand J (1987a) Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards. IEEE Trans. Automat. Control 32(11):968–976.
  • Anantharam V, Varaiya P, Walrand J (1987b) Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards. IEEE Trans. Automat. Control 32(11):977–982.
  • Aydın N, Birbil S, Frenk JBG, Noyan N (2012) Single-leg airline revenue management with overbooking. Transportation Sci. 47(4):560–583.
  • Aydın N, Birbil SI, Topaloğlu H (2016) Delayed purchase options in single-leg revenue management. Transportation Sci. 51(4):1031–1045.
  • Badanidiyuru A, Kleinberg R, Slivkins A (2013) Bandits with knapsacks. 2013 IEEE 54th Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 207–216.
  • Ban G, Rudin C (2018) The big data newsvendor: practical insights from machine learning. Oper. Res. 67(1):90–108.
  • Bastani H, Bayati M (2017) Online decision-making with high-dimensional covariates. Working paper, The Wharton School, University of Pennsylvania, Philadelphia.
  • Berry DA, Fristedt B (1985) Bandit Problems: Sequential Allocation of Experiments, Monographs on Statistics and Applied Probability (Springer, New York).
  • Bertsekas DP, Tsitsiklis JN (1989) Parallel and Distributed Computation: Numerical Methods, vol. 23 (Prentice Hall, Englewood Cliffs, NJ).
  • Bertsekas DP (1995) Dynamic Programming and Optimal Control, vol. 1 (Athena Scientific, Belmont, MA).
  • Bertsekas DP, Tsitsiklis JN (1995) Neuro-dynamic programming: an overview. Proc. 1995 34th IEEE Conf. Decision Control, vol. 1 (IEEE, Piscataway, NJ), 560–564.
  • Bertsimas D, Kallus N (2014) From predictive to prescriptive analytics. Preprint, submitted February 22, https://arxiv.org/abs/1402.5481.
  • Bertsimas D, Niño-Mora J (1996) Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Math. Oper. Res. 21(2):257–306.
  • Bertsimas D, Niño-Mora J (2000) Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Oper. Res. 48(1):80–90.
  • Bertsimas D, Stellato B (2018) The voice of optimization. Preprint, submitted December 24, https://arxiv.org/abs/1812.09991.
  • Bishop CM (2006) Pattern Recognition and Machine Learning (Springer, New York).
  • Cao Z, Qin T, Liu T-Y, Tsai M-F, Li H (2007) Learning to rank: from pairwise approach to listwise approach. Proc. 24th Internat. Conf. Machine Learn. (ACM, New York), 129–136.
  • Chen H, Yao DD (1993) Dynamic scheduling of a multiclass fluid network. Oper. Res. 41(6):1104–1115.
  • Ciocan D, Misic V (2018) Interpretable optimal stopping. Working paper, INSEAD, Fontainebleau, France.
  • Dunning I, Gupta S, Silberholz J (2018) What works best when? A systematic evaluation of heuristics for Max-Cut and QUBO. INFORMS J. Comput. 30(3):608–624.
  • Ehrhardt R (1979) The power approximation for computing (s, S) inventory policies. Management Sci. 25(8):777–786.
  • Ehrhardt R, Mosier C (1984) A revision of the power approximation for computing (s, S) policies. Management Sci. 30(5):618–622.
  • Friedman J, Hastie T, Tibshirani R (2001) The Elements of Statistical Learning, vol. 1 (Springer, Berlin).
  • Gal A, Mandelbaum A, Senderovich A (2017) Time prediction in congested healthcare systems using feature mining from event logs. Working paper, University in Aachen, Aachen, Germany.
  • Gittins JC, Jones DM (1974) A dynamic allocation index for the sequential design of experiments. J. Gani, ed. Progress in Statistics (North-Holland, Amsterdam, Netherlands), 241–266.
  • Glasserman P (2013) Monte Carlo Methods in Financial Engineering, vol. 53 (Springer, New York).
  • Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. Lawrence N, Reid M, eds. Proc. 14th Internat. Conf. Artificial Intelligence Statist. (Microtome Publishing, Brookline, MA), 315–323.
  • Goodfellow I, Bengio Y, Courville A (2016) Deep Learning, vol. 1 (MIT Press, Cambridge, MA).
  • Herbrich R, Graepel T, Obermayer K (1999) Support vector learning for ordinal regression. Working paper, Technical University of Berlin, Berlin.
  • Ishikida T (1992) Informational Aspects of Decentralized Resource Allocation. PhD thesis, University of California, Berkeley, Berkeley.
  • Kenyon-Mathieu C, Schudy W (2007) How to rank with few errors. Proc. 39th Annual ACM Sympos. Theory Comput. (ACM, New York), 95–103.
  • Khalil EB, Dilkina B, Nemhauser GL, Ahmed S, Shao Y (2017a) Learning to run heuristics in tree search. 26th Internat. Joint Conf. Artificial Intelligence (ACM, New York), 659–666.
  • Khalil E, Dai H, Zhang Y, Dilkina B, Song L (2017b) Learning combinatorial optimization algorithms over graphs. Proc. 31st Internat. Conf. Neural Inform. Processing Systems (ACM, New York), 6348–6358.
  • Kuleshov V, Precup D (2014) Algorithms for multi-armed bandit problems. Preprint, submitted February 25, https://arxiv.org/abs/1402.6028.
  • Lakkaraju H, Kamar E, Caruana R, Leskovec J (2017) Interpretable & explorable approximations of black box models. Preprint, submitted July 4, https://arxiv.org/abs/1707.01154.
  • LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444.
  • Li L, Sourirajan K, Katircioglu K (2010) Empirical methods for two-echelon inventory management with service level constraints based on simulation-regression. Proc. 2010 Winter Simulation Conf. (IEEE, Piscataway, NJ), 1846–1859.
  • López V, Fernández A, García S, Palade V, Herrera F (2013) An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inform. Sci. 250:113–141.
  • Mahajan A, Teneketzis D (2008) Multi-armed bandit problems. Hero AO, Castañón DA, Cochrane D, Kastella K, eds. Foundations and Applications of Sensor Management (Springer, New York), 121–151.
  • Miller BL (1969) A queueing reward system with several customer classes. Management Sci. 16(3):234–245.
  • Niño-Mora J (2007) A (2/3)n^3 fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain. INFORMS J. Comput. 19(4):596–606.
  • Pandelis DG, Teneketzis D (1999) On the optimality of the Gittins index rule for multi-armed bandits with multiple plays. Math. Methods Oper. Res. 50(3):449–461.
  • Pedregosa F, Varoquaux GL, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, et al. (2011) Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12(85):2825–2830.
  • Powell WB (2007) Approximate Dynamic Programming: Solving the Curses of Dimensionality, vol. 703 (John Wiley & Sons, New York).
  • Powell WB (2016) Perspectives of approximate dynamic programming. Ann. Oper. Res. 241(1-2):319–356.
  • Press WH (2009) Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research. Proc. Natl. Acad. Sci. USA 106(52):22387–22392.
  • Ribeiro MT, Singh S, Guestrin C (2016) Why should I trust you?: Explaining the predictions of any classifier. Proc. 22nd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 1135–1144.
  • Rubinstein RY, Kroese DP (2016) Simulation and the Monte Carlo Method, vol. 10 (John Wiley & Sons, New York).
  • Scarf H (1960) The optimality of (s, S) policies in dynamic inventory problems. Technical report, Yale University, New Haven, CT.
  • Snyder LV, Shen ZJ (2011) Fundamentals of Supply Chain Theory (John Wiley & Sons, New York).
  • Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction, vol. 1 (MIT Press, Cambridge, MA).
  • Tsitsiklis JN (1994) A short proof of the Gittins index theorem. Ann. Appl. Probab. 4(1):194–199.
  • Ustun B, Rudin C (2016) Supersparse linear integer models for optimized medical scoring systems. Machine Learn. 102(3):349–391.
  • Varaiya P, Walrand J, Buyukkoc C (1985) Extensions of the multiarmed bandit problem: the discounted case. IEEE Trans. Automat. Control 30(5):426–439.
  • Wang F, Rudin C (2015) Falling rule lists. Proc. 18th Internat. Conf. Artificial Intelligence Statist. (Microtome Publishing, Brookline, MA), 1013–1022.
  • Wang T, Rudin C, Doshi-Velez F, Liu Y, Klampfl E, MacNeille P (2015) Or’s of and’s for interpretable classification, with application to context-aware recommender systems. Working paper, University of Oklahoma, Norman.
  • Weiss G (1988) Branching bandit processes. Probab. Engrg. Inform. Sci. 2(3):269–278.
  • Whittle P (1980) Multi-armed bandits and the Gittins index. J. Roy. Statist. Soc. B 42(2):143–149.
  • Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25(A):287–298.
  • Wikipedia (2018) Markov decision process. Accessed March 22, 2018, https://en.wikipedia.org/wiki/Markov_decision_process.
  • Zheng A, Casari A (2018) Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists (O’Reilly Media, Newton, MA).