Mining Optimal Policies: A Pattern Recognition Approach to Model Analysis

Fernanda Bravo
https://orcid.org/0000-0002-4625-7894
Anderson School of Management, University of California, Los Angeles, Los Angeles, California 90024

Yaron Shaposhnik (Corresponding Author)
https://orcid.org/0000-0002-6105-0154
Simon Business School, University of Rochester, Rochester, New York 14627

Published Online: 21 May 2020
https://doi.org/10.1287/ijoo.2019.0026

INFORMS Journal on Optimization, Volume 2, Issue 3, Summer 2020, Pages 145-228, C3

Received: December 26, 2018
Accepted: August 11, 2019
Published Online: May 21, 2020

Cite as

Fernanda Bravo, Yaron Shaposhnik (2020) Mining Optimal Policies: A Pattern Recognition Approach to Model Analysis. INFORMS Journal on Optimization 2(3):145-166.

https://doi.org/10.1287/ijoo.2019.0026
