Learning in Combinatorial Optimization: What and How to Explore
Published Online:19 Jun 2020https://doi.org/10.1287/opre.2019.1926
References
- (2008) Competing in the dark: An efficient algorithm for bandit linear optimization. Accessed November 27, 2017, http://repository.upenn.edu/statistics_papers/110.Google Scholar
- (2013) Mixed integer programming: Analyzing 12 years of progress. Jünger M, Reinelt G, eds. Facets of Combinatorial Optimization: Festschrift for Martin Grötschel (Springer, Berlin), 449–481.Crossref, Google Scholar
- (1995) The continuum-armed bandit problem. SIAM J. Control Optim. 33(6):1926–1951.Crossref, Google Scholar
- (1990) Multi-armed bandit problems with multiple plays and switching cost. Stochastics 29(4):437–459.Google Scholar
- (1987) Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays. Part I. IID rewards. Automatic Control IEEE Trans. 32(11):968–976.Crossref, Google Scholar
- (2011) The Traveling Salesman Problem: A Computational Study, Princeton Series in Applied Mathematics (Princeton University Press, Princeton, NJ).Google Scholar
- (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Crossref, Google Scholar
- (2003) The non-stochastic multi-armed bandit problem. SIAM J. Comput. 32(1):48–77.Crossref, Google Scholar
- (2004) Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. Proc. 36th Annual ACM Sympos. Theory Comput. (ACM, New York), 45–53.Google Scholar
- (1996) A dynamic subgradient-based branch-and-bound procedure for set covering. Oper. Res. 44(6):875–890.Link, Google Scholar
- (2018) A dynamic clustering approach to data-driven assortment personalization. Management Sci. 65(5):2095–2115.Google Scholar
- (1985) Bandit Problems (Chapman and Hall, London).Crossref, Google Scholar
- (2012) A brief history of linear and mixed-integer programming computation. Documenta Math. Extra Volume ISMP(2012):107–121.Google Scholar
- (2011) X-armed bandits. J. Machine Learn. Res. 12:1655–1695.Google Scholar
- (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.Link, Google Scholar
- (2013) Imposing connectivity constraints in forest planning models. Oper. Res. 61(4):824–836.Link, Google Scholar
- (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (2012) Combinatorial bandits. J. Comput. System Sci. 78(5):1404–1422.Crossref, Google Scholar
- (2013) Combinatorial multi-armed bandit: General framework, results and applications. Proc. 30th Internat. Conf. Machine Learn PMLR 28(1):151–159.Google Scholar
- (1998) Combinatorial Optimization (John Wiley & Sons, Inc., New York).Google Scholar
- (2006) Elements of Information Theory (John Wiley & Sons, Inc., Hoboken, NJ).Google Scholar
- (2008) Stochastic linear optimization under bandit feedback. Proc. 21st Annual Conf. Learn. Theory, Helsinki, Finland, 355–366.Google Scholar
- (1977) The set-covering problem: A new implicit enumeration algorithm. Oper. Res. 25(5):760–772.Link, Google Scholar
- (2011) Heuristics in mixed integer programming. Cochran J, ed. Wiley Encyclopedia of Operations Research and Management Science, vol. 8 (Wiley, New York), 738–774.Crossref, Google Scholar
- (2012) Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations. IEEE/ACM Trans. Networking 20(5):1466–1478.Crossref, Google Scholar
- . (2016) The scip optimization suite 3.2. Technical Report 15-60, ZIB, Berlin.Google Scholar
- (1979) Bandit processes and dynamic allocation rules. J. Roy. Statist. Soc. Ser. A 41(2):148–177.Google Scholar
- . (2017) The scip optimization suite 5.0. Technical Report 17-61, ZIB, Berlin.Google Scholar
- (1993) Solving airline crew scheduling problems by branch-and-cut. Management Sci. 39(6):657–682.Link, Google Scholar
- (2010) 50 Years of Integer Programming 1958-2008: From the Early Years to the State-of-the-Art (Springer-Verlag, New York).Crossref, Google Scholar
- (2008) Multi-armed bandits in metric spaces. Proc. 40th Annual ACM Sympos. Theory Comput. (ACM, New York), 681–690.Google Scholar
- (1998) Solving steiner tree problems in graphs to optimality. Networks 32(3):207–232.Crossref, Google Scholar
- (1997) Minimax lower bounds for the two-armed bandit problem. Proc. 36th IEEE Conf. Decision Control, vol. 3 (IEEE, Piscataway, NJ), 2293–2297.Google Scholar
- (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
- (2012) Stochastic online learning for network optimization under random unknown weights. Working paper, University of California, Davis, Davis.Google Scholar
- (1995) Optimal trees. Ball MO, Magnanti TL, Monma CL, Nemhauser GL, eds. Handbooks in Operational Research and Management Science, vol. 7 (North-Holland, Amsterdam), 503–615.Google Scholar
- . (2017) The SCIP optimization suite 4.0. Technical Report 17-12, ZIB, Berlin.Google Scholar
- (1991) Using separation algorithms to generate mixed integer model reformulations. Oper. Res. Lett. 10(3):119–128.Crossref, Google Scholar
- (2009) A structured multiarmed bandit problem and the greedy policy. Automatic Control IEEE Trans. 54(12):2787–2802.Crossref, Google Scholar
- (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.Crossref, Google Scholar
- (2013) Some 0/1 polytopes need exponential size extended formulations. Math. Programming 142(1–2):255–268.Crossref, Google Scholar
- (2017) The matching polytope has exponential extension complexity. J. ACM 64(6):Article 41.Crossref, Google Scholar
- (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.Link, Google Scholar
- (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Oper. Res. 58(6):1666–1680.Link, Google Scholar
- (2009) The knowledge gradient algorithm for online subset selection. Proc. 2009 IEEE Internat. Sympos. Adaptive Dynamic Programming Reinforcement Learn. (IEEE, Piscataway, NJ), 137–144.Google Scholar
- (2011) Information collection on a graph. Oper. Res. 59(1):188–201.Link, Google Scholar
- (2012) The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1):180–195.Link, Google Scholar
- (2013) Optimal dynamic assortment planning with demand learning. Manufacturing Service Oper. Management 15(3):387–404.Link, Google Scholar
- (2003) Combinatorial Optimization—Polyhedra and Efficiency (Springer, Berlin).Google Scholar
- (1999) Enumerative Combinatorics, vol. 2, Cambridge Studies in Advanced Mathematics (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3–4):285–294.Crossref, Google Scholar
- (2012) Fitting piecewise linear continuous functions. Eur. J. Oper. Res. 219(1):86–95.Crossref, Google Scholar
- (2003) A compact linear program for testing optimality of perfect matchings. Oper. Res. Lett. 31(6):429–434.Crossref, Google Scholar
- (2015) Mixed integer linear programming formulation techniques. SIAM Rev. 57(1):3–57.Crossref, Google Scholar
- (1982) Optimization over Time, vol. 1 (John Wiley & Sons Ltd., Hoboken, NJ).Google Scholar
- (2011) The Design of Approximation Algorithms (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar

