Learning in Combinatorial Optimization: What and How to Explore

Published Online:https://doi.org/10.1287/opre.2019.1926

References

  • Abernethy JD, Hazan E, Rakhlin A (2008) Competing in the dark: An efficient algorithm for bandit linear optimization. Accessed November 27, 2017, http://repository.upenn.edu/statistics_papers/110.Google Scholar
  • Achterberg T, Wunderling R (2013) Mixed integer programming: Analyzing 12 years of progress. Jünger M, Reinelt G, eds. Facets of Combinatorial Optimization: Festschrift for Martin Grötschel (Springer, Berlin), 449–481.CrossrefGoogle Scholar
  • Agrawal R (1995) The continuum-armed bandit problem. SIAM J. Control Optim. 33(6):1926–1951.CrossrefGoogle Scholar
  • Agrawal R, Hegde M, Teneketzis D (1990) Multi-armed bandit problems with multiple plays and switching cost. Stochastics 29(4):437–459.Google Scholar
  • Anantharam V, Varaiya P, Walrand J (1987) Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays. Part I. IID rewards. Automatic Control IEEE Trans. 32(11):968–976.CrossrefGoogle Scholar
  • Applegate D, Bixby R, Chvátal V, Cook W (2011) The Traveling Salesman Problem: A Computational Study, Princeton Series in Applied Mathematics (Princeton University Press, Princeton, NJ).Google Scholar
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.CrossrefGoogle Scholar
  • Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2003) The non-stochastic multi-armed bandit problem. SIAM J. Comput. 32(1):48–77.CrossrefGoogle Scholar
  • Awerbuch B, Kleinberg RD (2004) Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. Proc. 36th Annual ACM Sympos. Theory Comput. (ACM, New York), 45–53.Google Scholar
  • Balas E, Carrera MC (1996) A dynamic subgradient-based branch-and-bound procedure for set covering. Oper. Res. 44(6):875–890.LinkGoogle Scholar
  • Bernstein F, Modaresi S, Sauré D (2018) A dynamic clustering approach to data-driven assortment personalization. Management Sci. 65(5):2095–2115.Google Scholar
  • Berry D, Fristedt B (1985) Bandit Problems (Chapman and Hall, London).CrossrefGoogle Scholar
  • Bixby RE (2012) A brief history of linear and mixed-integer programming computation. Documenta Math. Extra Volume ISMP(2012):107–121.Google Scholar
  • Bubeck S, Munos R, Stoltz G, Szepesvári C (2011) X-armed bandits. J. Machine Learn. Res. 12:1655–1695.Google Scholar
  • Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.LinkGoogle Scholar
  • Carvajal R, Constantino M, Goycoolea M, Vielma JP, Weintraub A (2013) Imposing connectivity constraints in forest planning models. Oper. Res. 61(4):824–836.LinkGoogle Scholar
  • Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Cesa-Bianchi N, Lugosi G (2012) Combinatorial bandits. J. Comput. System Sci. 78(5):1404–1422.CrossrefGoogle Scholar
  • Chen W, Wang Y, Yuan Y (2013) Combinatorial multi-armed bandit: General framework, results and applications. Proc. 30th Internat. Conf. Machine Learn PMLR 28(1):151–159.Google Scholar
  • Cook WJ, Cunningham WH, Pulleyblank WR, Schrijver A (1998) Combinatorial Optimization (John Wiley & Sons, Inc., New York).Google Scholar
  • Cover T, Thomas J (2006) Elements of Information Theory (John Wiley & Sons, Inc., Hoboken, NJ).Google Scholar
  • Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. Proc. 21st Annual Conf. Learn. Theory, Helsinki, Finland, 355–366.Google Scholar
  • Etcheberry J (1977) The set-covering problem: A new implicit enumeration algorithm. Oper. Res. 25(5):760–772.LinkGoogle Scholar
  • Fischetti M, Lodi A (2011) Heuristics in mixed integer programming. Cochran J, ed. Wiley Encyclopedia of Operations Research and Management Science, vol. 8 (Wiley, New York), 738–774.CrossrefGoogle Scholar
  • Gai Y, Krishnamachari B, Jain R (2012) Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations. IEEE/ACM Trans. Networking 20(5):1466–1478.CrossrefGoogle Scholar
  • Gamrath G, Fischer T, Gally T, Gleixner AM, Hendel G, Koch T, Maher SJ, Müller B, Pfetsch ME, Puchert C, et al.. (2016) The scip optimization suite 3.2. Technical Report 15-60, ZIB, Berlin.Google Scholar
  • Gittins J (1979) Bandit processes and dynamic allocation rules. J. Roy. Statist. Soc. Ser. A 41(2):148–177.Google Scholar
  • Gleixner A, Eifler L, Gally T, Gamrath G, Gemander P, Gottwald RL, Hendel G, et al.. (2017) The scip optimization suite 5.0. Technical Report 17-61, ZIB, Berlin.Google Scholar
  • Hoffman KL, Padberg M (1993) Solving airline crew scheduling problems by branch-and-cut. Management Sci. 39(6):657–682.LinkGoogle Scholar
  • Jünger M, Liebling T, Naddef D, Nemhauser G, Pulleyblank W, Reinelt G, Rinaldi G, Wolsey L (2010) 50 Years of Integer Programming 1958-2008: From the Early Years to the State-of-the-Art (Springer-Verlag, New York).CrossrefGoogle Scholar
  • Kleinberg R, Slivkins A, Upfal E (2008) Multi-armed bandits in metric spaces. Proc. 40th Annual ACM Sympos. Theory Comput. (ACM, New York), 681–690.Google Scholar
  • Koch T, Martin A (1998) Solving steiner tree problems in graphs to optimality. Networks 32(3):207–232.CrossrefGoogle Scholar
  • Kulkarni S, Lugosi G (1997) Minimax lower bounds for the two-armed bandit problem. Proc. 36th IEEE Conf. Decision Control, vol. 3 (IEEE, Piscataway, NJ), 2293–2297.Google Scholar
  • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.CrossrefGoogle Scholar
  • Liu K, Vakili S, Zhao Q (2012) Stochastic online learning for network optimization under random unknown weights. Working paper, University of California, Davis, Davis.Google Scholar
  • Magnanti TL, Wolsey LA (1995) Optimal trees. Ball MO, Magnanti TL, Monma CL, Nemhauser GL, eds. Handbooks in Operational Research and Management Science, vol. 7 (North-Holland, Amsterdam), 503–615.Google Scholar
  • Maher SJ, Fischer T, Gally T, Gamrath G, Gleixner A, Gottwald RL, Hendel G, Koch T, Lübbecke ME, Miltenberger M, et al.. (2017) The SCIP optimization suite 4.0. Technical Report 17-12, ZIB, Berlin.Google Scholar
  • Martin RK (1991) Using separation algorithms to generate mixed integer model reformulations. Oper. Res. Lett. 10(3):119–128.CrossrefGoogle Scholar
  • Mersereau A, Rusmevichientong P, Tsitsiklis J (2009) A structured multiarmed bandit problem and the greedy policy. Automatic Control IEEE Trans. 54(12):2787–2802.CrossrefGoogle Scholar
  • Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.CrossrefGoogle Scholar
  • Rothvoß T (2013) Some 0/1 polytopes need exponential size extended formulations. Math. Programming 142(1–2):255–268.CrossrefGoogle Scholar
  • Rothvoß T (2017) The matching polytope has exponential extension complexity. J. ACM 64(6):Article 41.CrossrefGoogle Scholar
  • Rusmevichientong P, Tsitsiklis J (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.LinkGoogle Scholar
  • Rusmevichientong P, Shen Z, Shmoys D (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Oper. Res. 58(6):1666–1680.LinkGoogle Scholar
  • Ryzhov IO, Powell WB (2009) The knowledge gradient algorithm for online subset selection. Proc. 2009 IEEE Internat. Sympos. Adaptive Dynamic Programming Reinforcement Learn. (IEEE, Piscataway, NJ), 137–144.Google Scholar
  • Ryzhov IO, Powell WB (2011) Information collection on a graph. Oper. Res. 59(1):188–201.LinkGoogle Scholar
  • Ryzhov IO, Powell WB, Frazier PI (2012) The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1):180–195.LinkGoogle Scholar
  • Sauré D, Zeevi A (2013) Optimal dynamic assortment planning with demand learning. Manufacturing Service Oper. Management 15(3):387–404.LinkGoogle Scholar
  • Schrijver A (2003) Combinatorial Optimization—Polyhedra and Efficiency (Springer, Berlin).Google Scholar
  • Stanley R (1999) Enumerative Combinatorics, vol. 2, Cambridge Studies in Advanced Mathematics (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3–4):285–294.CrossrefGoogle Scholar
  • Toriello A, Vielma JP (2012) Fitting piecewise linear continuous functions. Eur. J. Oper. Res. 219(1):86–95.CrossrefGoogle Scholar
  • Ventura P, Eisenbrand F (2003) A compact linear program for testing optimality of perfect matchings. Oper. Res. Lett. 31(6):429–434.CrossrefGoogle Scholar
  • Vielma JP (2015) Mixed integer linear programming formulation techniques. SIAM Rev. 57(1):3–57.CrossrefGoogle Scholar
  • Whittle P (1982) Optimization over Time, vol. 1 (John Wiley & Sons Ltd., Hoboken, NJ).Google Scholar
  • Williamson DP, Shmoys DB (2011) The Design of Approximation Algorithms (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.