Learning in Combinatorial Optimization: What and How to Explore

Sajad Modaresi
Sajad Modaresi
[email protected]
Kenan-Flagler Business School, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599;
Search for more papers by this author
,
Denis Sauré
Corresponding Author
Denis Sauré
[email protected]
https://orcid.org/0000-0002-8123-5009
Industrial Engineering Department, University of Chile, Beauchef 851, Santiago 8370456, Chile;
Search for more papers by this author
,
Juan Pablo Vielma
Juan Pablo Vielma
[email protected]
Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142
Search for more papers by this author

Sajad Modaresi

[email protected]

Kenan-Flagler Business School, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599;

Search for more papers by this author

Denis Sauré

Corresponding Author

Denis Sauré

[email protected]

https://orcid.org/0000-0002-8123-5009

Industrial Engineering Department, University of Chile, Beauchef 851, Santiago 8370456, Chile;

Search for more papers by this author

Juan Pablo Vielma

[email protected]

Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142

Search for more papers by this author

Published Online:19 Jun 2020https://doi.org/10.1287/opre.2019.1926

References

Abernethy JD, Hazan E, Rakhlin A (2008) Competing in the dark: An efficient algorithm for bandit linear optimization. Accessed November 27, 2017, http://repository.upenn.edu/statistics_papers/110.Google Scholar
Achterberg T, Wunderling R (2013) Mixed integer programming: Analyzing 12 years of progress. Jünger M, Reinelt G, eds. Facets of Combinatorial Optimization: Festschrift for Martin Grötschel (Springer, Berlin), 449–481.Crossref, Google Scholar
Agrawal R (1995) The continuum-armed bandit problem. SIAM J. Control Optim. 33(6):1926–1951.Crossref, Google Scholar
Agrawal R, Hegde M, Teneketzis D (1990) Multi-armed bandit problems with multiple plays and switching cost. Stochastics 29(4):437–459.Google Scholar
Anantharam V, Varaiya P, Walrand J (1987) Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays. Part I. IID rewards. Automatic Control IEEE Trans. 32(11):968–976.Crossref, Google Scholar
Applegate D, Bixby R, Chvátal V, Cook W (2011) The Traveling Salesman Problem: A Computational Study, Princeton Series in Applied Mathematics (Princeton University Press, Princeton, NJ).Google Scholar
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Crossref, Google Scholar
Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2003) The non-stochastic multi-armed bandit problem. SIAM J. Comput. 32(1):48–77.Crossref, Google Scholar
Awerbuch B, Kleinberg RD (2004) Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. Proc. 36th Annual ACM Sympos. Theory Comput. (ACM, New York), 45–53.Google Scholar
Balas E, Carrera MC (1996) A dynamic subgradient-based branch-and-bound procedure for set covering. Oper. Res. 44(6):875–890.Link, Google Scholar
Bernstein F, Modaresi S, Sauré D (2018) A dynamic clustering approach to data-driven assortment personalization. Management Sci. 65(5):2095–2115.Google Scholar
Berry D, Fristedt B (1985) Bandit Problems (Chapman and Hall, London).Crossref, Google Scholar
Bixby RE (2012) A brief history of linear and mixed-integer programming computation. Documenta Math. Extra Volume ISMP(2012):107–121.Google Scholar
Bubeck S, Munos R, Stoltz G, Szepesvári C (2011) X-armed bandits. J. Machine Learn. Res. 12:1655–1695.Google Scholar
Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.Link, Google Scholar
Carvajal R, Constantino M, Goycoolea M, Vielma JP, Weintraub A (2013) Imposing connectivity constraints in forest planning models. Oper. Res. 61(4):824–836.Link, Google Scholar
Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Cesa-Bianchi N, Lugosi G (2012) Combinatorial bandits. J. Comput. System Sci. 78(5):1404–1422.Crossref, Google Scholar
Chen W, Wang Y, Yuan Y (2013) Combinatorial multi-armed bandit: General framework, results and applications. Proc. 30th Internat. Conf. Machine Learn PMLR 28(1):151–159.Google Scholar
Cook WJ, Cunningham WH, Pulleyblank WR, Schrijver A (1998) Combinatorial Optimization (John Wiley & Sons, Inc., New York).Google Scholar
Cover T, Thomas J (2006) Elements of Information Theory (John Wiley & Sons, Inc., Hoboken, NJ).Google Scholar
Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. Proc. 21st Annual Conf. Learn. Theory, Helsinki, Finland, 355–366.Google Scholar
Etcheberry J (1977) The set-covering problem: A new implicit enumeration algorithm. Oper. Res. 25(5):760–772.Link, Google Scholar
Fischetti M, Lodi A (2011) Heuristics in mixed integer programming. Cochran J, ed. Wiley Encyclopedia of Operations Research and Management Science, vol. 8 (Wiley, New York), 738–774.Crossref, Google Scholar
Gai Y, Krishnamachari B, Jain R (2012) Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations. IEEE/ACM Trans. Networking 20(5):1466–1478.Crossref, Google Scholar
Gamrath G, Fischer T, Gally T, Gleixner AM, Hendel G, Koch T, Maher SJ, Müller B, Pfetsch ME, Puchert C, et al.. (2016) The scip optimization suite 3.2. Technical Report 15-60, ZIB, Berlin.Google Scholar
Gittins J (1979) Bandit processes and dynamic allocation rules. J. Roy. Statist. Soc. Ser. A 41(2):148–177.Google Scholar
Gleixner A, Eifler L, Gally T, Gamrath G, Gemander P, Gottwald RL, Hendel G, et al.. (2017) The scip optimization suite 5.0. Technical Report 17-61, ZIB, Berlin.Google Scholar
Hoffman KL, Padberg M (1993) Solving airline crew scheduling problems by branch-and-cut. Management Sci. 39(6):657–682.Link, Google Scholar
Jünger M, Liebling T, Naddef D, Nemhauser G, Pulleyblank W, Reinelt G, Rinaldi G, Wolsey L (2010) 50 Years of Integer Programming 1958-2008: From the Early Years to the State-of-the-Art (Springer-Verlag, New York).Crossref, Google Scholar
Kleinberg R, Slivkins A, Upfal E (2008) Multi-armed bandits in metric spaces. Proc. 40th Annual ACM Sympos. Theory Comput. (ACM, New York), 681–690.Google Scholar
Koch T, Martin A (1998) Solving steiner tree problems in graphs to optimality. Networks 32(3):207–232.Crossref, Google Scholar
Kulkarni S, Lugosi G (1997) Minimax lower bounds for the two-armed bandit problem. Proc. 36th IEEE Conf. Decision Control, vol. 3 (IEEE, Piscataway, NJ), 2293–2297.Google Scholar
Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
Liu K, Vakili S, Zhao Q (2012) Stochastic online learning for network optimization under random unknown weights. Working paper, University of California, Davis, Davis.Google Scholar
Magnanti TL, Wolsey LA (1995) Optimal trees. Ball MO, Magnanti TL, Monma CL, Nemhauser GL, eds. Handbooks in Operational Research and Management Science, vol. 7 (North-Holland, Amsterdam), 503–615.Google Scholar
Maher SJ, Fischer T, Gally T, Gamrath G, Gleixner A, Gottwald RL, Hendel G, Koch T, Lübbecke ME, Miltenberger M, et al.. (2017) The SCIP optimization suite 4.0. Technical Report 17-12, ZIB, Berlin.Google Scholar
Martin RK (1991) Using separation algorithms to generate mixed integer model reformulations. Oper. Res. Lett. 10(3):119–128.Crossref, Google Scholar
Mersereau A, Rusmevichientong P, Tsitsiklis J (2009) A structured multiarmed bandit problem and the greedy policy. Automatic Control IEEE Trans. 54(12):2787–2802.Crossref, Google Scholar
Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.Crossref, Google Scholar
Rothvoß T (2013) Some 0/1 polytopes need exponential size extended formulations. Math. Programming 142(1–2):255–268.Crossref, Google Scholar
Rothvoß T (2017) The matching polytope has exponential extension complexity. J. ACM 64(6):Article 41.Crossref, Google Scholar
Rusmevichientong P, Tsitsiklis J (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.Link, Google Scholar
Rusmevichientong P, Shen Z, Shmoys D (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Oper. Res. 58(6):1666–1680.Link, Google Scholar
Ryzhov IO, Powell WB (2009) The knowledge gradient algorithm for online subset selection. Proc. 2009 IEEE Internat. Sympos. Adaptive Dynamic Programming Reinforcement Learn. (IEEE, Piscataway, NJ), 137–144.Google Scholar
Ryzhov IO, Powell WB (2011) Information collection on a graph. Oper. Res. 59(1):188–201.Link, Google Scholar
Ryzhov IO, Powell WB, Frazier PI (2012) The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1):180–195.Link, Google Scholar
Sauré D, Zeevi A (2013) Optimal dynamic assortment planning with demand learning. Manufacturing Service Oper. Management 15(3):387–404.Link, Google Scholar
Schrijver A (2003) Combinatorial Optimization—Polyhedra and Efficiency (Springer, Berlin).Google Scholar
Stanley R (1999) Enumerative Combinatorics, vol. 2, Cambridge Studies in Advanced Mathematics (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3–4):285–294.Crossref, Google Scholar
Toriello A, Vielma JP (2012) Fitting piecewise linear continuous functions. Eur. J. Oper. Res. 219(1):86–95.Crossref, Google Scholar
Ventura P, Eisenbrand F (2003) A compact linear program for testing optimality of perfect matchings. Oper. Res. Lett. 31(6):429–434.Crossref, Google Scholar
Vielma JP (2015) Mixed integer linear programming formulation techniques. SIAM Rev. 57(1):3–57.Crossref, Google Scholar
Whittle P (1982) Optimization over Time, vol. 1 (John Wiley & Sons Ltd., Hoboken, NJ).Google Scholar
Williamson DP, Shmoys DB (2011) The Design of Approximation Algorithms (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar

Volume 68, Issue 5

September-October 2020

Pages iii-vi, 1285-1624, C2-C3

Article Information

Supplemental Material

Metrics

Information

Received:August 25, 2014
Accepted:July 22, 2019
Published Online:June 19, 2020

Cite as

Sajad Modaresi, Denis Sauré, Juan Pablo Vielma (2020) Learning in Combinatorial Optimization: What and How to Explore. Operations Research 68(5):1585-1604.

https://doi.org/10.1287/opre.2019.1926

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Learning in Combinatorial Optimization: What and How to Explore

References

Volume 68, Issue 5

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News