Free Access

Integer Programming for Learning Directed Acyclic Graphs from Continuous Data

Hasan Manzour
Hasan Manzour
[email protected]
https://orcid.org/0000-0003-0012-9771
Department of Industrial and Systems Engineering, University of Washington, Seattle, Washington 98195;
Search for more papers by this author
,
Simge Küçükyavuz
Corresponding Author
Simge Küçükyavuz
[email protected]
https://orcid.org/0000-0001-6548-9378
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208;
Search for more papers by this author
,
Hao-Hsiang Wu
Hao-Hsiang Wu
[email protected]
Department of Management Science, National Chiao Tung University, Hsinchu, Taiwan;
Search for more papers by this author
,
Ali Shojaie
Ali Shojaie
[email protected]
Department of Biostatistics, University of Washington, Seattle, Washington 98195
Search for more papers by this author

Hasan Manzour

[email protected]

https://orcid.org/0000-0003-0012-9771

Department of Industrial and Systems Engineering, University of Washington, Seattle, Washington 98195;

Search for more papers by this author

Simge Küçükyavuz

Corresponding Author

Simge Küçükyavuz

[email protected]

https://orcid.org/0000-0001-6548-9378

Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208;

Search for more papers by this author

Hao-Hsiang Wu

[email protected]

Department of Management Science, National Chiao Tung University, Hsinchu, Taiwan;

Search for more papers by this author

Ali Shojaie

[email protected]

Department of Biostatistics, University of Washington, Seattle, Washington 98195

Search for more papers by this author

Published Online:3 Nov 2020https://doi.org/10.1287/ijoo.2019.0040

References

Aragam B, Zhou Q (2015) Concave penalized estimation of sparse Gaussian Bayesian networks. J. Machine Learn. Res. 16:2273–2328.Google Scholar
Bartlett M, Cussens J (2013) Advances in Bayesian network learning using integer programming. Preprint, submitted September 6, 2013, https://arxiv.org/abs/1309.6825.Google Scholar
Bartlett M, Cussens J (2017) Integer linear programming for the Bayesian network structure learning problem. Artificial Intelligence 244:258–271.Google Scholar
Bektaş T, Gouveia L (2014) Requiem for the Miller–Tucker–Zemlin subtour elimination constraints? Eur. J. Oper. Res. 236(3):820–832.Google Scholar
Bertsimas D, King A, Mazumder R (2016) Best subset selection via a modern optimization lens. Ann. Statist. 44(2):813–852.Google Scholar
Chen W, Drton M, Wang YS (2019) On causal discovery with an equal-variance assumption. Biometrika 106(4):973–980.Google Scholar
Chickering DM (1996) Learning Bayesian Networks Is NP-Complete. Learning from Data (Springer, New York).Google Scholar
Chickering DM (2002) Optimal structure identification with greedy search. J. Maching Learning Res. 3(Nov):507–554.Google Scholar
Cook WJ, Cunningham WH, Pulleyblank WR, Schrijver A (1997) Combinatorial Optimization (Wiley, New York).Google Scholar
Correia AH, Cussens J, de Campos CP (2019) On pruning for score-based Bayesian network structure learning. Preprint, submitted May 23, 2019, https://arxiv.org/abs/1905.09943.Google Scholar
Cussens J (2012) Bayesian network learning with cutting planes. Preprint, submitted February 14, 2012, https://arxiv.org/abs/1202.3713.Google Scholar
Cussens J, Bartlett M, Jones EM, Sheehan NA (2013) Maximum likelihood pedigree reconstruction using integer programming. Genetic Epidemiology 37:69–83.Google Scholar
Cussens J, Haws D, Studenỳ M (2017a) Polyhedral aspects of score equivalence in Bayesian network structure learning. Math. Programming 164(1-2):285–324.Google Scholar
Cussens J, Järvisalo M, Korhonen JH, Bartlett M (2017b) Bayesian network structure learning with integer programming: Polytopes, facets and complexity. J. Artificial Intelligence Res. 58:185–229.Google Scholar
Dantzig G, Fulkerson R, Johnson S (1954) Solution of a large-scale traveling-salesman problem. J. Oper. Res. Soc. Amer. 2(4):393–410.Link, Google Scholar
Dasgupta S (1999) Learning polytrees. Laskey KB, Prade H, eds. Proc. 15th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers, San Francisco), 134–141.Google Scholar
Desrochers M, Laporte G (1991) Improvements and extensions to the Miller-Tucker-Zemlin subtour elimination constraints. Oper. Res. Lett. 10(1):27–36.Google Scholar
Drton M, Maathuis MH (2017) Structure learning in graphical modeling. Annu. Rev. Statist. Appl. 4:365–393.Google Scholar
Eaton D, Murphy K (2007) Exact Bayesian structure learning from uncertain interventions. Artificial Intelligence Statist. 2:107–114.Google Scholar
Fu F, Zhou Q (2013) Learning sparse causal Gaussian networks with experimental intervention: Regularization and coordinate descent. J. Amer. Statist. Assoc. 108(501):288–300.Google Scholar
Grötschel M, Jünger M, Reinelt G (1985) On the acyclic subgraph polytope. Math. Programming 33(1):28–42.Google Scholar
Han SW, Chen G, Cheon MS, Zhong H (2016) Estimation of directed acyclic graphs through two-stage adaptive lasso for gene network inference. J. Amer. Statist. Assoc. 111(515):1004–1019.Google Scholar
Healy P, Nikolov NS (2002) A branch-and-cut approach to the directed acyclic graph layering problem. Goodrich M, Kobourov SG, eds. Internat. Sympos. Graph Drawing (Springer, New York), 98–109.Google Scholar
Hemmecke R, Lindner S, Studenỳ M (2012) Characteristic imsets for learning Bayesian network structure. Internat. J. Approximate Reasoning 53(9):1336–1349.Google Scholar
Jaakkola T, Sontag D, Globerson A, Meila M (2010) Learning Bayesian network structure using LP relaxations. Teh YW, Titterington M, eds. Proc. 13th Internat. Conf. Artificial Intelligence Statist. (PMLR, Sardinia, Italy), 358–365.Google Scholar
Kalisch M, Bühlmann P (2007) Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Machine Learn. Res. 8(Mar):613–636.Google Scholar
Koivisto M (2012) Advances in exact Bayesian structure discovery in Bayesian networks. Preprint, submitted June 27, 0212, https://arxiv.org/abs/1206.6828.Google Scholar
Koivisto M, Sood K (2004) Exact Bayesian structure discovery in Bayesian networks. J. Machine Learn. Res. 5(May):549–573.Google Scholar
Koller D, Friedman N (2009) Probabilistic Graphical Models: Principles and Techniques (MIT Press, Cambridge, MA).Google Scholar
Kuipers J, Suter P, Moffa G (2018) Efficient structure learning and sampling of Bayesian networks. Preprint, submitted March 21, 2018, https://arxiv.org/abs/1803.07859.Google Scholar
Lauritzen SL (1996) Graphical Models (Oxford University Press, Oxford, UK).Google Scholar
Loh PL, Bühlmann P (2014) High-dimensional learning of linear causal networks via inverse covariance estimation. J. Maching Learn. Res. 15(1):3065–3105.Google Scholar
Malone B, Kangas K, Järvisalo M, Koivisto M, Myllymäki P (2014) Predicting the Hardness of Learning Bayesian Networks (AAAI, Palo Alto, CA).Google Scholar
Markowetz F, Spang R (2007) Inferring cellular networks–a review. BMC Bioinform. 8(6):S5.Google Scholar
Miller CE, Tucker AW, Zemlin RA (1960) Integer programming formulation of traveling salesman problems. J. ACM 7(4):326–329.Google Scholar
Nemhauser GL, Wolsey LA (1988) Integer Programming and Combinatorial Optimization (Wiley-Interscience, New York).Google Scholar
Oates CJ, Smith JQ, Mukherjee S (2016a) Estimating causal structure using conditional dag models. J. Machine Learn. Res. 17(54):1–23.Google Scholar
Oates CJ, Smith JQ, Mukherjee S, Cussens J (2016b) Exact estimation of multiple directed acyclic graphs. Statist. Comput. 26(4):797–811.Google Scholar
Öncan T, Altınel İK, Laporte G (2009) A comparative analysis of several asymmetric traveling salesman problem formulations. Comput. Oper. Res. 36(3):637–654.Google Scholar
Padberg M, Sung TY (1991) An analytical comparison of different formulations of the travelling salesman problem. Math. Programming 52(1-3):315–357.Google Scholar
Park YW, Klabjan D (2017) Bayesian network learning via topological order. J. Machine Learn. Res. 18(99):1–32.Google Scholar
Parviainen P, Koivisto M (2009) Exact structure discovery in Bayesian networks with less space. Blimes J, Ng AY, eds. Proc. 25th Conf. Uncertainty Artificial Intelligence (AUAI Press, Montreal), 436–443.Google Scholar
Pataki G (2003) Teaching integer programming formulations using the traveling salesman problem. SIAM Rev. 45(1):116–123.Google Scholar
Pearl J (2009) Causal inference in statistics: An overview. Stat. Survey 3:96–146.Google Scholar
Perrier E, Imoto S, Miyano S (2008) Finding optimal Bayesian network given a super-structure. J. Machine Learn. Res. 9(Oct):2251–2286.Google Scholar
Peters J, Bühlmann P (2013) Identifiability of Gaussian structural equation models with equal error variances. Biometrika 101(1):219–228.Google Scholar
Raskutti G, Uhler C (2013) Learning directed acyclic graphs based on sparsest permutations. Preprint, submitted July 1, 2013, https://arxiv.org/abs/1307.0366.Google Scholar
Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP (2005) Causal protein-signaling networks derived from multiparameter single-cell data. Science 308(5721):523–529.Google Scholar
Sawik T (2016) A note on the Miller-Tucker-Zemlin model for the asymmetric traveling salesman problem. Bull. Polish Acad. Sci. Tech. Sci. 64(3):517–520.Google Scholar
Shojaie A, Michailidis G (2010) Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs. Biometrika 97(3):519–538.Google Scholar
Silander T, Myllymaki P (2012) A simple approach for finding the globally optimal Bayesian network structure. Preprint, submitted June 27, 2012, https://arxiv.org/abs/1206.6875.Google Scholar
Singh M, Valtorta M (1995) Construction of Bayesian network structures from data: A brief survey and an efficient algorithm. Internat. J. Approximation Reasoning 12(2):111–131.Google Scholar
Sondhi A, Shojaie A (2019) The reduced pc-algorithm: Improved causal structure learning in large random networks. J. Machine Learn. Res. 20(164):1–31.Google Scholar
Spirtes P, Glymour CN, Scheines R (2000) Causation, Prediction, and Search (MIT Press, Cambridge, MA).Google Scholar
Studenỳ M, Haws DC (2013) On polyhedral approximations of polytopes for learning Bayesian networks. J. Algebraic Statist. 4(1):58–91.Google Scholar
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. B 58(1):267–288.Google Scholar
Tsamardinos I, Brown LE, Aliferis CF (2006) The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learn. 65(1):31–78.Google Scholar
van de Geer S, Bühlmann P (2013) ℓ0-penalized maximum likelihood for sparse directed acyclic graphs. Ann. Statist. 41(2):536–567.Google Scholar
Xiang J, Kim S (2013) A* lasso for learning a sparse Bayesian network structure for continuous variables. Adv. Neural Inform. Processing Systems 2:2418–2426.Google Scholar
Yuan C, Malone B (2013) Learning optimal Bayesian networks: A shortest path perspective. J. Artificial Intelligence Res. 48:23–65.Google Scholar
Yuan C, Malone B, Wu X (2011) Learning optimal Bayesian networks using A* search. Walsh T, ed. IJCAI Proc. (AAAI Press, Barcelona, Spain), vol. 22(3), 2186–2191.Google Scholar
Zhang B, Gaiteri C, Bodea LG, Wang Z, McElwee J, Podtelezhnikov AA, Zhang C, et al. (2013) Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease. Cell 153(3):707–720.Google Scholar
Zheng X, Aragam B, Ravikumar P, Xing EP (2018) DAGs with NO TEARS: smooth optimization for structure learning. Preprint, submitted March 4, 2018, https://arxiv.org/abs/1803.01422.Google Scholar
Zheng X, Dan C, Aragam B, Ravikumar P, Xing EP (2019) Learning sparse nonparametric DAGs. Preprint, submitted September 15, 2019, https://arxiv.org/abs/1909.13189.Google Scholar
Zou H (2006) The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101(476):1418–1429.Google Scholar

cover image INFORMS Journal on Optimization

Volume 3, Issue 1

January 2021

Pages 1-117

Article Information

Metrics

Information

Received:September 05, 2019
Accepted:June 23, 2020
Published Online:November 03, 2020

Cite as

Hasan Manzour, Simge Küçükyavuz, Hao-Hsiang Wu, Ali Shojaie (2020) Integer Programming for Learning Directed Acyclic Graphs from Continuous Data. INFORMS Journal on Optimization 3(1):46-73.

https://doi.org/10.1287/ijoo.2019.0040

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Integer Programming for Learning Directed Acyclic Graphs from Continuous Data

References

Volume 3, Issue 1

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News