Integer Programming for Learning Directed Acyclic Graphs from Continuous Data

Published Online:https://doi.org/10.1287/ijoo.2019.0040

References

  • Aragam B, Zhou Q (2015) Concave penalized estimation of sparse Gaussian Bayesian networks. J. Machine Learn. Res. 16:2273–2328.Google Scholar
  • Bartlett M, Cussens J (2013) Advances in Bayesian network learning using integer programming. Preprint, submitted September 6, 2013, https://arxiv.org/abs/1309.6825.Google Scholar
  • Bartlett M, Cussens J (2017) Integer linear programming for the Bayesian network structure learning problem. Artificial Intelligence 244:258–271.Google Scholar
  • Bektaş T, Gouveia L (2014) Requiem for the Miller–Tucker–Zemlin subtour elimination constraints? Eur. J. Oper. Res. 236(3):820–832.Google Scholar
  • Bertsimas D, King A, Mazumder R (2016) Best subset selection via a modern optimization lens. Ann. Statist. 44(2):813–852.Google Scholar
  • Chen W, Drton M, Wang YS (2019) On causal discovery with an equal-variance assumption. Biometrika 106(4):973–980.Google Scholar
  • Chickering DM (1996) Learning Bayesian Networks Is NP-Complete. Learning from Data (Springer, New York).Google Scholar
  • Chickering DM (2002) Optimal structure identification with greedy search. J. Maching Learning Res. 3(Nov):507–554.Google Scholar
  • Cook WJ, Cunningham WH, Pulleyblank WR, Schrijver A (1997) Combinatorial Optimization (Wiley, New York).Google Scholar
  • Correia AH, Cussens J, de Campos CP (2019) On pruning for score-based Bayesian network structure learning. Preprint, submitted May 23, 2019, https://arxiv.org/abs/1905.09943.Google Scholar
  • Cussens J (2012) Bayesian network learning with cutting planes. Preprint, submitted February 14, 2012, https://arxiv.org/abs/1202.3713.Google Scholar
  • Cussens J, Bartlett M, Jones EM, Sheehan NA (2013) Maximum likelihood pedigree reconstruction using integer programming. Genetic Epidemiology 37:69–83.Google Scholar
  • Cussens J, Haws D, Studenỳ M (2017a) Polyhedral aspects of score equivalence in Bayesian network structure learning. Math. Programming 164(1-2):285–324.Google Scholar
  • Cussens J, Järvisalo M, Korhonen JH, Bartlett M (2017b) Bayesian network structure learning with integer programming: Polytopes, facets and complexity. J. Artificial Intelligence Res. 58:185–229.Google Scholar
  • Dantzig G, Fulkerson R, Johnson S (1954) Solution of a large-scale traveling-salesman problem. J. Oper. Res. Soc. Amer. 2(4):393–410.LinkGoogle Scholar
  • Dasgupta S (1999) Learning polytrees. Laskey KB, Prade H, eds. Proc. 15th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers, San Francisco), 134–141.Google Scholar
  • Desrochers M, Laporte G (1991) Improvements and extensions to the Miller-Tucker-Zemlin subtour elimination constraints. Oper. Res. Lett. 10(1):27–36.Google Scholar
  • Drton M, Maathuis MH (2017) Structure learning in graphical modeling. Annu. Rev. Statist. Appl. 4:365–393.Google Scholar
  • Eaton D, Murphy K (2007) Exact Bayesian structure learning from uncertain interventions. Artificial Intelligence Statist. 2:107–114.Google Scholar
  • Fu F, Zhou Q (2013) Learning sparse causal Gaussian networks with experimental intervention: Regularization and coordinate descent. J. Amer. Statist. Assoc. 108(501):288–300.Google Scholar
  • Grötschel M, Jünger M, Reinelt G (1985) On the acyclic subgraph polytope. Math. Programming 33(1):28–42.Google Scholar
  • Han SW, Chen G, Cheon MS, Zhong H (2016) Estimation of directed acyclic graphs through two-stage adaptive lasso for gene network inference. J. Amer. Statist. Assoc. 111(515):1004–1019.Google Scholar
  • Healy P, Nikolov NS (2002) A branch-and-cut approach to the directed acyclic graph layering problem. Goodrich M, Kobourov SG, eds. Internat. Sympos. Graph Drawing (Springer, New York), 98–109.Google Scholar
  • Hemmecke R, Lindner S, Studenỳ M (2012) Characteristic imsets for learning Bayesian network structure. Internat. J. Approximate Reasoning 53(9):1336–1349.Google Scholar
  • Jaakkola T, Sontag D, Globerson A, Meila M (2010) Learning Bayesian network structure using LP relaxations. Teh YW, Titterington M, eds. Proc. 13th Internat. Conf. Artificial Intelligence Statist. (PMLR, Sardinia, Italy), 358–365.Google Scholar
  • Kalisch M, Bühlmann P (2007) Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Machine Learn. Res. 8(Mar):613–636.Google Scholar
  • Koivisto M (2012) Advances in exact Bayesian structure discovery in Bayesian networks. Preprint, submitted June 27, 0212, https://arxiv.org/abs/1206.6828.Google Scholar
  • Koivisto M, Sood K (2004) Exact Bayesian structure discovery in Bayesian networks. J. Machine Learn. Res. 5(May):549–573.Google Scholar
  • Koller D, Friedman N (2009) Probabilistic Graphical Models: Principles and Techniques (MIT Press, Cambridge, MA).Google Scholar
  • Kuipers J, Suter P, Moffa G (2018) Efficient structure learning and sampling of Bayesian networks. Preprint, submitted March 21, 2018, https://arxiv.org/abs/1803.07859.Google Scholar
  • Lauritzen SL (1996) Graphical Models (Oxford University Press, Oxford, UK).Google Scholar
  • Loh PL, Bühlmann P (2014) High-dimensional learning of linear causal networks via inverse covariance estimation. J. Maching Learn. Res. 15(1):3065–3105.Google Scholar
  • Malone B, Kangas K, Järvisalo M, Koivisto M, Myllymäki P (2014) Predicting the Hardness of Learning Bayesian Networks (AAAI, Palo Alto, CA).Google Scholar
  • Markowetz F, Spang R (2007) Inferring cellular networks–a review. BMC Bioinform. 8(6):S5.Google Scholar
  • Miller CE, Tucker AW, Zemlin RA (1960) Integer programming formulation of traveling salesman problems. J. ACM 7(4):326–329.Google Scholar
  • Nemhauser GL, Wolsey LA (1988) Integer Programming and Combinatorial Optimization (Wiley-Interscience, New York).Google Scholar
  • Oates CJ, Smith JQ, Mukherjee S (2016a) Estimating causal structure using conditional dag models. J. Machine Learn. Res. 17(54):1–23.Google Scholar
  • Oates CJ, Smith JQ, Mukherjee S, Cussens J (2016b) Exact estimation of multiple directed acyclic graphs. Statist. Comput. 26(4):797–811.Google Scholar
  • Öncan T, Altınel İK, Laporte G (2009) A comparative analysis of several asymmetric traveling salesman problem formulations. Comput. Oper. Res. 36(3):637–654.Google Scholar
  • Padberg M, Sung TY (1991) An analytical comparison of different formulations of the travelling salesman problem. Math. Programming 52(1-3):315–357.Google Scholar
  • Park YW, Klabjan D (2017) Bayesian network learning via topological order. J. Machine Learn. Res. 18(99):1–32.Google Scholar
  • Parviainen P, Koivisto M (2009) Exact structure discovery in Bayesian networks with less space. Blimes J, Ng AY, eds. Proc. 25th Conf. Uncertainty Artificial Intelligence (AUAI Press, Montreal), 436–443.Google Scholar
  • Pataki G (2003) Teaching integer programming formulations using the traveling salesman problem. SIAM Rev. 45(1):116–123.Google Scholar
  • Pearl J (2009) Causal inference in statistics: An overview. Stat. Survey 3:96–146.Google Scholar
  • Perrier E, Imoto S, Miyano S (2008) Finding optimal Bayesian network given a super-structure. J. Machine Learn. Res. 9(Oct):2251–2286.Google Scholar
  • Peters J, Bühlmann P (2013) Identifiability of Gaussian structural equation models with equal error variances. Biometrika 101(1):219–228.Google Scholar
  • Raskutti G, Uhler C (2013) Learning directed acyclic graphs based on sparsest permutations. Preprint, submitted July 1, 2013, https://arxiv.org/abs/1307.0366.Google Scholar
  • Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP (2005) Causal protein-signaling networks derived from multiparameter single-cell data. Science 308(5721):523–529.Google Scholar
  • Sawik T (2016) A note on the Miller-Tucker-Zemlin model for the asymmetric traveling salesman problem. Bull. Polish Acad. Sci. Tech. Sci. 64(3):517–520.Google Scholar
  • Shojaie A, Michailidis G (2010) Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs. Biometrika 97(3):519–538.Google Scholar
  • Silander T, Myllymaki P (2012) A simple approach for finding the globally optimal Bayesian network structure. Preprint, submitted June 27, 2012, https://arxiv.org/abs/1206.6875.Google Scholar
  • Singh M, Valtorta M (1995) Construction of Bayesian network structures from data: A brief survey and an efficient algorithm. Internat. J. Approximation Reasoning 12(2):111–131.Google Scholar
  • Sondhi A, Shojaie A (2019) The reduced pc-algorithm: Improved causal structure learning in large random networks. J. Machine Learn. Res. 20(164):1–31.Google Scholar
  • Spirtes P, Glymour CN, Scheines R (2000) Causation, Prediction, and Search (MIT Press, Cambridge, MA).Google Scholar
  • Studenỳ M, Haws DC (2013) On polyhedral approximations of polytopes for learning Bayesian networks. J. Algebraic Statist. 4(1):58–91.Google Scholar
  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. B 58(1):267–288.Google Scholar
  • Tsamardinos I, Brown LE, Aliferis CF (2006) The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learn. 65(1):31–78.Google Scholar
  • van de Geer S, Bühlmann P (2013) ℓ0-penalized maximum likelihood for sparse directed acyclic graphs. Ann. Statist. 41(2):536–567.Google Scholar
  • Xiang J, Kim S (2013) A* lasso for learning a sparse Bayesian network structure for continuous variables. Adv. Neural Inform. Processing Systems 2:2418–2426.Google Scholar
  • Yuan C, Malone B (2013) Learning optimal Bayesian networks: A shortest path perspective. J. Artificial Intelligence Res. 48:23–65.Google Scholar
  • Yuan C, Malone B, Wu X (2011) Learning optimal Bayesian networks using A* search. Walsh T, ed. IJCAI Proc. (AAAI Press, Barcelona, Spain), vol. 22(3), 2186–2191.Google Scholar
  • Zhang B, Gaiteri C, Bodea LG, Wang Z, McElwee J, Podtelezhnikov AA, Zhang C, et al. (2013) Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease. Cell 153(3):707–720.Google Scholar
  • Zheng X, Aragam B, Ravikumar P, Xing EP (2018) DAGs with NO TEARS: smooth optimization for structure learning. Preprint, submitted March 4, 2018, https://arxiv.org/abs/1803.01422.Google Scholar
  • Zheng X, Dan C, Aragam B, Ravikumar P, Xing EP (2019) Learning sparse nonparametric DAGs. Preprint, submitted September 15, 2019, https://arxiv.org/abs/1909.13189.Google Scholar
  • Zou H (2006) The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101(476):1418–1429.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.