Conjecturing-Based Discovery of Patterns in Data

J. Paul Brooks (Corresponding Author)
https://orcid.org/0000-0003-0423-8422
Department of Information Systems, Virginia Commonwealth University, Richmond, Virginia 23284

David J. Edwards
Department of Statistical Sciences and Operations Research, Virginia Commonwealth University, Richmond, Virginia 23284

Craig E. Larson
Department of Mathematics and Applied Mathematics, Virginia Commonwealth University, Richmond, Virginia 23284

Nico Van Cleemput
Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium

Published Online: 2 Feb 2024
https://doi.org/10.1287/ijds.2021.0043

INFORMS Journal on Data Science

Volume 3, Issue 2

October-December 2024

Pages 105-218, C2

Information

Received: September 18, 2021
Accepted: December 15, 2023
Published Online: February 2, 2024

Cite as

J. Paul Brooks, David J. Edwards, Craig E. Larson, Nico Van Cleemput (2024) Conjecturing-Based Discovery of Patterns in Data. INFORMS Journal on Data Science 3(2):179-202.

https://doi.org/10.1287/ijds.2021.0043

Acknowledgments

High-performance computing resources provided by the High Performance Research Computing (HPRC) Core Facility at Virginia Commonwealth University (https://chipc.vcu.edu) were used for conducting the research reported in this work.
