Abou-Jaoude S (1976) Conditions nécessaires et suffisantes de convergence l1 en probabilité de l’histogramme pour une densité. Ann. Inst. Henri Poincare Probab. Statist. 12(3):213–231.Google Scholar
Akaike H (1954) An approximation to the density function. Ann. Inst. Statist. Math. 6(2):127–132.Google Scholar
Anderlini L (2015) Density estimation trees in high energy physics. Preprint, submitted February 3, https://arxiv.org/abs/1502.00932.Google Scholar
Anderlini L (2016) Density estimation trees as fast non-parametric modelling tools. J. Physics Conf. Ser. 762(1):012042.Google Scholar
Angelino E, Larus-Stone N, Alabi D, Seltzer M, Rudin C (2018) Learning certifiably optimal rule lists for categorical data. J. Machine Learn. Res. 18(234):1–78.Google Scholar
Cacoullos T (1966) Estimation of a multivariate density. Ann. Inst. Statist. Math. 18(1):179–189.Google Scholar
Caesar H, Uijlings J, Ferrari V (2018) COCO-Stuff: Thing and stuff classes in context. 2018 IEEE Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE, Piscataway, NJ).Google Scholar
Cattaneo MD, Jansson M, Ma X (2020) Simple local polynomial density estimators. J. Amer. Statist. Assoc. 115(531):1449–1455.Google Scholar
Chakrabarti S, Ester M, Fayyad U, Gehrke J, Han J, Morishita S, Piatetsky-Shapiro G, Wang W (2006) Data mining curriculum: A proposal (version 1.0). Intensive Working Group of ACM SIGKDD Curriculum Committee, vol. 140 (ACM, New York), 1–10.Google Scholar
Chen T, Morris J, Martin E (2006) Probability density estimation via an infinite gaussian mixture model: Application to statistical process monitoring. J. Roy. Statist. Soc. Ser. C Appl. Statist. 55(5):699–715.Google Scholar
Chipman HA, George EI, McCulloch RE (2010) Bart: Bayesian additive regression trees. Ann. Appl. Statist. 4(1):266–298.Google Scholar
Devroye L (1991) Exponential inequalities in nonparametric estimation. Roussas G, ed. Nonparametric Functional Estimation and Related Topics, NATO ASI Series, vol. 335 (Springer, Dordrecht, Netherlands), 31–44.Google Scholar
Devroye L, Györfi L (1983) Distribution-free exponential bound on the l1 error of partitioning estimates of a regression function. Konecny F, Mogyoródi J, Wertz W, eds. Probability and Statistical Decision Theory (D. Reidel, Dordrecht, Netherlands), 67–76.Google Scholar
Devroye L, Györfi L, Lugosi G (1996) A Probabilistic Theory of Pattern Recognition (Springer, Berlin).Google Scholar
Fisher A, Rudin C, Dominici F (2019) All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. J. Machine Learn. Res. 20(177):1–81.Google Scholar
Friedman JH, Stuetzle W, Schroeder A (1984) Projection pursuit density estimation. J. Amer. Statist. Assoc. 79(387):599–608.Google Scholar
Holmström L, Karttunen K, Klemelä J (2015) Estimation of level set trees using adaptive partitions. Comput. Statist. 32(3):1139–1163.Google Scholar
Jebara TS (2008) Bayesian out-trees. Proc. Twenty-Fourth Conf. Uncertainty Artificial Intelligence (UAI) (AUAI Press, Arlington, VA), 315–324.Google Scholar
Larson J, Mattu S, Kirchner L, Angwin J (2016) How we analyzed the COMPAS recidivism algorithm. ProPublica, https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm.Google Scholar
Letham B, Rudin C, McCormick TH, Madigan D (2015) Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. Ann. Appl. Statist. 9(3):1350–1371.Google Scholar
Li JQ, Barron AR (2000) Mixture density estimation. Solla SA, Leen TK, Müller K, eds. Adv. Neural Inform. Processing Systems, vol. 12 (MIT Press, Cambridge, MA), 279–285.Google Scholar
Li D, Yang K, Wong WH (2016) Density estimation via discrepancy based adaptive sequential partition. Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 29 (Curran Associates, Inc., Red Hook, NY), 1091–1099.Google Scholar
Liu H, Lafferty J, Wasserman L (2007) Sparse nonparametric density estimation in high dimensions using the rodeo. Meila M, Shen X, eds. Proc. Eleventh Internat. Conf. Artificial Intelligence Statist., vol. 2 (PMLR, New York), 283–290.Google Scholar
Liu H, Xu M, Gu H, Gupta A, Lafferty J, Wasserman L (2011) Forest density estimation. J. Machine Learn. Res. 12(25):907–951.Google Scholar
Liu Q, Xu J, Jiang R, Wong WH (2021) Density estimation using deep generative neural networks. Proc. Natl. Acad. Sci. USA 118(15):e2101344118.Google Scholar
Lu L, Jiang H, Wong WH (2013) Multivariate density estimation by Bayesian sequential partitioning. J. Amer. Statist. Assoc. 108(504):1402–1410.Google Scholar
Lugosi G, Nobel A (1996) Consistency of data-driven histogram methods for density estimation and classification. Ann. Statist. 24(2):687–706.Google Scholar
Luo R, Liu A, Wang Y (2019) Combining smoothing spline with conditional Gaussian graphical model for density and graph estimation. Preprint, submitted March 30, https://arxiv.org/abs/1904.00204.Google Scholar
Mahapatruni RSG, Gray A (2011) Cake: Convex adaptive kernel density estimation. Gordon G, Dunson D, Dudík M, eds. Proc. Fourteenth Internat. Conf. Artificial Intelligence Statist., vol. 15 (PMLR, New York), 498–506.Google Scholar
Müller P, Quintana FA (2004) Nonparametric Bayesian data analysis. Statist. Sci. 19(1):95–110.Google Scholar
Nadaraya ÉA (1970) Remarks on non-parametric estimates for density functions and regression curves. Theory Probab. Its Appl. 15(1):134–137.Google Scholar
Ooi H (2012) Density visualization and mode hunting using trees. J. Comput. Graphical Statist. 11(2):328–347.Google Scholar
Ormoneit D, Tresp V (1996) Improved gaussian mixture density estimates using Bayesian penalty terms and network averaging. Touretzky DS, Mozer MC, Hasselmo ME, eds. Adv. Neural Inform. Processing Systems, vol. 8 (MIT Press, Cambridge, MA), 542–548.Google Scholar
Ormoneit D, Tresp V (1998) Averaging, maximum penalized likelihood and Bayesian estimation for improving gaussian mixture probability density estimates. IEEE Trans. Neural Networks Learn. Systems 9(4):639–650.Google Scholar
Parzen E (1962) On estimation of a probability density function and mode. Ann. Math. Statist. 33(3):1065–1076.Google Scholar
Patki N, Wedge R, Veeramachaneni K (2016) The synthetic data vault. 2016 IEEE Internat. Conf. Data Sci. Adv. Analytics (DSAA) (IEEE, Piscataway, NJ), 399–410.Google Scholar
Ram P, Gray AG (2011) Density estimation trees. Proc. 17th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 627–635.Google Scholar
Rehn P, Ahmadi Z, Kramer S (2018) Forest of normalized trees: Fast and accurate density estimation of streaming data. 2018 IEEE 5th Internat. Conf. Data Sci. Adv. Analytics (DSAA) (IEEE, Piscataway, NJ), 199–208.Google Scholar
Rejtö L, Révész P (1973) Density estimation and pattern classification. Problems Control Inform. Theory 2(1):67–80.Google Scholar
Rosenblatt M (1956) Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 27(3):832–837.Google Scholar
Rudin C, Ertekin S (2018) Learning customized and optimized lists of rules with mathematical programming. Math. Programming Comput. 10:659–702.Google Scholar
Sasaki H, Hyvärinen A (2018) Neural-kernelized conditional density estimation. Preprint, submitted June 5, https://arxiv.org/abs/1806.01754.Google Scholar
Scott DW (1979) On optimal and data-based histograms. Biometrika 66(3):605–610.Google Scholar
Seidl T, Assent I, Kranen P, Krieger R, Herrmann J (2009) Indexing density models for incremental learning and anytime classification on data streams. Proc. 12th EDBT/ICDT (ACM, New York), 311–322.Google Scholar
Semenova L, Rudin C, Parr R (2022) On the existence of simpler machine learning models. Proc. ACM Conf. Fairness Accountability Transparency (ACM FAccT) (ACM, New York), 1827–1858.Google Scholar
Silverman BW (1986) Density Estimation for Statistics and Data Analysis, vol. 26 (CRC Press, Boca Raton, FL).Google Scholar
Varet S, Lacour C, Massart P, Rivoirard V (2023) Numerical performance of penalized comparison with overfitting for multivariate kernel density estimation. ESAIM Probab. Statist. 27:621–667.Google Scholar
Wand MP (1997) Data-based choice of histogram bin width. Amer. Statist. 51(1):59–64.Google Scholar
Wasserman L (2006) All of Nonparametric Statistics (Springer Science & Business Media, New York).Google Scholar
Willett RM, Nowak RD (2007) Minimax optimal level-set estimation. IEEE Trans. Image Processing 16(12):2965–2979.Google Scholar
Wong WH, Ma L (2010) Optional pólya tree and Bayesian inference. Ann. Statist. 38(3):1433–1459.Google Scholar
Wu K, Hou W, Yang H (2018) Density estimation via the random forest method. Comm. Statist. Theory Methods 47(4):877–889.Google Scholar
Wu Y, Tjelmeland H, West M (2007) Bayesian CART: Prior specification and posterior simulation. J. Comput. Graphical Statist. 16(1):44–66.Google Scholar
Yang K, Wong WH (2014a) Density estimation via adaptive partition and discrepancy control. Preprint, submitted April 5, https://arxiv.org/abs/1404.1425.Google Scholar
Yang K, Wong WH (2014b) Discovering and visualizing hierarchy in multivariate data. Preprint, submitted March 18, https://arxiv.org/abs/1403.4370.Google Scholar
Yang H, Rudin C, Seltzer M (2017) Scalable Bayesian rule lists. Precup D, Teh YW, eds. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (PMLR, New York), 3921–3930.Google Scholar
Zhao LC, Krishnaiah PR, Chen XR (1991) Almost sure ℓr-norm convergence for data-based histogram density estimates. Theory Probab. Its Appl. 35(2):396–403.Google Scholar
Zhuang X, Huang Y, Palaniappan K, Zhao Y (1996) Gaussian mixture density modeling, decomposition, and applications. IEEE Trans. Image Processing 5(9):1293–1302.Google Scholar

cover image INFORMS Journal on Data Science

Volume 3, Issue 1

April-June 2024

Pages 1-104, C2

Article Information

Supplemental Material

Metrics

Information

Received:January 01, 2021
Accepted:October 25, 2023
Published Online:January 11, 2024

Cite as

Siong Thye Goh, Lesia Semenova, Cynthia Rudin (2024) Sparse Density Trees and Lists: An Interpretable Alternative to High-Dimensional Histograms. INFORMS Journal on Data Science 3(1):28-48.

https://doi.org/10.1287/ijds.2021.0001

Keywords

Acknowledgments

The authors thank the review and editorial team of INFORMS Journal on Data Science for their valuable feedback on the manuscript. S. T. Goh and L. Semenova contributed equally.

PDF download

Available Issues