Sparse Density Trees and Lists: An Interpretable Alternative to High-Dimensional Histograms

Published Online:https://doi.org/10.1287/ijds.2021.0001

References

  • Abou-Jaoude S (1976) Conditions nécessaires et suffisantes de convergence l1 en probabilité de l’histogramme pour une densité. Ann. Inst. Henri Poincare Probab. Statist. 12(3):213–231.Google Scholar
  • Akaike H (1954) An approximation to the density function. Ann. Inst. Statist. Math. 6(2):127–132.Google Scholar
  • Anderlini L (2015) Density estimation trees in high energy physics. Preprint, submitted February 3, https://arxiv.org/abs/1502.00932.Google Scholar
  • Anderlini L (2016) Density estimation trees as fast non-parametric modelling tools. J. Physics Conf. Ser. 762(1):012042.Google Scholar
  • Angelino E, Larus-Stone N, Alabi D, Seltzer M, Rudin C (2018) Learning certifiably optimal rule lists for categorical data. J. Machine Learn. Res. 18(234):1–78.Google Scholar
  • Cacoullos T (1966) Estimation of a multivariate density. Ann. Inst. Statist. Math. 18(1):179–189.Google Scholar
  • Caesar H, Uijlings J, Ferrari V (2018) COCO-Stuff: Thing and stuff classes in context. 2018 IEEE Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE, Piscataway, NJ).Google Scholar
  • Cattaneo MD, Jansson M, Ma X (2020) Simple local polynomial density estimators. J. Amer. Statist. Assoc. 115(531):1449–1455.Google Scholar
  • Chakrabarti S, Ester M, Fayyad U, Gehrke J, Han J, Morishita S, Piatetsky-Shapiro G, Wang W (2006) Data mining curriculum: A proposal (version 1.0). Intensive Working Group of ACM SIGKDD Curriculum Committee, vol. 140 (ACM, New York), 1–10.Google Scholar
  • Chen T, Morris J, Martin E (2006) Probability density estimation via an infinite gaussian mixture model: Application to statistical process monitoring. J. Roy. Statist. Soc. Ser. C Appl. Statist. 55(5):699–715.Google Scholar
  • Chipman HA, George EI, McCulloch RE (2010) Bart: Bayesian additive regression trees. Ann. Appl. Statist. 4(1):266–298.Google Scholar
  • Devroye L (1991) Exponential inequalities in nonparametric estimation. Roussas G, ed. Nonparametric Functional Estimation and Related Topics, NATO ASI Series, vol. 335 (Springer, Dordrecht, Netherlands), 31–44.Google Scholar
  • Devroye L, Györfi L (1983) Distribution-free exponential bound on the l1 error of partitioning estimates of a regression function. Konecny F, Mogyoródi J, Wertz W, eds. Probability and Statistical Decision Theory (D. Reidel, Dordrecht, Netherlands), 67–76.Google Scholar
  • Devroye L, Györfi L, Lugosi G (1996) A Probabilistic Theory of Pattern Recognition (Springer, Berlin).Google Scholar
  • Fisher A, Rudin C, Dominici F (2019) All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. J. Machine Learn. Res. 20(177):1–81.Google Scholar
  • Friedman JH, Stuetzle W, Schroeder A (1984) Projection pursuit density estimation. J. Amer. Statist. Assoc. 79(387):599–608.Google Scholar
  • Holmström L, Karttunen K, Klemelä J (2015) Estimation of level set trees using adaptive partitions. Comput. Statist. 32(3):1139–1163.Google Scholar
  • Jebara TS (2008) Bayesian out-trees. Proc. Twenty-Fourth Conf. Uncertainty Artificial Intelligence (UAI) (AUAI Press, Arlington, VA), 315–324.Google Scholar
  • Larson J, Mattu S, Kirchner L, Angwin J (2016) How we analyzed the COMPAS recidivism algorithm. ProPublica, https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm.Google Scholar
  • Letham B, Rudin C, McCormick TH, Madigan D (2015) Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. Ann. Appl. Statist. 9(3):1350–1371.Google Scholar
  • Li JQ, Barron AR (2000) Mixture density estimation. Solla SA, Leen TK, Müller K, eds. Adv. Neural Inform. Processing Systems, vol. 12 (MIT Press, Cambridge, MA), 279–285.Google Scholar
  • Li D, Yang K, Wong WH (2016) Density estimation via discrepancy based adaptive sequential partition. Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 29 (Curran Associates, Inc., Red Hook, NY), 1091–1099.Google Scholar
  • Liu H, Lafferty J, Wasserman L (2007) Sparse nonparametric density estimation in high dimensions using the rodeo. Meila M, Shen X, eds. Proc. Eleventh Internat. Conf. Artificial Intelligence Statist., vol. 2 (PMLR, New York), 283–290.Google Scholar
  • Liu H, Xu M, Gu H, Gupta A, Lafferty J, Wasserman L (2011) Forest density estimation. J. Machine Learn. Res. 12(25):907–951.Google Scholar
  • Liu Q, Xu J, Jiang R, Wong WH (2021) Density estimation using deep generative neural networks. Proc. Natl. Acad. Sci. USA 118(15):e2101344118.Google Scholar
  • Lu L, Jiang H, Wong WH (2013) Multivariate density estimation by Bayesian sequential partitioning. J. Amer. Statist. Assoc. 108(504):1402–1410.Google Scholar
  • Lugosi G, Nobel A (1996) Consistency of data-driven histogram methods for density estimation and classification. Ann. Statist. 24(2):687–706.Google Scholar
  • Luo R, Liu A, Wang Y (2019) Combining smoothing spline with conditional Gaussian graphical model for density and graph estimation. Preprint, submitted March 30, https://arxiv.org/abs/1904.00204.Google Scholar
  • Mahapatruni RSG, Gray A (2011) Cake: Convex adaptive kernel density estimation. Gordon G, Dunson D, Dudík M, eds. Proc. Fourteenth Internat. Conf. Artificial Intelligence Statist., vol. 15 (PMLR, New York), 498–506.Google Scholar
  • Müller P, Quintana FA (2004) Nonparametric Bayesian data analysis. Statist. Sci. 19(1):95–110.Google Scholar
  • Nadaraya ÉA (1970) Remarks on non-parametric estimates for density functions and regression curves. Theory Probab. Its Appl. 15(1):134–137.Google Scholar
  • Ooi H (2012) Density visualization and mode hunting using trees. J. Comput. Graphical Statist. 11(2):328–347.Google Scholar
  • Ormoneit D, Tresp V (1996) Improved gaussian mixture density estimates using Bayesian penalty terms and network averaging. Touretzky DS, Mozer MC, Hasselmo ME, eds. Adv. Neural Inform. Processing Systems, vol. 8 (MIT Press, Cambridge, MA), 542–548.Google Scholar
  • Ormoneit D, Tresp V (1998) Averaging, maximum penalized likelihood and Bayesian estimation for improving gaussian mixture probability density estimates. IEEE Trans. Neural Networks Learn. Systems 9(4):639–650.Google Scholar
  • Parzen E (1962) On estimation of a probability density function and mode. Ann. Math. Statist. 33(3):1065–1076.Google Scholar
  • Patki N, Wedge R, Veeramachaneni K (2016) The synthetic data vault. 2016 IEEE Internat. Conf. Data Sci. Adv. Analytics (DSAA) (IEEE, Piscataway, NJ), 399–410.Google Scholar
  • Ram P, Gray AG (2011) Density estimation trees. Proc. 17th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 627–635.Google Scholar
  • Rehn P, Ahmadi Z, Kramer S (2018) Forest of normalized trees: Fast and accurate density estimation of streaming data. 2018 IEEE 5th Internat. Conf. Data Sci. Adv. Analytics (DSAA) (IEEE, Piscataway, NJ), 199–208.Google Scholar
  • Rejtö L, Révész P (1973) Density estimation and pattern classification. Problems Control Inform. Theory 2(1):67–80.Google Scholar
  • Rosenblatt M (1956) Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 27(3):832–837.Google Scholar
  • Rudin C, Ertekin S (2018) Learning customized and optimized lists of rules with mathematical programming. Math. Programming Comput. 10:659–702.Google Scholar
  • Sasaki H, Hyvärinen A (2018) Neural-kernelized conditional density estimation. Preprint, submitted June 5, https://arxiv.org/abs/1806.01754.Google Scholar
  • Scott DW (1979) On optimal and data-based histograms. Biometrika 66(3):605–610.Google Scholar
  • Seidl T, Assent I, Kranen P, Krieger R, Herrmann J (2009) Indexing density models for incremental learning and anytime classification on data streams. Proc. 12th EDBT/ICDT (ACM, New York), 311–322.Google Scholar
  • Semenova L, Rudin C, Parr R (2022) On the existence of simpler machine learning models. Proc. ACM Conf. Fairness Accountability Transparency (ACM FAccT) (ACM, New York), 1827–1858.Google Scholar
  • Silverman BW (1986) Density Estimation for Statistics and Data Analysis, vol. 26 (CRC Press, Boca Raton, FL).Google Scholar
  • Varet S, Lacour C, Massart P, Rivoirard V (2023) Numerical performance of penalized comparison with overfitting for multivariate kernel density estimation. ESAIM Probab. Statist. 27:621–667.Google Scholar
  • Wand MP (1997) Data-based choice of histogram bin width. Amer. Statist. 51(1):59–64.Google Scholar
  • Wasserman L (2006) All of Nonparametric Statistics (Springer Science & Business Media, New York).Google Scholar
  • Willett RM, Nowak RD (2007) Minimax optimal level-set estimation. IEEE Trans. Image Processing 16(12):2965–2979.Google Scholar
  • Wong WH, Ma L (2010) Optional pólya tree and Bayesian inference. Ann. Statist. 38(3):1433–1459.Google Scholar
  • Wu K, Hou W, Yang H (2018) Density estimation via the random forest method. Comm. Statist. Theory Methods 47(4):877–889.Google Scholar
  • Wu Y, Tjelmeland H, West M (2007) Bayesian CART: Prior specification and posterior simulation. J. Comput. Graphical Statist. 16(1):44–66.Google Scholar
  • Yang K, Wong WH (2014a) Density estimation via adaptive partition and discrepancy control. Preprint, submitted April 5, https://arxiv.org/abs/1404.1425.Google Scholar
  • Yang K, Wong WH (2014b) Discovering and visualizing hierarchy in multivariate data. Preprint, submitted March 18, https://arxiv.org/abs/1403.4370.Google Scholar
  • Yang H, Rudin C, Seltzer M (2017) Scalable Bayesian rule lists. Precup D, Teh YW, eds. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (PMLR, New York), 3921–3930.Google Scholar
  • Zhao LC, Krishnaiah PR, Chen XR (1991) Almost sure ℓr-norm convergence for data-based histogram density estimates. Theory Probab. Its Appl. 35(2):396–403.Google Scholar
  • Zhuang X, Huang Y, Palaniappan K, Zhao Y (1996) Gaussian mixture density modeling, decomposition, and applications. IEEE Trans. Image Processing 5(9):1293–1302.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.