Data Exploration by Representative Region Selection: Axioms and Convergence
Published Online:18 Mar 2021https://doi.org/10.1287/moor.2020.1115
References
- [1] (2005) Geometric approximation via coresets. Goodman JE , Pach J , Welzl E , eds. Combinatorial and Computational Geometry, vol. 52 (Cambridge University Press, Cambridge, UK), 1–30.Google Scholar
- [2] (2002) Approximate clustering via core-sets. Vitter JS, Spirakis P, Yannakakis M, eds. Proc. 34th Annual ACM Sympos. Theory Comput. (Association for Computing Machinery, New York), 250–257. Google Scholar
- [3] (2009) Measures of clustering quality: A working set of axioms for clustering . Koller D , Schuurmans D , Bengio Y , Bottou L , eds. Advances in Neural Information Processing Systems, vol. 21 (Curran Associates, Inc., Red Hook, NY), 121–128.Google Scholar
- [4] (2002) VAT: A tool for visual assessment of (cluster) tendency. Giles CL, ed. Proc. 2002 Internat. Joint Conf. Neural Networks, vol. 3 (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 2225–2230.Google Scholar
- [5] (2008) Consistency of random forests and other averaging classifiers. J. Machine Learn. Res. 9:2015–2033.Google Scholar
- [6] (2012) Frequent item set mining. WIREs Data Mining Knowledge Discovery 2(6):437–456.Google Scholar
- [7] (2010) Dimension reduction: A guided tour. Found. Trends Machine Learn. 2(4):275–365.Google Scholar
- [8] (2009) Multivariate mode hunting: Data analytic tools with measures of significance. J. Multivariate Anal. 100(6):1198–1218.Google Scholar
- [9] (2010) Characterization, stability and convergence of hierarchical clustering methods. J. Machine Learn. Res. 11:1425–1470.Google Scholar
- [10] (2003) Dominant, an algorithm for the p-center problem. Eur. J. Oper. Res. 149(1):53–64.Google Scholar
- [11] (2009) New relaxation-based algorithms for the optimal solution of the continuous and discrete p-center problems. Comput. Oper. Res. 36(5):1646–1655.Google Scholar
- [12] (1997) OptiSim: An extended dissimilarity selection method for finding diverse representative subsets. J. Chemical Inform. Comput. Sci. 37(6):1181–1188.Google Scholar
- [13] (2002) Representative subset selection. Analytica Chimica Acta 468(1):91–103.Google Scholar
- [14] (2011) Bee colony optimization for the p-center problem. Comput. Oper. Res. 38(10):1367–1376.Google Scholar
- [15] (2004) A new formulation and resolution method for the p-center problem. INFORMS J. Comput. 16(1):84–94.Google Scholar
- [16] (1996) A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatial databases with noise. Simoudis E, Han J, Fayyad U, eds. Proc. Second Internat. Conf. Knowledge Discovery Data Mining (AAAI Press, Menlo Park, CA), 226–231.Google Scholar
- [17] (2018) Unsupervised prototype reduction for data exploration and an application to air traffic management initiatives. EURO J. Transportation Logist. 8(5):467–510.Google Scholar
- [18] (2014) A survey of clustering algorithms for big data: Taxonomy and empirical analysis. IEEE Trans. Emerging Topics Comput. 2(3):267–279.Google Scholar
- [19] (2008) A survey of kernel and spectral methods for clustering. Pattern Recognition 41(1):176–190.Google Scholar
- [20] (2002) A survey of dimension reduction techniques. Technical Report UCRL-ID-148494, Lawrence Livermore National Laboratory, Livermore, CA.Google Scholar
- [21] (1999) Bump hunting in high-dimensional data. Statist. Comput. 9(2):123–143.Google Scholar
- [22] (1986) On improving density estimators which are not bona fide functions. Ann. Statist. 14(4):1612–1618.Google Scholar
- [23] (2012) Prototype selection for nearest neighbor classification: Taxonomy and empirical study. IEEE Trans. Pattern Anal. Machine Intelligence 34(3):417–435.Google Scholar
- [24] (1990) Computers and Intractability: A Guide to the Theory of NP-Completeness (W. H. Freeman & Co., New York).Google Scholar
- [25] (2003) Correction of density estimators that are not densities. Scandinavian J. Statist. 30(2):415–427.Google Scholar
- [26] (1985) Clustering to minimize the maximum intercluster distance. Theoret. Comput. Sci. 38(Supplement C):293–306.Google Scholar
- [27] (2017) Identifying similar days for air traffic management. J. Air Transport Management 65(October):144–155.Google Scholar
- [28] (2001) On clustering validation techniques. J. Intelligent Inform. Systems 17(2):107–145.Google Scholar
- [29] (2004) On coresets for k-means and k-median clustering. Kleinberg J, ed. Proc. 36th Annual ACM Sympos. Theory Comput. (Association for Computing Machinery, New York), 291–300.Google Scholar
- [30] (1981) Consistency of single linkage for high-density clusters. J. Amer. Statist. Assoc. 76(374):388–394.Google Scholar
- [31] (2003) Lexicographic local search and the p-center problem. Eur. J. Oper. Res. 151(2):265–279.Google Scholar
- [32] (2012) An efficient formulation of the improved visual assessment of cluster tendency (iVAT) algorithm. IEEE Trans. Knowledge Data Engrg. 24(5):813–822.Google Scholar
- [33] (1998) Fundamentals of Domination in Graphs (CRC Press, New York).Google Scholar
- [34] (1991) Bibliography on domination in graphs and some basic definitions of domination parameters. Hedetniemi ST, ed. Topics on Domination. Annals of Discrete Mathematics, vol. 48 (Elsevier, Amsterdam), 257–277.Google Scholar
- [35] (2006) An enhanced ant colony optimization metaheuristic for the minimum dominating set problem. Applied Artificial Intelligence 20(10):881–903.Google Scholar
- [36] (1985) A best possible heuristic for the k-center problem. Math. Oper. Res. 10(2):180–184.Google Scholar
- [37] (2008) A survey of manifold-based learning methods. Liao TW, Triantaphyllou E, eds. Recent Advances in Data Mining of Enterprise Data: Algorithms and Applications (World Scientific Publishing Co., Singapore), 691–746.Crossref, Google Scholar
- [38] (2002) An efficient exact algorithm for the vertex p-center problem and computational experiments for different set covering subproblems. Technical Report 588, Department of Industrial Engineering, Bilkent University, Ankara, Turkey.Google Scholar
- [39] (2007) Automatic summarising: The state of the art. Inform. Processing Management 43(6):1449–1481.Google Scholar
- [40] (1998) On the Devroye-Györfi methods of correcting density estimators. Statist. Probab. Lett. 37(3):249–257.Google Scholar
- [41] (1969) Computer aided design of experiments. Technometrics 11(1):137–148.Google Scholar
- [42] (2002) An impossibility theorem for clustering. Becker S, Thrun S, Obermayer K, eds. Proc. 15th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 463–470.Google Scholar
- [43] (2007) Introduction to Empirical Processes and Semiparametric Inference (Springer, New York).Google Scholar
- [44] (2009) Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans. Knowledge Discovery Data 3(1):1.Google Scholar
- [45] (2012) Text summarisation in progress: A literature review. Artifcial Intelligence Rev. 37(1):1–41.Google Scholar
- [46] (2013) Summarizing categorical data by clustering attributes. Data Mining Knowledge Discovery 26(1):130–173.Google Scholar
- [47] (2012) Summarizing data succinctly with the most informative itemsets. ACM Trans. Knowledge Discovery Data 6(4):16.Google Scholar
- [48] (2006) Quantile regression forests. J. Machine Learn. Res. 7:983–999.Google Scholar
- [49] (2003) Approximation algorithms for the k-center problem: An experimental evaluation. Leopold-Wildburger U, Rendl F, Wäscher G, eds. Internat. Conf. Oper. Res. 2002 (Springer, Berlin), 371–376.Google Scholar
- [50] (1970) The m-center problem. SIAM Rev. 12(1):138–139.Google Scholar
- [51] (2003) Solving the p-center problem with tabu search and variable neighborhood search. Networks 42(1):48–64.Google Scholar
- [52] (2017) Algorithms for hierarchical clustering: An overview, II. WIREs Data Mining Knowledge Discovery 7(6):e1219.Google Scholar
- [53] (2011) Automatic summarization. Found. Trends Inform. Retrieval 5(23):103–233.Google Scholar
- [54] (1991) Analysis of a greedy heuristic for finding small dominating sets in graphs. Inform. Processing Lett. 39(5):237–240.Google Scholar
- [55] (1981) Strong consistency of k-means clustering. Ann. Statist. 9(1):135–140.Google Scholar
- [56] (2000) A theory of proximity based clustering: structure detection by optimization. Pattern Recognition 33(4):617–634.Google Scholar
- [57] (2011) Density estimation trees. Ghosh J, Smyth P, eds. Proc. 17th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 627–635.Google Scholar
- [58] (2005) Solving the k-center problem efficiently with a dominating set algorithm. J. Comput. Inform. Tech. 13(3):225–234.Crossref, Google Scholar
- [59] (2002) Experimental analysis of heuristic algorithms for the dominating set problem. Algorithmica 33(1):3–18.Google Scholar
- [60] (2017) DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Trans. Database Systems 42(3):19:1–19:21.Google Scholar
- [61] (2015) Consistency of random forests. Ann. Statist. 43(4):1716–1741.Google Scholar
- [62] (2015) Multivariate Density Estimation: Theory, Practice, and Visualization (John Wiley & Sons, New York, NY).Google Scholar
- [63] (2004) Density estimation. Statist. Sci. 19(4):588–597.Google Scholar
- [64] (2009) Empirical Processes with Applications to Statistics (Society for Industrial and Applied Mathematics, Philadelphia).Google Scholar
- [65] (2012) Regularized k-means clustering of high-dimensional data and its asymptotic consistency. Electronic J. Statist. 6:148–167.Google Scholar
- [66] (2012) A taxonomy and experimental study on prototype generation for nearest neighbor classification. IEEE Trans. Systems, Man, Cybernetics, Part C (Appl. Rev.) 42(1):86–100.Google Scholar
- [67] (2014) Axioms for graph clustering quality functions. J. Machine Learn. Res. 15(6):193–215.Google Scholar
- [68] (2013) On the uniform convergence of the frequencies of occurrence of events to their probabilities. Schölkopf B, Luo Z, Vovk V, eds. Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik (Springer, Berlin), 7–12.Crossref, Google Scholar
- [69] (2008) Consistency of spectral clustering. Ann. Statist. 36(2):555–586.Google Scholar
- [70] (2010) iVAT and aVAT: Enhanced visual analysis for cluster tendency assessment. Zaki MJ, Yu JX, Ravindran B, Pudi V, eds. Advances in Knowledge Discovery and Data Mining (Springer, Berlin), 16–27.Google Scholar
- [71] (1963) Hierarchical grouping to optimize an objective function. J. Amer. Statist. Assoc. 58(301):236–244.Google Scholar
- [72] (2005) Survey of clustering algorithms. IEEE Trans. Neural Networks 16(3):645–678.Google Scholar
- [73] (2005) Summarizing itemset patterns: A profile-based approach. Bayardo R, Bennett K, eds. Proc. 11th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 314–323.Google Scholar

