Data Exploration by Representative Region Selection: Axioms and Convergence

Published Online:https://doi.org/10.1287/moor.2020.1115

References

  • [1] Agarwal PK , Har-Peled S , Varadarajan KR (2005) Geometric approximation via coresets. Goodman JE , Pach J , Welzl E , eds. Combinatorial and Computational Geometry, vol. 52 (Cambridge University Press, Cambridge, UK), 1–30.Google Scholar
  • [2] Bādoiu M , Har-Peled S , Indyk P (2002) Approximate clustering via core-sets. Vitter JS, Spirakis P, Yannakakis M, eds. Proc. 34th Annual ACM Sympos. Theory Comput. (Association for Computing Machinery, New York), 250–257. Google Scholar
  • [3] Ben-David S , Ackerman M (2009) Measures of clustering quality: A working set of axioms for clustering . Koller D , Schuurmans D , Bengio Y , Bottou L , eds. Advances in Neural Information Processing Systems, vol. 21 (Curran Associates, Inc., Red Hook, NY), 121–128.Google Scholar
  • [4] Bezdek JC , Hathaway RJ (2002) VAT: A tool for visual assessment of (cluster) tendency. Giles CL, ed. Proc. 2002 Internat. Joint Conf. Neural Networks, vol. 3 (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 2225–2230.Google Scholar
  • [5] Biau G , Devroye L , Lugosi G (2008) Consistency of random forests and other averaging classifiers. J. Machine Learn. Res. 9:2015–2033.Google Scholar
  • [6] Borgelt C (2012) Frequent item set mining. WIREs Data Mining Knowledge Discovery 2(6):437–456.Google Scholar
  • [7] Burges CJC (2010) Dimension reduction: A guided tour. Found. Trends Machine Learn. 2(4):275–365.Google Scholar
  • [8] Burman P , Polonik W (2009) Multivariate mode hunting: Data analytic tools with measures of significance. J. Multivariate Anal. 100(6):1198–1218.Google Scholar
  • [9] Carlsson G , Mémoli F (2010) Characterization, stability and convergence of hierarchical clustering methods. J. Machine Learn. Res. 11:1425–1470.Google Scholar
  • [10] Caruso C , Colorni A , Aloi L (2003) Dominant, an algorithm for the p-center problem. Eur. J. Oper. Res. 149(1):53–64.Google Scholar
  • [11] Chen D , Chen R (2009) New relaxation-based algorithms for the optimal solution of the continuous and discrete p-center problems. Comput. Oper. Res. 36(5):1646–1655.Google Scholar
  • [12] Clark RD (1997) OptiSim: An extended dissimilarity selection method for finding diverse representative subsets. J. Chemical Inform. Comput. Sci. 37(6):1181–1188.Google Scholar
  • [13] Daszykowski M , Walczak B , Massart D (2002) Representative subset selection. Analytica Chimica Acta 468(1):91–103.Google Scholar
  • [14] Davidović T , Ramljak D , Šelmić M , Teodorović D (2011) Bee colony optimization for the p-center problem. Comput. Oper. Res. 38(10):1367–1376.Google Scholar
  • [15] Elloumi S , Labbé M , Pochet Y (2004) A new formulation and resolution method for the p-center problem. INFORMS J. Comput. 16(1):84–94.Google Scholar
  • [16] Ester M , Kriegel HP , Sander J , Xu X (1996) A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatial databases with noise. Simoudis E, Han J, Fayyad U, eds. Proc. Second Internat. Conf. Knowledge Discovery Data Mining (AAAI Press, Menlo Park, CA), 226–231.Google Scholar
  • [17] Estes A , Lovell DJ , Ball MO (2018) Unsupervised prototype reduction for data exploration and an application to air traffic management initiatives. EURO J. Transportation Logist. 8(5):467–510.Google Scholar
  • [18] Fahad A , Alshatri N , Tari Z , Alamri A , Khalil I , Zomaya AY , Foufou S , Bouras A (2014) A survey of clustering algorithms for big data: Taxonomy and empirical analysis. IEEE Trans. Emerging Topics Comput. 2(3):267–279.Google Scholar
  • [19] Filippone M , Camastra F , Masulli F , Rovetta S (2008) A survey of kernel and spectral methods for clustering. Pattern Recognition 41(1):176–190.Google Scholar
  • [20] Fodor IK (2002) A survey of dimension reduction techniques. Technical Report UCRL-ID-148494, Lawrence Livermore National Laboratory, Livermore, CA.Google Scholar
  • [21] Friedman JH , Fisher NI (1999) Bump hunting in high-dimensional data. Statist. Comput. 9(2):123–143.Google Scholar
  • [22] Gajek L (1986) On improving density estimators which are not bona fide functions. Ann. Statist. 14(4):1612–1618.Google Scholar
  • [23] Garcia S , Derrac J , Cano J , Herrera F (2012) Prototype selection for nearest neighbor classification: Taxonomy and empirical study. IEEE Trans. Pattern Anal. Machine Intelligence 34(3):417–435.Google Scholar
  • [24] Garey MR , Johnson DS (1990) Computers and Intractability: A Guide to the Theory of NP-Completeness (W. H. Freeman & Co., New York).Google Scholar
  • [25] Glad IK , Hjort NL , Ushakov NG (2003) Correction of density estimators that are not densities. Scandinavian J. Statist. 30(2):415–427.Google Scholar
  • [26] Gonzalez TF (1985) Clustering to minimize the maximum intercluster distance. Theoret. Comput. Sci. 38(Supplement C):293–306.Google Scholar
  • [27] Gorripaty S , Liu Y , Hansen M , Pozdnukhov A (2017) Identifying similar days for air traffic management. J. Air Transport Management 65(October):144–155.Google Scholar
  • [28] Halkidi M , Batistakis Y , Vazirgiannis M (2001) On clustering validation techniques. J. Intelligent Inform. Systems 17(2):107–145.Google Scholar
  • [29] Har-Peled S , Mazumdar S (2004) On coresets for k-means and k-median clustering. Kleinberg J, ed. Proc. 36th Annual ACM Sympos. Theory Comput. (Association for Computing Machinery, New York), 291–300.Google Scholar
  • [30] Hartigan JA (1981) Consistency of single linkage for high-density clusters. J. Amer. Statist. Assoc. 76(374):388–394.Google Scholar
  • [31] Hassin R , Levin A , Morad D (2003) Lexicographic local search and the p-center problem. Eur. J. Oper. Res. 151(2):265–279.Google Scholar
  • [32] Havens TC , Bezdek JC (2012) An efficient formulation of the improved visual assessment of cluster tendency (iVAT) algorithm. IEEE Trans. Knowledge Data Engrg. 24(5):813–822.Google Scholar
  • [33] Haynes TW , Hedetniemi S , Slater P (1998) Fundamentals of Domination in Graphs (CRC Press, New York).Google Scholar
  • [34] Hedetniemi S , Laskar R (1991) Bibliography on domination in graphs and some basic definitions of domination parameters. Hedetniemi ST, ed. Topics on Domination. Annals of Discrete Mathematics, vol. 48 (Elsevier, Amsterdam), 257–277.Google Scholar
  • [35] Ho CK , Singh YP , Ewe HT (2006) An enhanced ant colony optimization metaheuristic for the minimum dominating set problem. Applied Artificial Intelligence 20(10):881–903.Google Scholar
  • [36] Hochbaum DS , Shmoys DB (1985) A best possible heuristic for the k-center problem. Math. Oper. Res. 10(2):180–184.Google Scholar
  • [37] Huo X , Ni XS , Smith AK (2008) A survey of manifold-based learning methods. Liao TW, Triantaphyllou E, eds. Recent Advances in Data Mining of Enterprise Data: Algorithms and Applications (World Scientific Publishing Co., Singapore), 691–746.CrossrefGoogle Scholar
  • [38] Ilhan T , Özsoy FA , Pinar M (2002) An efficient exact algorithm for the vertex p-center problem and computational experiments for different set covering subproblems. Technical Report 588, Department of Industrial Engineering, Bilkent University, Ankara, Turkey.Google Scholar
  • [39] Jones KS (2007) Automatic summarising: The state of the art. Inform. Processing Management 43(6):1449–1481.Google Scholar
  • [40] Kaluszka M (1998) On the Devroye-Györfi methods of correcting density estimators. Statist. Probab. Lett. 37(3):249–257.Google Scholar
  • [41] Kennard RW , Stone LA (1969) Computer aided design of experiments. Technometrics 11(1):137–148.Google Scholar
  • [42] Kleinberg J (2002) An impossibility theorem for clustering. Becker S, Thrun S, Obermayer K, eds. Proc. 15th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 463–470.Google Scholar
  • [43] Kosorok MR (2007) Introduction to Empirical Processes and Semiparametric Inference (Springer, New York).Google Scholar
  • [44] Kriegel HP , Kröger P , Zimek A (2009) Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans. Knowledge Discovery Data 3(1):1.Google Scholar
  • [45] Lloret E , Palomar M (2012) Text summarisation in progress: A literature review. Artifcial Intelligence Rev. 37(1):1–41.Google Scholar
  • [46] Mampaey M , Vreeken J (2013) Summarizing categorical data by clustering attributes. Data Mining Knowledge Discovery 26(1):130–173.Google Scholar
  • [47] Mampaey M , Vreeken J , Tatti N (2012) Summarizing data succinctly with the most informative itemsets. ACM Trans. Knowledge Discovery Data 6(4):16.Google Scholar
  • [48] Meinshausen N (2006) Quantile regression forests. J. Machine Learn. Res. 7:983–999.Google Scholar
  • [49] Mihelič J , Robič B (2003) Approximation algorithms for the k-center problem: An experimental evaluation. Leopold-Wildburger U, Rendl F, Wäscher G, eds. Internat. Conf. Oper. Res. 2002 (Springer, Berlin), 371–376.Google Scholar
  • [50] Minieka E (1970) The m-center problem. SIAM Rev. 12(1):138–139.Google Scholar
  • [51] Mladenović N , Labbé M , Hansen P (2003) Solving the p-center problem with tabu search and variable neighborhood search. Networks 42(1):48–64.Google Scholar
  • [52] Murtagh F , Contreras P (2017) Algorithms for hierarchical clustering: An overview, II. WIREs Data Mining Knowledge Discovery 7(6):e1219.Google Scholar
  • [53] Nenkova A , McKeown K (2011) Automatic summarization. Found. Trends Inform. Retrieval 5(23):103–233.Google Scholar
  • [54] Parekh AK (1991) Analysis of a greedy heuristic for finding small dominating sets in graphs. Inform. Processing Lett. 39(5):237–240.Google Scholar
  • [55] Pollard D (1981) Strong consistency of k-means clustering. Ann. Statist. 9(1):135–140.Google Scholar
  • [56] Puzicha J , Hofmann T , Buhmann JM (2000) A theory of proximity based clustering: structure detection by optimization. Pattern Recognition 33(4):617–634.Google Scholar
  • [57] Ram P , Gray AG (2011) Density estimation trees. Ghosh J, Smyth P, eds. Proc. 17th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 627–635.Google Scholar
  • [58] Robič B , Mihelič J (2005) Solving the k-center problem efficiently with a dominating set algorithm. J. Comput. Inform. Tech. 13(3):225–234.CrossrefGoogle Scholar
  • [59] Sanchis LA (2002) Experimental analysis of heuristic algorithms for the dominating set problem. Algorithmica 33(1):3–18.Google Scholar
  • [60] Schubert E , Sander J , Ester M , Kriegel HP , Xu X (2017) DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Trans. Database Systems 42(3):19:1–19:21.Google Scholar
  • [61] Scornet E , Biau G , Vert JP (2015) Consistency of random forests. Ann. Statist. 43(4):1716–1741.Google Scholar
  • [62] Scott DW (2015) Multivariate Density Estimation: Theory, Practice, and Visualization (John Wiley & Sons, New York, NY).Google Scholar
  • [63] Sheather SJ (2004) Density estimation. Statist. Sci. 19(4):588–597.Google Scholar
  • [64] Shorack GR , Wellner JA (2009) Empirical Processes with Applications to Statistics (Society for Industrial and Applied Mathematics, Philadelphia).Google Scholar
  • [65] Sun W , Wang J , Fang Y (2012) Regularized k-means clustering of high-dimensional data and its asymptotic consistency. Electronic J. Statist. 6:148–167.Google Scholar
  • [66] Triguero I , Derrac J , Garcia S , Herrera F (2012) A taxonomy and experimental study on prototype generation for nearest neighbor classification. IEEE Trans. Systems, Man, Cybernetics, Part C (Appl. Rev.) 42(1):86–100.Google Scholar
  • [67] van Laarhoven T , Marchiori E (2014) Axioms for graph clustering quality functions. J. Machine Learn. Res. 15(6):193–215.Google Scholar
  • [68] Vapnik VN , Chervonenkis AY (2013) On the uniform convergence of the frequencies of occurrence of events to their probabilities. Schölkopf B, Luo Z, Vovk V, eds. Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik (Springer, Berlin), 7–12.CrossrefGoogle Scholar
  • [69] von Luxburg U , Belkin M , Bousquet O (2008) Consistency of spectral clustering. Ann. Statist. 36(2):555–586.Google Scholar
  • [70] Wang L , Nguyen UTV , Bezdek JC , Leckie CA , Ramamohanarao K (2010) iVAT and aVAT: Enhanced visual analysis for cluster tendency assessment. Zaki MJ, Yu JX, Ravindran B, Pudi V, eds. Advances in Knowledge Discovery and Data Mining (Springer, Berlin), 16–27.Google Scholar
  • [71] Ward JH Jr (1963) Hierarchical grouping to optimize an objective function. J. Amer. Statist. Assoc. 58(301):236–244.Google Scholar
  • [72] Xu R , Wunsch D (2005) Survey of clustering algorithms. IEEE Trans. Neural Networks 16(3):645–678.Google Scholar
  • [73] Yan X , Cheng H , Han J , Xin D (2005) Summarizing itemset patterns: A profile-based approach. Bayardo R, Bennett K, eds. Proc. 11th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 314–323.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.