Clustering and Representative Selection for High-Dimensional Data with Human-in-the-Loop

Published Online: https://doi.org/10.1287/ijds.2022.9014

References

  • Afrabandpey H, Peltola T, Kaski S (2019) Human-in-the-loop active covariance learning for improving prediction in small data sets. Kraus S, ed. Proc. 28th Internat. Joint Conf. Artificial Intelligence (AAAI Press, Washington, DC), 1959–1966.
  • Amershi S, Cakmak M, Knox WB, Kulesza T (2014) Power to the people: The role of humans in interactive machine learning. AI Magazine 35(4):105–120.
  • Basu S, Banerjee A, Mooney RJ (2004) Active semi-supervision for pairwise constrained clustering. Berry MW, Dayal U, Kamath C, Skillicorn D, eds. Proc. SIAM Internat. Conf. Data Mining (SIAM, Philadelphia, PA), 333–344.
  • Breheny P, Huang J (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Statist. 5(1):232.
  • Bühlmann P, Rütimann P, van de Geer S, Zhang CH (2013) Correlated variables in regression: Clustering and sparse estimation. J. Statist. Planning Inference 143(11):1835–1858.
  • Cantor DS, Stevens E (2009) QEEG correlates of auditory-visual entrainment treatment efficacy of refractory depression. J. Neurotherapy 13(2):100–108.
  • Conforti M, Cornuéjols G, Zambelli G (2014) Integer Programming Models (Springer International Publishing, Cham, Switzerland).
  • Dettling M, Bühlmann P (2004) Finding predictive gene groups from microarray data. J. Multivariate Anal. (Oxford) 90(1):106–131.
  • Fails JA, Olsen DR Jr (2003) Interactive machine learning. Johnson WL, Andre E, Domingue J, eds. Proc. 8th Internat. Conf. Intelligent User Interfaces (Association for Computing Machinery, New York), 39–45.
  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96(456):1348–1360.
  • Fan J, Shao QM, Zhou WX (2018) Are discoveries spurious? Distributions of maximum spurious correlations and their applications. Ann. Statist. 46(3):989.
  • Friedman J, Hastie T, Höfling H, Tibshirani R (2007) Pathwise coordinate optimization. Ann. Appl. Statist. 1(2):302–332.
  • Gkorou D, Larrañaga M, Ypma A, Hasibi F, Wijk R (2020) Get a human-in-the-loop: Feature engineering via interactive visualizations. Kottke D, Krempl G, Lemaire V, Holzinger A, Calma A, eds. Proc. Workshop Interactive Adaptive Learn. Co-located European Conf. Machine Learn. Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2020) (Vilnius, Lithuania), vol. 2660, 90–95.
  • Hastie T, Tibshirani R, Botstein D, Brown P (2001) Supervised harvesting of expression trees. Genome Biology 2(1):research0003–1.
  • Hennig C (2007) Cluster-wise assessment of cluster stability. Comput. Statist. Data Anal. (Oxford) 52(1):258–271.
  • Hunter DR, Li R (2005) Variable selection using MM algorithms. Ann. Statist. 33(4):1617.
  • Kaufman L, Rousseeuw PJ (2009) Finding Groups in Data: An Introduction to Cluster Analysis, vol. 344 (John Wiley & Sons, New York).
  • Park MY, Hastie T, Tibshirani R (2007) Averaged gene expressions for regression. Biostatistics 8(2):212–227.
  • Pochet Y, Wolsey LA (2006) Production Planning by Mixed Integer Programming, vol. 149 (Springer Science & Business Media, New York).
  • Raghupathi W, Raghupathi V (2014) Big data analytics in healthcare: Promise and potential. Health Inform. Sci. Systems 2(1):1–10.
  • Schubert E, Rousseeuw PJ (2019) Faster k-medoids clustering: Improving the PAM, CLARA, and CLARANS algorithms. Amato G, Gennaro C, Oria V, Radovanovic M, eds. Proc. Internat. Conf. Similarity Search Applications (Springer, Cham, Switzerland), 171–187.
  • Scott DW (1991) Feasibility of multivariate density estimates. Biometrika 78(1):197–205.
  • Sharma DB, Bondell HD, Zhang HH (2013) Consistent group identification and variable selection in regression with correlated predictors. J. Comput. Graphics Statist. 22(2):319–340.
  • Storey JD, Tibshirani R (2003) Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100(16):9440–9445.
  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B (Methodological) 58(1):267–288.
  • Tseng GC (2007) Penalized and weighted k-means for clustering with scattered objects and prior information in high-throughput biological data. Bioinformatics 23(17):2247–2255.
  • Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J. Machine Learn. Res. 9(86):2579–2605.
  • van Hateren JH, Ruderman DL (1998) Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex. Proc. Biological Sci. 265(1412):2315–2320.
  • Wang G, Sarkar A, Carbonetto P, Stephens M (2020) A simple new approach to variable selection in regression, with application to genetic fine-mapping. J. Roy. Statist. Soc. Ser. B Statist. Methodology 82(5):1273–1300.
  • Witten DM, Shojaie A, Zhang F (2014) The cluster elastic net for high-dimensional regression with unknown variable grouping. Technometrics 56(1):112–122.
  • Yang ST (2023) Analysis of high-dimensional data with variable clustering and selection. PhD thesis, Georgia Institute of Technology, Atlanta.
  • Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. Ser. B (Statist. Methodological) 68(1):49–67.
  • Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38(2):894–942.
  • Zou H (2006) The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101(476):1418–1429.
  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B (Statist. Methodological) 67(2):301–320.