On Cluster-Aware Supervised Learning: Frameworks, Convergent Algorithms, and Applications

Published Online: https://doi.org/10.1287/ijoc.2020.1053
