On Cluster-Aware Supervised Learning: Frameworks, Convergent Algorithms, and Applications

Shutong Chen
[email protected]
School of Business and Management, Donghua University, 200051 Shanghai, China

Weijun Xie (Corresponding Author)
[email protected]
https://orcid.org/0000-0001-5157-1194
Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, Virginia 24061

Published Online: 9 Mar 2021
https://doi.org/10.1287/ijoc.2020.1053

Volume 34, Issue 1

January-February 2022

Pages 1-669, C2

Received: October 30, 2019
Accepted: October 27, 2020
Published Online: March 9, 2021

Cite as

Shutong Chen, Weijun Xie (2021) On Cluster-Aware Supervised Learning: Frameworks, Convergent Algorithms, and Applications. INFORMS Journal on Computing 34(1):481-502.

https://doi.org/10.1287/ijoc.2020.1053
