Agresti A (2015) Foundations of Linear and Generalized Linear Models (John Wiley & Sons, Hoboken, NJ).
Baker SG (1994) The multinomial-Poisson transformation. J. Roy. Statist. Soc. Ser. D 43(4):495–504.
Bickel PJ, Ritov Y, Tsybakov AB (2009) Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37(4):1705–1732.
Bishop CM (2006) Pattern Recognition and Machine Learning (Springer, New York).
Bishop CM, Bishop H (2023) Deep Learning: Foundations and Concepts (Springer, Cham, Switzerland).
Böhning D (1992) Multinomial logistic regression algorithm. Ann. Inst. Statist. Math. 44(1):197–200.
Boyd S, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).
Bühlmann P, van de Geer S (2011) Statistics for High-Dimensional Data: Methods, Theory and Applications (Springer, Berlin, Heidelberg).
Dedieu A (2019) Error bounds for sparse classifiers in high-dimensions. Proc. 22nd Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 48–56.
Dedieu A (2021) Improved error rates for sparse (group) learning with Lipschitz loss functions. Preprint, submitted September 22, https://arxiv.org/abs/1910.08880.
Fan J, Guo Y, Wang K (2023) Communication-efficient accurate statistical estimation. J. Amer. Statist. Assoc. 118(542):1000–1010.
Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J. Statist. Software 33(1):1–22.
Friedman J, Hastie T, Höfling H, Tibshirani R (2007) Pathwise coordinate optimization. Ann. Appl. Statist. 1(2):302–332.
Fu S, Chen P, Ye Z (2023a) Simplex-based proximal multicategory support vector machine. IEEE Trans. Inform. Theory 69(4):2427–2451.
Fu S, Zhang S, Liu Y (2018) Adaptively weighted large-margin angle-based classifiers. J. Multivariate Anal. 166:282–299.
Fu S, Chen P, Liu Y, Ye Z (2023b) Simplex-based multinomial logistic regression with diverging numbers of categories and covariates. Statist. Sinica 33(4):2463–2493.
Fu S, He Q, Zhang S, Liu Y (2019) Robust outcome weighted learning for optimal individualized treatment rules. J. Biopharmaceutical Statist. 29(4):606–624.
Fu S, Li S, Yu K, Chen P, Ye Z (2026) Fast multinomial logistic regression with group sparsity. https://doi.org/10.1287/ijoc.2024.0796.cd, https://github.com/INFORMSJoC/2024.0796.
Ghaoui LE, Viallon V, Rabbani T (2012) Safe feature elimination for the LASSO and sparse supervised learning problems. Pacific J. Optim. 8(4):667–698.
Hastie T, Tibshirani R, Friedman J (2009) The Elements of Statistical Learning: Data Mining, Inference and Prediction (Springer, New York).
Hastie T, Tibshirani R, Wainwright M (2015) Statistical Learning with Sparsity: The Lasso and Generalizations (Chapman and Hall/CRC, Boca Raton, FL).
Hosmer DW, Lemeshow S, Sturdivant RX (2013) Applied Logistic Regression (John Wiley & Sons, Hoboken, NJ).
Huang J, Zhang T (2010) The benefit of group sparsity. Ann. Statist. 38(4):1978–2004.
Hunter DR, Lange K (2004) A tutorial on MM algorithms. Amer. Statist. 58(1):30–37.
James G, Witten D, Hastie T, Tibshirani R (2021) An Introduction to Statistical Learning: With Applications in R (Springer, New York).
Jordan MI, Lee JD, Yang Y (2019) Communication-efficient distributed statistical inference. J. Amer. Statist. Assoc. 114(526):668–681.
Krishnapuram B, Carin L, Figueiredo MA, Hartemink AJ (2005) Sparse multinomial logistic regression: Fast algorithms and generalization bounds. IEEE Trans. Pattern Anal. Machine Intelligence 27(6):957–968.
Lai W-T, Chen R-B (2021) A review of Bayesian group selection approaches for linear regression models. Wiley Interdisciplinary Rev. Comput. Statist. 13(4):e1513.
Lemmens A, Gupta S (2020) Managing churn to maximize profits. Marketing Sci. 39(5):956–973.
Li Y, Lu F, Yin Y (2022) Applying logistic LASSO regression for the diagnosis of atypical Crohn’s disease. Sci. Rep. 12(1):11340.
Liang J, Poon C (2023) Variable screening for sparse online regression. J. Comput. Graphical Statist. 32(1):275–293.
Lipkovich I, Svensson D, Ratitch B, Dmitrienko A (2024) Modern approaches for evaluating treatment effect heterogeneity from clinical trials and observational data. Statist. Med. 43(22):4388–4436.
Lounici K, Pontil M, van de Geer S, Tsybakov AB (2011) Oracle inequalities and optimal inference under group sparsity. Ann. Statist. 39(4):2164–2204.
Lücker F, Timonina-Farkas A, Seifert RW (2025) Balancing resilience and efficiency: A literature review on overcoming supply chain disruptions. Production Oper. Management 34(6):1495–1511.
Meier L, Van De Geer S, Bühlmann P (2008) The group lasso for logistic regression. J. Roy. Statist. Soc. Ser. B 70(1):53–71.
Murphy KP (2022) Probabilistic Machine Learning: An Introduction (MIT Press, Cambridge, MA).
Ndiaye E, Fercoq O, Gramfort A, Salmon J (2015) GAP safe screening rules for sparse multi-task and multi-class models. Adv. Neural Inform. Processing Systems 28:811–819.
Ndiaye E, Fercoq O, Gramfort A, Salmon J (2017) Gap safe screening rules for sparsity enforcing penalties. J. Machine Learn. Res. 18(1):4671–4703.
Negahban SN, Ravikumar P, Wainwright MJ, Yu B (2012) A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statist. Sci. 27(4):538–557.
Nibbering D, Hastie TJ (2022) Multiclass-penalized logistic regression. Comput. Statist. Data Anal. 169:107414.
Okumusoglu BC, Basciftci B, Kocuk B (2024) An integrated predictive maintenance and operations scheduling framework for power systems under failure uncertainty. INFORMS J. Comput. 36(5):1335–1358.
R Core Team (2023) R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna).
Salehi F, Abbasi E, Hassibi B (2019) The impact of regularization on high-dimensional logistic regression. Adv. Neural Inform. Processing Systems 32:12005–12015.
Shinn LM, Li Y, Mansharamani A, Auvil LS, Welge ME, Bushell C, Khan NA, et al. (2021) Fecal bacteria as biomarkers for predicting food intake in healthy adults. J. Nutrition 151(2):423–433.
Simon N, Friedman J, Hastie T (2013) A blockwise descent algorithm for group-penalized multiresponse and multinomial regression. Preprint, submitted November 26, https://arxiv.org/abs/1311.6529.
Sun Q, Hu J, Ye Z-S (2025) Optimal abort policy for mission-critical systems under imperfect condition monitoring. Oper. Res. 73(5):2396–2416.
Tan Y, Shenoy PP, Sherwood B, Shenoy C, Gaddy M, Oehlert ME (2024) Bayesian network models for PTSD screening in veterans. INFORMS J. Comput. 36(2):495–509.
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58(1):267–288.
Tibshirani R, Bien J, Friedman J, Hastie T, Simon N, Taylor J, Tibshirani RJ (2012) Strong rules for discarding predictors in lasso-type problems. J. Roy. Statist. Soc. Ser. B 74(2):245–266.
Tutz G (2011) Regression for Categorical Data (Cambridge University Press, Cambridge, UK).
Vincent M, Hansen NR (2014) Sparse group lasso and high dimensional multinomial classification. Comput. Statist. Data Anal. 71:771–786.
Wang J, Zhou J, Liu J, Wonka P, Ye J (2014) A safe screening rule for sparse logistic regression. Adv. Neural Inform. Processing Systems 27:1053–1061.
Wen C, Li Z, Dong R, Ni Y, Pan W (2023) Simultaneous dimension reduction and variable selection for multinomial logistic regression. INFORMS J. Comput. 35(5):1044–1060.
Yang Y, Zou H (2015) A fast unified algorithm for solving group-lasso penalize learning problems. Statist. Comput. 25(6):1129–1141.
Yang B, Matos MGD, Ferreira P (2020) The effect of shortening lock-in periods in telecommunication services. MIS Quart. 44(3):1391–1409.
Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. Ser. B 68(1):49–67.
Yuan M, Xu Y (2023) Feature screening strategy for non-convex sparse logistic regression with log sum penalty. Inform. Sci. 624:732–747.
Zhang H, Chen S (2021) Concentration inequalities for statistical inference. Commun. Math. Res. 37(1):1–85.
Zhang C, Liu Y (2014) Multicategory angle-based large-margin classification. Biometrika 101(3):625–640.
Zhang C, Liu Y, Wang J, Zhu H (2016) Reinforced angle-based multicategory support vector machines. J. Comput. Graphical Statist. 25(3):806–825.
Zhang C, Pham M, Fu S, Liu Y (2018) Robust multicategory support vector machines using difference convex algorithm. Math. Programming 169(1):277–305.
Zhu J, Hastie T (2004) Classification of gene microarrays by penalized logistic regression. Biostatistics 5(3):427–443.

Articles In Advance

Information

Received: May 29, 2024
Accepted: January 24, 2026
Published Online: March 24, 2026

Cite as

Sheng Fu, Shixiang Li, Kai Yu, Piao Chen, Zhisheng Ye (2026) Fast Multinomial Logistic Regression with Group Sparsity. INFORMS Journal on Computing 0(0).

https://doi.org/10.1287/ijoc.2024.0796

Acknowledgments

The authors sincerely thank the editor, the associate editor, and the anonymous reviewers for their constructive comments, which have led to a substantial improvement to the initial version of this article.
