Linear Classifiers Under Infinite Imbalance

Paul Glasserman
[email protected]
https://orcid.org/0000-0002-9577-0205
Columbia Business School, New York, New York 10027

Mike Li (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-4895-5047
Columbia Business School, New York, New York 10027

Published Online: 21 Dec 2023
https://doi.org/10.1287/opre.2021.0376


Volume 73, Issue 2

March-April 2025

Pages iii-viii, 583-1150, C2-C3


Received: June 10, 2021
Accepted: October 20, 2023
Published Online: December 21, 2023

Cite as

Paul Glasserman, Mike Li (2023) Linear Classifiers Under Infinite Imbalance. Operations Research 73(2):1075-1101.

https://doi.org/10.1287/opre.2021.0376
