Linear Classifiers Under Infinite Imbalance

Published Online: https://doi.org/10.1287/opre.2021.0376

References

  • Anderson TW (2003) An Introduction to Multivariate Statistical Analysis, 3rd ed. (Wiley, Hoboken, NJ).
  • Brown LD (1986) Fundamentals of Statistical Exponential Families (Institute of Mathematical Statistics, Hayward, CA).
  • Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: Synthetic minority over-sampling technique. J. Artificial Intelligence Res. 16:321–357.
  • Csiszár I (1975) I-divergence geometry of probability distributions and minimization problems. Ann. Probab. 3(1):146–158.
  • Deo A, Juneja S (2021) Credit risk: Simple closed-form approximate maximum likelihood estimator. Oper. Res. 69(2):361–379.
  • Drummond C, Holte RC (2003) C4.5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. Fawcett T, Mishra N, eds. ICML ’03: Proc. Twentieth Internat. Conf. Machine Learn. Workshop Learn. Imbalanced Datasets II (AAAI Press, Washington, DC).
  • Durrett R (2019) Probability: Theory and Examples, 5th ed. (Cambridge University Press, Cambridge, UK).
  • Eguchi S, Copas J (2002) A class of logistic-type discriminant functions. Biometrika 89(1):1–22.
  • Embrechts P, Klüppelberg C, Mikosch T (1997) Modelling Extremal Events: For Insurance and Finance (Springer, Berlin), 169–170.
  • Freddie Mac (2021) Single Family Loan-Level Data Set General User Guide (Freddie Mac, McLean, VA).
  • Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55(1):119–139.
  • Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: A statistical view of boosting. Ann. Statist. 28(2):337–407.
  • Gill PM, Pearce CEM, Pečarić J (1997) Hadamard’s inequality for r-convex functions. J. Math. Anal. Appl. 215:461–470.
  • Kubat M, Matwin S (1997) Addressing the curse of imbalanced training sets: One-sided selection. Fisher DH, ed. Proc. 14th Internat. Conf. Machine Learn. (Morgan Kaufmann, San Francisco), 179–186.
  • Kullback S, Leibler RA (1951) On information and sufficiency. Ann. Math. Statist. 22(1):79–86.
  • Lehmann EL, Romano JP (2005) Testing Statistical Hypotheses, 3rd ed. (Springer, Berlin).
  • Li Y, Bellotti T, Adams N (2019) Issues using logistic regression with class imbalance, with a case study from credit risk modelling. Foundations Data Sci. 1(4):389–417.
  • Liu X-Y, Wu J, Zhou Z-H (2008) Exploratory undersampling for class-imbalance learning. IEEE Trans. Systems Man Cybern. B Cybern. 39(2):539–550.
  • McClish D (1989) Analyzing a portion of the ROC curve. Medical Decision Making 9(3):190–195.
  • Owen AB (2007) Infinitely imbalanced logistic regression. J. Machine Learn. Res. 8(27):761–773.
  • Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, Müller M (2011) pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12:77.
  • Silvapulle M (1981) On the existence of maximum likelihood estimates for the binomial response models. J. Royal Statist. Soc. B 43(3):310–313.
  • van Erven T, Harremoës P (2014) Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inform. Theory 60(7):3793–3820.