A Nonparametric Subspace Analysis Approach with Application to Anomaly Detection Ensembles

Published Online: https://doi.org/10.1287/ijds.2023.0027

References

  • Aggarwal CC, Subbian K (2012) Event detection in social streams. Ghosh J, Liu H, Davidson I, Domeniconi C, Kamath C, eds. Proc. 2012 SIAM Internat. Conf. Data Mining (SIAM, Philadelphia), 624–635.
  • Aggarwal CC, Yu PS (2001) Outlier detection for high dimensional data. Sellis T, Mehrotra S, eds. Proc. 2001 ACM SIGMOD Internat. Conf. Management Data (ACM, New York), 37–46.
  • Bache K, Lichman M (2013) UCI Machine Learning Repository (University of California School of Information and Computer Science, Irvine, CA).
  • Bacher M, Ben-Gal I, Shmueli E (2016) Subspace selection for anomaly detection: An information theory approach. 2016 IEEE Internat. Conf. Sci. Electr. Engrg. (ICSEE) (IEEE, Piscataway, NJ), 1–5.
  • Bacher M, Ben-Gal I, Shmueli E (2017) An information theory subspace analysis approach with application to anomaly detection ensembles. Fred ALN, Filipe J, eds. Proc. 9th Internat. Joint Conf. Knowledge Discovery Knowledge Engrg Knowledge Management—KDIR (SciTePress, Setúbal, Portugal), 27–39.
  • Bajovic D, Sinopoli B, Xavier J (2011) Sensor selection for event detection in wireless sensor networks. IEEE Trans. Signal Processing 59(10):4938–4953.
  • Ben-Gal I (2010) Outlier detection. Maimon O, Rokach L, eds. Data Mining and Knowledge Discovery Handbook (Springer, Boston), 117–130.
  • Ben-Gal I, Morag G, Shmilovici A (2003) Context-based statistical process control: A monitoring procedure for state-dependent processes. Technometrics 45(4):293–311.
  • Bishop CM, Nasrabadi NM (2006) Pattern Recognition and Machine Learning, vol. 4 (Springer, New York), 738.
  • Breiman L (2001) Random forests. Machine Learn. 45(1):5–32.
  • Chandola V, Banerjee A, Kumar V (2007) Outlier detection: A survey. ACM Comput. Surveys 14:15.
  • Cheng CH, Fu AW, Zhang Y (1999) Entropy-based subspace clustering for mining numerical data. Fayyad U, Chaudhuri S, Madigan D, eds. Proc. Fifth ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 84–93.
  • Cover TM, Thomas JA (2006) Elements of Information Theory, 2nd ed. (John Wiley & Sons, Hoboken, NJ).
  • Gan G, Ng MKP (2015) Subspace clustering with automatic feature grouping. Pattern Recognition 48(11):3703–3713.
  • Garcia S, Luengo J, Sáez JA, Lopez V, Herrera F (2012) A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning. IEEE Trans. Knowledge Data Engrg. 25(4):734–750.
  • García-Torres M, Gómez-Vela F, Melián-Batista B, Moreno-Vega JM (2016) High-dimensional feature selection via feature grouping: A variable neighborhood search approach. Inform. Sci. 326:102–118.
  • Ge Z, Song Z (2012) Multivariate Statistical Process Control: Process Monitoring Methods and Applications (Springer Science & Business Media, New York).
  • Guyon I, Gunn S, Nikravesh M, Zadeh LA, eds. (2008) Feature Extraction: Foundations and Applications, vol. 207 (Springer, Berlin).
  • Ha J, Seok S, Lee JS (2015) A precise ranking method for outlier detection. Inform. Sci. 324:88–107.
  • Jakulin A (2005) Machine learning based on attribute interactions. Doctoral dissertation, University of Ljubljana, Ljubljana, Slovenia.
  • Jing L, Ng MK, Huang JZ (2007) An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data. IEEE Trans. Knowledge Data Engrg. 19(8):1026–1041.
  • Jyothsna VVRPV, Prasad R, Prasad KM (2011) A review of anomaly based intrusion detection systems. Internat. J. Comput. Appl. 28(7):26–35.
  • Kagan E, Ben-Gal I (2013) Probabilistic Search for Tracking Targets (John Wiley & Sons, Hoboken, NJ).
  • Kagan E, Ben-Gal I (2014) A group testing algorithm with online informational learning. IIE Trans. 46(2):164–184.
  • Keller F, Müller E, Böhm K (2012) HICS: High contrast subspaces for density-based outlier ranking. Kementsietsidis A, Vaz Salles MA, eds. Proc. 2012 IEEE 28th Internat. Conf. Data Engrg. (IEEE, Piscataway, NJ), 1037–1048.
  • Kenett RS, Zacks S (2021) Modern Industrial Statistics: With Applications in R, MINITAB and JMP (John Wiley & Sons, Hoboken, NJ).
  • Kuratowski K (2014) Introduction to Set Theory and Topology (Elsevier, Amsterdam).
  • Lazarevic A, Kumar V (2005) Feature bagging for outlier detection. Proc. Eleventh ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 157–166.
  • Liu FT, Ting KM, Zhou ZH (2008) Isolation forest. Giannotti F, Gunopulos D, Turini F, Zaniolo C, Ramakrishnan N, Wu X, eds. Proc. 2008 Eighth IEEE Internat. Conf. Data Mining (IEEE, Piscataway, NJ), 413–422.
  • McGill W (1954) Multivariate information transmission. Trans. IRE Professional Group Inform. Theory 4(4):93–111.
  • Menahem E, Rokach L, Elovici Y (2013) Combining one-class classifiers via meta learning. He Q, Iyengar A, Nejdl W, Pei J, Rastogi R, eds. Proc. 22nd ACM Internat. Conf. Inform. Knowledge Management (ACM, New York), 2435–2440.
  • Müller E, Schiffer M, Seidl T (2010) Adaptive outlierness for subspace outlier ranking. Proc. 19th ACM Internat. Conf. Inform. Knowledge Management (ACM, New York), 1629–1632.
  • Nguyen HV, Müller E, Böhm K (2014) A near-linear time subspace search scheme for unsupervised selection of correlated features. Big Data Res. 1:37–51.
  • Nguyen HV, Müller E, Vreeken J, Keller F, Böhm K (2013) CMI: An information-theoretic contrast measure for enhancing subspace cluster and outlier detection. Proc. 2013 SIAM Internat. Conf. Data Mining (SIAM, Philadelphia), 198–206.
  • Park C, Huang JZ, Ding Y (2010) A computable plug-in estimator of minimum volume sets for novelty detection. Oper. Res. 58(5):1469–1480.
  • Pimentel MA, Clifton DA, Clifton L, Tarassenko L (2014) A review of novelty detection. Signal Processing 99:215–249.
  • Rokhlin VA (1967) Lectures on the entropy theory of measure-preserving transformations. Russian Math. Surveys 22(5):1–52.
  • Schölkopf B, Smola A, Müller KR (2005) Kernel principal component analysis. Artificial Neural Networks—ICANN’97: 7th Internat. Conf. Proc. (Springer, Berlin), 583–588.
  • Scott DW (2015) Multivariate Density Estimation: Theory, Practice, and Visualization (John Wiley & Sons, Hoboken, NJ).
  • Simovici D (2007) On generalized entropy and entropic metrics. J. Multiple-Valued Logic Soft Comput. 13(4/6):295–320.
  • Sinai IG, Sinaj JG, Sinai YG (1976) Introduction to Ergodic Theory, vol. 18 (Princeton University Press, Princeton, NJ).
  • Somol P, Novovičová J (2010) Evaluating stability and comparing output of feature selectors that optimize feature subset cardinality. IEEE Trans. Pattern Anal. Machine Intelligence 32(11):1921–1939.
  • Steinwart I, Hush D, Scovel C (2005) A classification framework for anomaly detection. J. Machine Learn. Res. 6(2):211–232.
  • Sugar CA, James GM (2003) Finding the number of clusters in a data set: An information-theoretic approach. J. Amer. Statist. Assoc. 98(463):750–763.
  • Tarassenko L, Hann A, Patterson A, Braithwaite E, Davidson K, Barber V, Young D (2005) Biosign™: Multi-parameter monitoring for early warning of patient deterioration. Proc. 3rd IEE Internat. Seminar Medical Appl. Signal Processing 2005 (IEEE, Piscataway, NJ), 71–76.
  • Watanabe S (1960) Information theoretical analysis of multivariate correlation. IBM J. Res. Development 4(1):66–82.
  • Yianilos PN (2002) Normalized forms for two common metrics. Report No. 91–082, NEC Research Institute, Princeton, NJ.