Stochastic Neighbourhood Components Analysis

Published Online: https://doi.org/10.1287/ijds.2023.0018

Distance metric learning is a fundamental task in data mining and is known to enhance the performance of various distance-based algorithms. In this paper, we consider stochastic training data in which repeated feature vectors can belong to different classes, a scenario in which existing methods of metric learning are known to struggle. This type of data is common in stochastic simulations, where multidimensional, recurrent system states are subject to inherent randomness. Classification models trained on such high-resolution simulation-generated data play a critical role in real-time decision making across diverse applications. This paper presents and implements a stochastic version of the popular neighbourhood components analysis. We demonstrate its behaviour on stochastic data using simulation models and reveal its advantages when used for nearest neighbour classification. The assumptions of stochastic labelling and repeated feature vectors also hold for data from many other domains, suggesting that the method may have broad impact. For example, beyond its applications to system control and decision making with digital twin simulation, it may enhance the analysis of data from sensor networks, recommender systems, and crowdsourced platforms, where stochasticity and recurring feature patterns are typical.
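To make the data setting concrete, the following is a minimal sketch of the classical (deterministic-label) neighbourhood components analysis objective evaluated on a toy data set with repeated feature vectors and conflicting labels. It is not the stochastic method presented in this paper. The function name `nca_objective`, the toy data, and the choice of transformation matrices are assumptions made for illustration only.

```python
import numpy as np

def nca_objective(A, X, y):
    """Classical NCA objective: the expected number of points that are
    correctly classified when each point picks one neighbour at random,
    with probability given by a softmax over negative squared distances
    in the transformed space. (Illustrative helper, not from the paper.)"""
    Z = X @ A.T                                    # apply the linear map A
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                   # a point never picks itself
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)              # rows sum to one
    same = y[:, None] == y[None, :]                # same-class indicator
    return float((P * same).sum())

# Hypothetical toy data: two recurring states, each observed four times,
# with labels that disagree within a state (stochastic labelling).
X = np.array([[0.0, 1.0]] * 4 + [[1.0, 0.0]] * 4)
y = np.array([1, 1, 1, 0, 0, 0, 0, 1])

separated = nca_objective(100.0 * np.eye(2), X, y)  # states pushed far apart
collapsed = nca_objective(np.zeros((2, 2)), X, y)   # all points coincide
print(separated, collapsed)
```

In this toy example, exact duplicates are at distance zero under every linear map, so a point's own repeats always receive the largest neighbour weights. Even when the two states are fully separated, the objective cannot exceed 4 out of 8 here, because the labels within each state disagree.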

Funding: This work was supported by the Engineering and Physical Sciences Research Council–funded STOR-i Centre for Doctoral Training at Lancaster University [Grant EP/L015692/1]. In addition, Barry L. Nelson’s work was supported by the U.S. National Science Foundation [Grant DMS-1854562].

Data Ethics & Reproducibility Note: The code capsule is available on Code Ocean at https://doi.org/10.24433/CO.0189724.v5 and in the e-Companion to this article (available at https://doi.org/10.1287/ijds.2023.0018).
