When Behavioral Data Betray Users: A Diagnostic and Protective Framework Against Social Interaction Leakages
Abstract
Digital platforms often share publicly visible behavioral data, such as ratings and reviews, to improve service quality. Yet these seemingly innocuous data can also reveal hidden social ties among users. We develop a two-stage framework that first diagnoses this leakage risk by inferring latent social interactions from behavioral data and then mitigates the risk while preserving data utility. We characterize when hidden ties are identifiable from observed actions and show, using public Yelp data from Louisiana and Pennsylvania, that an attacker can recover about half of the true social ties at a 10% false-positive rate and more than 60% at a 20% false-positive rate. These inferred ties can materially increase cyber risk: when used for spear phishing, the estimated return on attack rises from 109% for a campaign with 500 impersonation attempts to 1,098% for one with 10,000 attempts. To reduce this leakage, we propose two perturbation mechanisms with formal differential privacy guarantees on the released representations: one adds Gaussian noise directly to the released action matrix, and the other adds Laplace noise to the learned representation before resynthesizing the released data. Both mechanisms reduce link-inference accuracy and substantially lower estimated attacker returns; under our main protection settings, returns become negative for smaller campaigns and remain much lower at larger scales. Overall, the paper provides a practical framework for diagnosing social interaction leakage and evaluating the privacy-utility trade-off when platforms share behavioral data.
History: Peiyu Chen, Senior Editor; Heng Xu, Associate Editor.
Funding: Y. Leng is supported by the U.S. National Science Foundation (NSF) [Grant IIS-2153468].
Supplemental Material: The online appendix is available at https://doi.org/10.1287/isre.2024.1469.
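Illustrative sketch: The two perturbation mechanisms summarized in the abstract can be illustrated with textbook noise calibrations. The Python code below is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the toy rating matrix, the sensitivity values, the privacy parameters, and the use of a truncated SVD as a stand-in for the learned representation are all hypothetical. It uses the classical Gaussian-mechanism scale, sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon (valid for 0 < epsilon < 1), and the standard Laplace scale, sensitivity / epsilon.

```python
import numpy as np


def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale; requires 0 < epsilon < 1."""
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("need 0 < epsilon < 1 and 0 < delta < 1")
    return l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon


def perturb_actions_gaussian(actions, l2_sensitivity, epsilon, delta, rng):
    """Add i.i.d. Gaussian noise directly to the action matrix."""
    sigma = gaussian_sigma(l2_sensitivity, epsilon, delta)
    return actions + rng.normal(0.0, sigma, size=actions.shape)


def perturb_representation_laplace(representation, l1_sensitivity, epsilon, rng):
    """Add i.i.d. Laplace noise to a low-dimensional representation."""
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    scale = l1_sensitivity / epsilon
    return representation + rng.laplace(0.0, scale, size=representation.shape)


rng = np.random.default_rng(0)

# Toy user-by-item rating matrix (hypothetical data, ratings in 1..5).
actions = rng.integers(1, 6, size=(6, 4)).astype(float)

# Mechanism 1: noise on the released action matrix.
# Assumption: neighboring datasets differ in one rating, so the
# L2 sensitivity is at most 5 - 1 = 4.
released_gaussian = perturb_actions_gaussian(
    actions, l2_sensitivity=4.0, epsilon=0.5, delta=1e-5, rng=rng
)

# Mechanism 2: noise on a representation, then resynthesis.
# Assumption: a rank-2 truncated SVD stands in for the learned
# representation; the L1 sensitivity of 1.0 is a placeholder that
# would have to be enforced (for example by clipping) in practice.
U, s, Vt = np.linalg.svd(actions, full_matrices=False)
k = 2
representation = U[:, :k] * s[:k]
noisy_representation = perturb_representation_laplace(
    representation, l1_sensitivity=1.0, epsilon=0.5, rng=rng
)
released_laplace = noisy_representation @ Vt[:k]
```

In this sketch the privacy parameters apply only to the perturbed object itself (the action matrix in the first case, the representation in the second). Smaller values of epsilon give larger noise scales, which is the privacy-utility trade-off the framework is designed to evaluate.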

