Interpretation and Visualization of Distance Covariance Through Additive Decomposition of Correlations Formula
Abstract
Distance covariance is a widely used statistical methodology for testing dependence between two groups of variables. Despite its appealing properties, namely consistency against all alternatives and superior testing power, the distance covariance test indicates only whether the null hypothesis of independence is rejected. In most applications, however, practitioners are interested in understanding how the two groups of variables are related. This paper derives an additive decomposition of correlations formula for the population and sample distance covariance. The formula provides a clear explanation of why distance covariance measures dependence between random vectors. Based on this formula, we develop a visualization workflow that provides practitioners with an intuitive interpretation of the distance covariance. We apply this method to simulated test cases to illustrate why the distance covariance test indicates that two groups of variables are related. We also apply the visualization approach to a real solar-cell manufacturing data set to show how distance covariance determines that several process features are associated with solar-cell performance.
History: Eunshin Byon served as the senior editor for this article.
Supplemental Material: The online appendix is available at https://doi.org/10.1287/ijds.2024.0054.
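For readers less familiar with the statistic discussed in the abstract, the following is a minimal sketch of the standard squared sample distance covariance, computed from double-centered pairwise Euclidean distance matrices. It is not the additive decomposition derived in this paper; it only illustrates the quantity being decomposed. The function names (`pairwise_distances`, `double_center`, `sample_dcov_squared`) are hypothetical and chosen for illustration, and the sketch assumes each observation is given as a tuple of floats.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def double_center(d):
    """Subtract row and column means from each entry, then add the grand mean."""
    n = len(d)
    row_means = [sum(row) / n for row in d]
    col_means = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def sample_dcov_squared(x, y):
    """Squared sample distance covariance (V-statistic form) of paired samples.

    x and y are lists of n observations; observation i of x is paired with
    observation i of y. The two groups may have different dimensions.
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of observations")
    n = len(x)
    a = double_center(pairwise_distances(x))
    b = double_center(pairwise_distances(y))
    # Average of the elementwise products of the two centered matrices.
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2


if __name__ == "__main__":
    x = [(0.0,), (1.0,), (2.0,), (3.0,)]
    y_dependent = [(0.0, 0.0), (1.0, 2.0), (4.0, 4.0), (9.0, 6.0)]
    y_constant = [(5.0, 5.0)] * 4
    print(sample_dcov_squared(x, y_dependent))  # positive for this dependent pair
    print(sample_dcov_squared(x, y_constant))   # zero: y carries no variation
```

This form costs O(n^2) time and memory, which is adequate for illustration; an independence test would additionally need a reference distribution for the statistic, for example from permuting the pairing of the observations.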

