Interpretation and Visualization of Distance Covariance Through Additive Decomposition of Correlations Formula

Published Online: https://doi.org/10.1287/ijds.2024.0054

Distance covariance is a widely used statistical methodology for testing dependence between two groups of variables. Despite its appealing properties (consistency against all alternatives and superior testing power), the distance covariance test only indicates whether the null hypothesis of independence is rejected. In most applications, however, practitioners want to understand how the two groups of variables are related. This paper derives an additive decomposition of correlations formula for both the population and the sample distance covariance. The formula gives a clear explanation of why distance covariance measures dependence between random vectors. Based on this formula, we develop a visualization workflow that gives practitioners an intuitive interpretation of the distance covariance. We apply the method to simulated test cases to illustrate why the distance covariance test indicates that two groups of variables are related. We also apply the visualization approach to a real solar-cell manufacturing data set to show how distance covariance determines that several process features are associated with solar-cell performance.
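For readers unfamiliar with the quantity being decomposed, the following is a minimal sketch of the standard sample (V-statistic) squared distance covariance, computed from double-centered pairwise Euclidean distance matrices. It is not the paper's additive decomposition formula, which the abstract does not state. The function names (`distance_matrix`, `double_center`, `sample_dcov_sq`) and the toy data are illustrative assumptions.

```python
import math


def distance_matrix(points):
    """Pairwise Euclidean distances between observations (each a tuple)."""
    return [[math.dist(p, q) for q in points] for p in points]


def double_center(d):
    """Subtract row and column means, then add back the grand mean."""
    n = len(d)
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # d is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def sample_dcov_sq(xs, ys):
    """Squared sample distance covariance (V-statistic form)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of observations")
    n = len(xs)
    a = double_center(distance_matrix(xs))
    b = double_center(distance_matrix(ys))
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2


# Hypothetical toy data: y = x^2 has zero linear covariance with x,
# yet the two are clearly dependent.
xs = [(-1.0,), (0.0,), (1.0,)]
ys = [(x[0] ** 2,) for x in xs]
dependent = sample_dcov_sq(xs, ys)            # 8/81, strictly positive
constant = sample_dcov_sq(xs, [(5.0,)] * 3)   # 0.0, since y never varies
```

In this toy case the ordinary covariance between x and y is exactly zero, while the squared sample distance covariance is positive. That is the sense in which the statistic detects nonlinear dependence, although the number alone does not say how the variables are related.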

History: Eunshin Byon served as the senior editor for this article.

Supplemental Material: The online appendix is available at https://doi.org/10.1287/ijds.2024.0054.
