The typical way to display correlated data is to plot the points on an X–Y plane. The more strongly the data are correlated, the narrower the slanted ellipse that contains the points.
I believe that this is, in fact, the most intuitive way to display correlated data. However, there is an alternate way of doing it that yields interesting insights.
In the plot above, the X and Y axes are orthogonal (at a right angle). However, we can make scatterplots in which the axes are oblique (not orthogonal). This is hard to think about at first, but after a while it makes sense. Whatever the angle between the axes, each point lies at the intersection of two lines, one perpendicular to each axis and each passing through the point's coordinate on that axis. For example, point A (2,2) and point B (1,3) can be displayed with oblique axes like so:
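To make the geometry concrete, here is a minimal sketch of where such points land on the page. It assumes the X axis is drawn horizontally and the Y axis is tilted toward it. The function name `oblique_to_cartesian` and the cosine of 0.8 are illustrative choices, not something taken from the plot itself.

```python
import math

def oblique_to_cartesian(x, y, cos_angle):
    """Page position of the point whose oblique coordinates are (x, y).

    Assumes the X axis runs along (1, 0) and the Y axis along
    (cos_angle, sin_angle). The point sits where the perpendicular raised
    at x on the X axis meets the perpendicular raised at y on the Y axis.
    """
    sin_angle = math.sqrt(1.0 - cos_angle ** 2)
    px = x                                # perpendicular to the X axis is the vertical line at x
    py = (y - x * cos_angle) / sin_angle  # solve px*cos + py*sin = y for py
    return px, py

# Points A and B from the text, with an assumed cosine of 0.8 between the axes.
for name, (x, y) in {"A": (2, 2), "B": (1, 3)}.items():
    print(name, oblique_to_cartesian(x, y, cos_angle=0.8))
```

With a cosine of 0 the function returns the ordinary Cartesian position, which is a quick way to check that the oblique case is a generalization of the familiar one.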
If we set the cosine of the angle between the X and Y axes equal to the correlation coefficient, something interesting happens. Suppose that X and Y are normally distributed z-scores with a correlation of 0.8, so that the axes meet at an angle of about 37°. When the cosine of the angle between the axes equals the correlation coefficient, the data appear to be contained in a circle rather than in an ellipse.
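One way to see why the cloud becomes circular is to check the covariance of the page positions. The sketch below assumes the X axis is horizontal and that X and Y are z-scores with correlation r. The helper names (`plot_transform`, `plotted_covariance`) are made up for this illustration.

```python
import math

def plot_transform(r):
    """2x2 matrix T mapping (x, y) z-scores to page positions when the
    cosine of the angle between the axes equals r (X axis horizontal)."""
    s = math.sqrt(1.0 - r * r)
    return [[1.0, 0.0],
            [-r / s, 1.0 / s]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(m):
    """Transpose of a 2x2 matrix."""
    return [[m[j][i] for j in range(2)] for i in range(2)]

def plotted_covariance(r):
    """Covariance of the page positions: T * Sigma * T'."""
    t = plot_transform(r)
    sigma = [[1.0, r], [r, 1.0]]
    return matmul(matmul(t, sigma), transpose(t))

print(plotted_covariance(0.8))  # approximately [[1, 0], [0, 1]]
```

Because the page positions have equal variances and no covariance, contours of equal density are circles rather than ellipses.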
What is the value of this way of looking at correlations? There are many insights to be had but for now I will focus on two. First, partially correlated data are partially redundant. Viewing the data with oblique axes gives us an alternate way of seeing how redundant the information provided by the two variables is. Second, viewing the data with oblique axes gives an idea as to what is happening with principal components analysis.
Oblique Axes and Principal Components Analysis
Principal components analysis takes our data and summarizes it in the most economical way possible. With only 2 correlated variables, the first principal component is a summary of overall elevation of the 2 scores. If X and Y both equal 2 (and the correlation is 0.8), the score on the first principal component is about 2.11 (which, like all composite scores, is slightly more extreme than the weighted average of its parts).
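The 2.11 figure can be reproduced with a short calculation, assuming the principal component score is rescaled to have unit variance (a standardized component score). The function name `pc1_score` is only for illustration.

```python
import math

def pc1_score(x, y, r):
    """Standardized first-component score for two z-scores with correlation r.

    The sum x + y has variance 2 + 2r, so dividing by its standard
    deviation puts the composite back on a z-score metric.
    """
    return (x + y) / math.sqrt(2.0 + 2.0 * r)

print(round(pc1_score(2, 2, 0.8), 2))  # 2.11
```

Dividing by the square root of 3.6 (about 1.90) rather than by 2 is what makes the composite slightly more extreme than the simple average of 2.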
In the plot above, the first principal component (PC1) is the red vector that bisects X and Y. The cosine of the angle between PC1 and the X-axis is X’s correlation with PC1 (also known as X’s loading on PC1). Because there are only two variables, X and Y have equal loadings on PC1.
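As a quick check on this geometry, the loading of a standardized variable on PC1 in the two-variable case works out to the square root of (1 + r) / 2, which should match the cosine of half the angle between the axes. The sketch below compares the two for r = 0.8; the function names are hypothetical.

```python
import math

def loading_on_pc1(r):
    """Correlation of X (or Y) with PC1 for two z-scores correlated at r."""
    return math.sqrt((1.0 + r) / 2.0)

def cos_half_angle(r):
    """Cosine of the angle between an axis and the bisector, when the
    cosine of the angle between the axes is r."""
    return math.cos(math.acos(r) / 2.0)

print(round(loading_on_pc1(0.8), 3), round(cos_half_angle(0.8), 3))  # 0.949 0.949
```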
The second principal component (PC2) is orthogonal to the first principal component. The meaning of PC2 depends on how many variables there are and their structure. In the case of two positively correlated variables, PC2 is a summary of the magnitude of the difference between the scores. If X = 2 and Y = 1, they differ by 1 standard score. If X and Y are highly correlated, this is a large difference and the score on PC2 would be large. If X and Y have a low correlation, this difference is not so large and the score on PC2 is more modest.
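A small sketch shows how the same 1-point difference produces different PC2 scores. It assumes the component score is standardized to unit variance and that PC2 is signed as X minus Y (the sign of a principal component is arbitrary). The correlations 0.8 and 0.2 are illustrative values for "high" and "low".

```python
import math

def pc2_score(x, y, r):
    """Standardized second-component score for two z-scores with correlation r.

    The difference x - y has variance 2 - 2r, so a fixed difference
    counts for more when r is high.
    """
    return (x - y) / math.sqrt(2.0 - 2.0 * r)

print(round(pc2_score(2, 1, 0.8), 2))  # 1.58
print(round(pc2_score(2, 1, 0.2), 2))  # 0.79
```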
Oblique Axes and the Mahalanobis Distance
The Mahalanobis distance is a measure of how unusual a profile of scores is in a particular population. Shown with oblique axes, the Mahalanobis distance is simply the distance of the point to the origin (at the population mean). Suppose that X and Y have a correlation of 0.90. As shown below, if X is 1 standard deviation above the mean and Y is 1 standard deviation below the mean, the Mahalanobis distance for this point is going to be large (4.5).
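The claim that the Mahalanobis distance equals the plotted distance from the origin can be checked numerically. The sketch below uses the closed-form two-variable formula for z-scores and, separately, the page position of the point under the assumption that the X axis is horizontal and the cosine of the angle between the axes equals r. Both function names are illustrative.

```python
import math

def mahalanobis_2d(x, y, r):
    """Mahalanobis distance of z-scores (x, y) from the mean, given correlation r."""
    return math.sqrt((x * x - 2.0 * r * x * y + y * y) / (1.0 - r * r))

def oblique_plot_distance(x, y, r):
    """Euclidean distance from the origin to the point as drawn with oblique axes."""
    s = math.sqrt(1.0 - r * r)
    px, py = x, (y - r * x) / s  # page position found by perpendicular projection
    return math.hypot(px, py)

print(round(mahalanobis_2d(1, -1, 0.9), 2))         # 4.47
print(round(oblique_plot_distance(1, -1, 0.9), 2))  # 4.47
```

Both routes give the square root of 20, about 4.47, which rounds to the 4.5 quoted in the text.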
For k multivariate normal variables, the Mahalanobis distance has a χ distribution with k degrees of freedom (the χ distribution is what you get by taking the square root of every value in the better-known χ² distribution). In the χ distribution with 2 degrees of freedom, a value of 4.5 is greater than more than 99.99% of values. Thus, (1, -1) is a quite unusual pair of scores if the z-scores correlate at ρ = 0.90.
If both X and Y are 1 standard deviation above the mean, the Mahalanobis distance would be fairly small (1.03). In the χ distribution with 2 degrees of freedom, a value of 1.03 is greater than only about 41% of values, making this a fairly typical pair of scores.
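With 2 degrees of freedom the χ distribution has a simple closed-form cumulative distribution, 1 − exp(−d²/2), so these percentiles can be checked without a statistics library. The sketch below assumes bivariate normal z-scores with a known correlation. The function names are illustrative.

```python
import math

def mahalanobis_2d(x, y, r):
    """Mahalanobis distance of z-scores (x, y) from the mean, given correlation r."""
    return math.sqrt((x * x - 2.0 * r * x * y + y * y) / (1.0 - r * r))

def chi_cdf_2df(d):
    """P(D <= d) for a chi distribution with 2 degrees of freedom."""
    return 1.0 - math.exp(-d * d / 2.0)

# The two profiles discussed in the text, with r = 0.90.
for point in [(1, -1), (1, 1)]:
    d = mahalanobis_2d(*point, r=0.9)
    print(point, round(d, 2), round(chi_cdf_2df(d), 5))
```

For the discrepant pair the proportion comes out at roughly 0.99995, and for the concordant pair at roughly 0.41.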