The more reliable a score is, the more certain we can be about what it means (provided its validity is close to its reliability). Certain rules of thumb about score reliability are sometimes proposed:
- Base high-stakes decisions only on scores with reliability coefficients of 0.98 or better.
- Base substantive interpretations on scores with reliability coefficients of 0.90 or better.
- Base decisions about whether to give more tests on scores with reliability coefficients of 0.80 or better.
Such guidelines seem reasonable to me, but I do not find reliability coefficients to be intuitively easy to understand. How much uncertainty is associated with a reliability coefficient of 0.80? The value of the coefficient (0.80) is not directly informative about individual scores. Instead, it refers to the correlation the scores have with a (more often than not hypothetical) repeated measurement.
Another way to think about the reliability coefficient is that it is a ratio of true score variance to observed score variance. Variance is the average squared deviation from the mean. Squared quantities are not easy to think about for most of us. For this reason, I prefer to convert reliability coefficients into confidence interval widths. Confidence interval widths and reliability coefficients have a non-linear relationship:
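$$\text{CI width} = 2\, z\, \sigma_X \sqrt{r_{XX} - r_{XX}^2}$$

where
- $z$ is the z-score associated with the level of confidence you want (e.g., 1.96 for a 95% confidence interval)
- $\sigma_X$ is the standard deviation of X
- $r_{XX}$ is the classical test theory reliability coefficient for X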
For index scores (μ = 100, σ = 15), a reliability coefficient of 0.80 is associated with a 95% confidence interval that is about 24 points wide. To me, that is much more informative than knowing that 80% of the variance is reliable.
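The arithmetic behind that figure can be checked with a minimal Python sketch. The function name `ci_width` and its default arguments are illustrative assumptions, and the width is taken to be twice the z-score times the standard error of the estimate, σ·√(r − r²), which is the form that reproduces the roughly 24-point width quoted above.

```python
import math

def ci_width(reliability, sd=15.0, z=1.96):
    """Width of the confidence interval around an estimated true score.

    Hypothetical helper: assumes width = 2 * z * sd * sqrt(r - r^2).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    # Standard error of the estimate; r * (1 - r) equals r - r^2.
    see = sd * math.sqrt(reliability * (1.0 - reliability))
    # The interval extends z standard errors on each side of the estimate.
    return 2.0 * z * see

width = ci_width(0.80)
print(round(width, 2))  # 23.52, i.e. about 24 index-score points
```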
Calculating the lower and upper bounds of a confidence interval for a score looks complex with all the symbols and subscripts, but after doing it a few times, it is not so bad. Basically, you convert your score to a z-score, multiply it by the reliability coefficient, add (or subtract) the margin of error, and then convert everything back to the original metric.
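Those steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `true_score_ci` and its parameters are hypothetical, and the margin of error in z-score units is assumed to be z·√(r − r²).

```python
import math

def true_score_ci(score, reliability, mean=100.0, sd=15.0, z=1.96):
    """Lower and upper bounds of the confidence interval for a score.

    Hypothetical helper following the steps described in the text.
    """
    # Step 1: convert the observed score to a z-score.
    z_score = (score - mean) / sd
    # Step 2: multiply by the reliability coefficient (shrinks toward the mean).
    estimated_true_z = reliability * z_score
    # Step 3: margin of error in z-score units (assumed form: z * sqrt(r - r^2)).
    margin_z = z * math.sqrt(reliability * (1.0 - reliability))
    # Step 4: convert both bounds back to the original metric.
    lower = mean + sd * (estimated_true_z - margin_z)
    upper = mean + sd * (estimated_true_z + margin_z)
    return lower, upper

lower, upper = true_score_ci(score=130, reliability=0.80)
print(round(lower, 2), round(upper, 2))  # 112.24 135.76
```

Note that the interval is centered on 124 rather than on the observed score of 130, because multiplying the z-score by the reliability coefficient pulls the estimate toward the mean.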
The animated graph below shows the non-linear relationship between reliability and 95% confidence interval widths for different observed index scores. The confidence interval width narrows slowly at first and then quickly as the reliability coefficient approaches 1.
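The same pattern can be tabulated numerically. The sketch below is illustrative only: it assumes index scores (σ = 15), a 95% interval (z = 1.96), a width of 2·z·σ·√(r − r²), and a hypothetical helper named `ci_width`.

```python
import math

def ci_width(reliability, sd=15.0, z=1.96):
    """Assumed width formula: 2 * z * sd * sqrt(r - r^2)."""
    return 2.0 * z * sd * math.sqrt(reliability * (1.0 - reliability))

# Reliability values chosen for illustration, from moderate to near-perfect.
reliabilities = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.98, 0.99]
widths = [ci_width(r) for r in reliabilities]

print("reliability  95% CI width")
for r, w in zip(reliabilities, widths):
    print(f"{r:11.2f}  {w:12.1f}")
```

Under these assumptions, moving from 0.50 to 0.60 trims the width by less than a point, whereas moving from 0.90 to 0.95 trims it by almost five points.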