Unfortunate statistical terms

I like most of the technical terms we use in statistics. However, there are a few of them that I wish were easier to teach and remember. Many others have opined on such matters. This is my list of complaints:

Statistical significance: This term is so universally hated I am surprised that we haven’t held a convention and banned its use. How many journalists have been misled by researchers’ technical use of significance? I wish we said something like “not merely random” or “probably not zero.”
Type I/Type II error: It is hard to remember which is which because the terms don’t convey any clues as to what they mean. I wish more informative metaphors were used such as false hit and false miss.
Power: Statistical power refers to the probability that the null hypothesis will be rejected, provided that the null hypothesis is false. The term is not self-explanatory and requires memorization! I wish we used a better term such as true hit rate or false null rejection rate. While we’re at it, α and β are not much better. False hit rate (or true null rejection rate) and false miss rate (or false null retention rate) would be easier to remember.
Prediction error: The word error in English typically refers to an action that results in harm that could have been avoided if better choices had been made. In the context of statistical models, prediction errors are what you get wrong even though you have done everything right! I wish there were a word that referred to actions that were done in good faith yet resulted in unforeseeable harm. In this case, we already have a perfectly good substitute term that is widely used: disturbance. I suppose that the connotations of disturbance could generate different misunderstandings but in my estimation they are not as bad as those generated by error. I wish that we could just use the term residuals but that refers to something slightly different: the estimate of an error (residual:error::statistic:parameter). We can only know the errors if we know the true model parameters.
Variance explained: This term works if the predictor is a cause of the criterion variable. However, when it is simply a correlate, it misleadingly suggests that we now understand what is going on. I wish the term were something more neutral such as variance predicted.
Moderator/Mediator: At least in English, these terms sound so much alike that they are easily confused. I think that we should dump moderator along with related terms interaction effect, simple main effect, and simple slope. I think that the term conditional effects is more descriptive and straightforward.
Biased: This word is hard to use in its technical sense when talking to non-statisticians. It sounds like we are talking about bigoted statistics! Unfortunately, I can’t think of a good alternative to it (though I can think of some awkward ones like stable inaccuracy).
Degrees of freedom: For me, this concept is extremely difficult to explain properly in an introductory course. Students are confused about what degrees have to do with it (or for that matter, freedom). I don’t know if I have a good replacement term (independent dimensions? non-redundancy index? matrix rank?).
True score: This term sounds like it refers to the Aristotelian truth when in fact it is merely the long-term average score if there were no carryover effects of repeated measurement. Thus, a person’s true score on one IQ test might be quite different from the same person’s true score on another IQ test. Neither true score refers to the person’s “true cognitive ability.” To avoid this confusion, I would prefer something like the individual expected value, or IEV for short.
Reliability: In typical usage, reliability refers to morally desirable traits such as trustworthiness and truthfulness. When statisticians refer to the reliability of scores or experimental results, to the untrained ear it probably sounds like we are talking about validity. I would prefer to talk about stability, consistency, or precision instead.
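
To make the proposed names for α, β, and power concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: a two-sided z-test of "mean = 0" with known standard deviation, a sample size of 25, a true mean of 0.5 when the null is false, and the hypothetical function name `rejection_rate`. The rate at which the test rejects a true null estimates the false hit rate (α); the rate at which it rejects a false null estimates the true hit rate (power); one minus that is the false miss rate (β).

```python
import random
from statistics import NormalDist


def rejection_rate(true_mean, n=25, sigma=1.0, alpha=0.05, trials=4000, seed=0):
    """Share of simulated samples in which a two-sided z-test rejects H0: mean = 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if abs(sample_mean / standard_error) > z_crit:
            rejections += 1
    return rejections / trials


# Null hypothesis is true: every rejection is a "false hit" (Type I error).
false_hit_rate = rejection_rate(true_mean=0.0)

# Null hypothesis is false: every rejection is a "true hit" (this rate is power).
true_hit_rate = rejection_rate(true_mean=0.5)

# Failing to reject a false null is a "false miss" (Type II error).
false_miss_rate = 1 - true_hit_rate

print(f"false hit rate (alpha): {false_hit_rate:.3f}")
print(f"true hit rate (power):  {true_hit_rate:.3f}")
print(f"false miss rate (beta): {false_miss_rate:.3f}")
```

Under these assumed settings, the false hit rate should land near the nominal 0.05, and the true hit rate should be well above it. The exact values depend on the seed, sample size, and effect size.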
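
The distinction between errors and residuals can also be shown with a small simulation. This is a sketch under stated assumptions: the "true" intercept and slope are invented for the example, and we know them only because we generate the data ourselves. The errors are deviations from the true line; the residuals are deviations from the line estimated by ordinary least squares.

```python
import random

rng = random.Random(1)

# Assumed "true" parameters: knowable here only because the data are simulated.
true_intercept, true_slope = 2.0, 0.5

x = [rng.uniform(0, 10) for _ in range(50)]
errors = [rng.gauss(0, 1) for _ in x]  # the disturbances
y = [true_intercept + true_slope * xi + ei for xi, ei in zip(x, errors)]

# Ordinary least squares estimates for a single predictor.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
slope_hat = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)
intercept_hat = y_bar - slope_hat * x_bar

# Residuals: deviations from the estimated line, not from the true one.
residuals = [yi - (intercept_hat + slope_hat * xi) for xi, yi in zip(x, y)]

sum_sq_errors = sum(e ** 2 for e in errors)
sum_sq_residuals = sum(r ** 2 for r in residuals)

print(f"estimated intercept and slope: {intercept_hat:.3f}, {slope_hat:.3f}")
print(f"sum of errors:    {sum(errors):.3f}")
print(f"sum of residuals: {sum(residuals):.3f}")
print(f"sum of squared errors:    {sum_sq_errors:.3f}")
print(f"sum of squared residuals: {sum_sq_residuals:.3f}")
```

The residuals sum to zero (up to rounding) by construction of the least squares fit, whereas the errors generally do not. The squared residuals also cannot total more than the squared errors, because the fitted line minimizes that sum for the sample at hand.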

I am sure that there are many more!

9 thoughts on “Unfortunate statistical terms”

Brant says:

How about ‘construct irrelevant variance’? Couldn’t we just use a term like impure measurement?

July 10, 2014 at 5:26 pm
- W. Joel Schneider says:
  
  It’s a bit of a mouthful, isn’t it? Still, it describes clearly what it is…
  
  July 10, 2014 at 6:12 pm
Brant says:

I am looking at the CTOPP-II manual (pg.58) for reliability information. It has 3 types of reliability (internal consistency, test-retest, and scorer). For an individual evaluation, which one is the one I base a decision about the adequacy of the reliability on? The test-retest is well below .90.

July 10, 2014 at 7:50 pm
- W. Joel Schneider says:
  
  Each type of coefficient presents different information and is therefore useful for different purposes. If all you want is simply to give a general sense of score reliability, the internal consistency coefficient is generally sufficient.
  
  July 10, 2014 at 8:00 pm
penthi says:

First thing that sprang to my mind (after reading the title): NULL-result. It strongly conveys that these are results to be considered non-valid and ignored, a mechanism at the core of many current-day (and historical) valid criticisms of ‘what is going wrong with science’. UGH!

June 24, 2015 at 2:00 am
cubefox says:

Another example: “Effect size” uses causal lingo. Perhaps “association size” is accurate?

June 27, 2022 at 10:29 am