# IQ, the death penalty, and me

Today the Supreme Court of the United States ruled that, in death penalty cases, the state of Florida must take into account the inherent imprecision of IQ tests.

Why are IQ tests used in death penalty cases? It is unconstitutional to execute a person deemed to be intellectually disabled. (Intellectual disability is the current term for what was previously known as mental retardation.) Diagnosing intellectual disability is a complex matter, but the diagnosis hinges to a large degree on the person’s performance on a well-constructed IQ test. Although high-quality IQ tests are more reliable than most psychological measures, even the best IQ tests are not perfectly precise. There is a potentially large risk that a person with an observed score slightly above the threshold set by Florida law has a “true score” that is below the threshold.

It was an unexpected honor to have my work cited in both the court’s decision (written by Justice Kennedy) and the dissenting opinion (written by Justice Alito). My contribution to the argument (relevant portion reproduced here) is a technical one and played an admittedly small role in the proceedings. My main point was that when multiple IQ tests have been administered to the same individual, we should not average the scores but combine them into a composite score in the same way that we combine psychological scores in any other context. Doing so gives a more accurate estimate of the IQ and a smaller confidence interval around the score. I hope that the application of this procedure results in fewer incorrect decisions and a fairer administration of justice.

I am grateful to Cecil Reynolds for giving me the opportunity to write the paper and to Kevin McGrew for encouraging me to re-write and publish the argument on the web, demonstrating its application to death penalty cases. Although it was the published chapter that was cited and used by the defense, it was the free web version that initially caught the attention of the law firm representing the defendant.

# Why averaging multiple IQ scores is incorrect in death penalty cases

As I have explained elsewhere on this blog, when a person has been given multiple IQ tests, it is common practice to take the mean IQ or median IQ to determine eligibility for the death penalty. As long as all the scores are valid estimates, combining multiple scores results in more accurate measurement.

Unfortunately, taking the mean or median IQ score is one of those solutions that is simple, neat, and wrong. Why? In the graph below, there are two IQ tests that correlate at 0.9. On each test, the population mean is μ = 100 and the standard deviation is σ = 15. On either test alone, about 2.3% of people score 70 or less, the typical threshold at which a person is ineligible for the death penalty.

What percentage of people score 70 or less on the average of the 2 tests? About 2%. Why is it 2% instead of 2.3%? The smaller number occurs because the tests, though highly correlated, are not perfectly correlated. The average of the 2 tests has a population mean of μ = 100, but its standard deviation is smaller than 15. In this case, the standard deviation is σ = 15 × √((1 + 0.9) / 2) ≈ 14.62. Because the standard deviation of the average of two scores is smaller, fewer people fall below the threshold of 70 than would if just one test had been given.
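The figures in this paragraph can be checked with a short calculation. The sketch below is mine, not part of the original analysis. It assumes the two scores are bivariate normal, which the example implies but does not state, and the variable names are purely illustrative.

```python
from math import sqrt
from statistics import NormalDist

MEAN, SD, CUTOFF = 100.0, 15.0, 70.0
r = 0.9  # correlation between the two IQ tests in the example

# SD of the simple average of two equally variable scores correlated at r
sd_average = SD * sqrt((1 + r) / 2)

# Proportion at or below the cutoff (normality is an assumption here)
p_single = NormalDist(MEAN, SD).cdf(CUTOFF)
p_average = NormalDist(MEAN, sd_average).cdf(CUTOFF)

print(round(sd_average, 2))       # 14.62
print(round(100 * p_single, 1))   # 2.3
print(round(100 * p_average, 1))  # 2.0
```

With a lower correlation between the tests, the standard deviation of the average shrinks further and the gap between the two percentages widens.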

There is an established procedure for rescaling a composite score so that it has the correct mean and standard deviation. It is the same procedure that was applied to the IQ subtest scores in the calculation of the full scale IQ. This same procedure should be applied when multiple IQ scores have been given.

Assuming that all the IQ scores have a mean of μ = 100 and a standard deviation of σ = 15, the composite IQ of k scores is:

$\text{Composite IQ}=\dfrac{\text{Sum of the IQ scores}-100k}{\sqrt{\text{Sum of the correlation matrix}}}+100$

In the graph above, the diagonal axis represents the composite IQ with the proper scaling so that the composite IQ has a mean of 100 and a standard deviation of 15 (instead of 14.62). As stated previously, if the 2 IQ tests were simply averaged, only about 2.0% of people would score 70 or less. On a properly scaled IQ score, the lowest 2.0% corresponds to an IQ of 69 or less.

Does 1 point matter? It does to the person who, on average, scored 71 on the 2 IQ tests. That person, with the score properly rescaled, would have a composite IQ of 70 (70.25 before rounding) and thus would be deemed ineligible for execution.
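The composite formula above is easy to apply by hand, but a small function makes the arithmetic explicit. This is a minimal sketch: the function name `composite_iq` and its arguments are hypothetical, and it assumes every score is on the same scale (mean 100, standard deviation 15) and carries equal weight. The correlation of 0.9 is carried over from the two-test example.

```python
from math import sqrt

def composite_iq(scores, corr, mean=100.0):
    """Rescaled composite of k IQ scores that share a mean of 100 and an SD of 15.

    `corr` is the full k x k correlation matrix among the tests,
    including the 1s on the diagonal.
    """
    k = len(scores)
    if len(corr) != k or any(len(row) != k for row in corr):
        raise ValueError("corr must be a k x k matrix matching the scores")
    sum_of_corr = sum(sum(row) for row in corr)
    return (sum(scores) - mean * k) / sqrt(sum_of_corr) + mean

# Two tests correlated at 0.9, as in the example
corr = [[1.0, 0.9],
        [0.9, 1.0]]

print(round(composite_iq([71, 71], corr), 2))  # 70.25
print(round(composite_iq([70, 70], corr), 2))  # 69.22
```

The second call shows where an average of exactly 70 lands on the properly scaled composite: just above 69.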

Your intuition might be telling you that something is fishy about all this. Does this mean that whenever someone scores 71 on an IQ test, just missing the threshold, another test should be given in the hope of another score of 71, so that the composite score is 70? The answer is that your intuition (and mine) is often unreliable when it comes to probability. As I have explained in this video, most people who score 71 on one IQ test score higher than 71 on a second IQ test. As long as all the scores are properly rescaled, the composite IQ is more accurate and nothing fishy is happening.
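To put a rough number on "most people", here is a sketch of the regression-to-the-mean calculation. It assumes the two tests are bivariate normal with the same correlation of 0.9 used in the earlier example. Real tests will differ, and practice effects are ignored.

```python
from math import sqrt
from statistics import NormalDist

MEAN, SD = 100.0, 15.0
r = 0.9             # assumed correlation between the two tests
first_score = 71.0  # observed score on the first test

# Conditional distribution of the second score given the first
expected_second = MEAN + r * (first_score - MEAN)
sd_second = SD * sqrt(1 - r ** 2)

# Probability that the second score is higher than the first
p_higher = 1 - NormalDist(expected_second, sd_second).cdf(first_score)

print(round(expected_second, 1))  # 73.9
print(round(p_higher, 2))         # 0.67
```

Under these assumptions, roughly two out of three people who score 71 on the first test would score higher on the second, so a second score of 71 is not the expected result of retesting.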

This procedure should not be applied mechanically in all situations. The method assumes that each score is equally valid and thus has equal weight. There are reasons to prefer some IQ administrations over others (e.g., a full battery given by a licensed clinician is likely to be more accurate than an abbreviated IQ test given by a first-year graduate student). If there are reasons to dismiss a particular score (e.g., the evaluee intentionally tried to obtain a low score), it should not figure into the composite score. There are further complications not discussed here such as the fact that people tend to score higher when retested with the same test (or one that is very similar).

# Execution by Miscalculation

Some people facing execution have had their IQ estimated multiple times. If an ability is measured more than once, there is an established procedure for combining all of the available information. This procedure is used in every psychological test ever published.

Some people who should not be eligible for execution are deemed eligible because psychologists are not combining IQ scores properly. The current standard of averaging the scores (or taking the median) is wrong. The correct procedure is the one we use in every other domain of psychological measurement, including in the computation of a single IQ score. When a person has been tested multiple times, this same procedure should be used to estimate a person’s IQ.

Those seeking detailed explanations of the correct calculation procedures will find them in this paper and in the second half of this video.

The correct calculation procedures are also explained on pages 289–291 of this chapter:

Schneider, W. J. (2013). Principles of assessment of aptitude and achievement. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), Oxford handbook of psychological assessment of children and adolescents (pp. 286–330). New York: Oxford University Press.

# Video Tutorial: Misunderstanding Regression to the Mean

One of the most widely misunderstood statistical concepts is regression to the mean. In this video tutorial, I address common false beliefs about regression to the mean and answer the following questions:

1. What is regression to the mean?
2. Do variables become less variable each time they are measured?
3. Does regression to the mean happen all the time or just in certain situations?
4. Does repeated testing cause people to come closer and closer to the mean?
5. How is regression to the mean relevant in death penalty cases?

