# How unusual is it to have multiple scores below a threshold?

In psychological assessment, it is common to specify a threshold at which a score is considered unusual (e.g., 2 standard deviations above or below the mean). If we can assume that the scores are roughly normal, it is easy to estimate the proportion of people with scores below the threshold we have set. If the threshold is 2 standard deviations below the mean, then the Excel function `NORMSDIST` will tell us the answer:

```
=NORMSDIST(-2)
```

This returns approximately 0.023; about 2.3% of people score at or below this threshold.

In R, the `pnorm` function gives the same answer:

`pnorm(-2)`
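If the scores are not already standardized, `pnorm` can take the mean and standard deviation directly. For example, on a scaled-score metric with a mean of 10 and a standard deviation of 3, a score of 4 is exactly 2 standard deviations below the mean:

```r
# Proportion scoring at or below 4 when scores have mean 10 and SD 3
pnorm(4, mean = 10, sd = 3)  # same as pnorm(-2), about 0.023
```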

How unusual is it to have multiple scores below the threshold? The answer depends on how correlated the scores are. If we can assume that the scores are multivariate normal, Crawford and colleagues (2007) show how to obtain reasonable estimates from simulated data. Suppose that the 10 subtests of the WAIS-IV have the correlations shown below. Because the subtests have a mean of 10 and a standard deviation of 3, a score of 4 or lower (at least 2 standard deviations below the mean) is unusually low. Here is an R script that runs the simulation; it depends on the `mvtnorm` package.

```r
# WAIS-IV subtest names
WAISSubtests <- c("BD", "SI", "DS", "MR", "VO", "AR", "SS", "VP", "IN", "CD")

# WAIS-IV correlations
WAISCor <- rbind(
  c(1.00, 0.49, 0.45, 0.54, 0.45, 0.50, 0.41, 0.64, 0.44, 0.40), # BD
  c(0.49, 1.00, 0.48, 0.51, 0.74, 0.54, 0.35, 0.44, 0.64, 0.41), # SI
  c(0.45, 0.48, 1.00, 0.47, 0.50, 0.60, 0.40, 0.40, 0.43, 0.45), # DS
  c(0.54, 0.51, 0.47, 1.00, 0.51, 0.52, 0.39, 0.53, 0.49, 0.45), # MR
  c(0.45, 0.74, 0.50, 0.51, 1.00, 0.57, 0.34, 0.42, 0.73, 0.41), # VO
  c(0.50, 0.54, 0.60, 0.52, 0.57, 1.00, 0.37, 0.48, 0.57, 0.43), # AR
  c(0.41, 0.35, 0.40, 0.39, 0.34, 0.37, 1.00, 0.38, 0.34, 0.65), # SS
  c(0.64, 0.44, 0.40, 0.53, 0.42, 0.48, 0.38, 1.00, 0.43, 0.37), # VP
  c(0.44, 0.64, 0.43, 0.49, 0.73, 0.57, 0.34, 0.43, 1.00, 0.34), # IN
  c(0.40, 0.41, 0.45, 0.45, 0.41, 0.43, 0.65, 0.37, 0.34, 1.00)) # CD
rownames(WAISCor) <- colnames(WAISCor) <- WAISSubtests

# Means
WAISMeans <- rep(10, length(WAISSubtests))

# Standard deviations
WAISSD <- rep(3, length(WAISSubtests))

# Covariance matrix: correlation matrix multiplied elementwise
# by the outer product of the standard deviations
WAISCov <- WAISCor * WAISSD %*% t(WAISSD)

# Sample size
SampleSize <- 1000000

library(mvtnorm)

# Make simulated data
d <- rmvnorm(n = SampleSize, mean = WAISMeans, sigma = WAISCov)
# To make this more realistic, round all scores to the nearest integer:
# d <- round(d)

# Threshold for abnormality
Threshold <- 4

# Which scores are less than or equal to the threshold?
Abnormal <- d <= Threshold

# Number of abnormal scores for each simulated case
nAbnormal <- rowSums(Abnormal)

# Frequency distribution table
p <- c(table(nAbnormal) / SampleSize)

# Plot
barplot(p, axes = FALSE, las = 1,
        xlim = c(0, length(p) * 1.2), ylim = c(0, 1),
        col = "royalblue2",
        xlab = "Number of WAIS-IV subtest scores less than or equal to 4",
        ylab = "Proportion")
axis(2, at = seq(0, 1, 0.1), las = 1)
```
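To see what the simulation is doing without installing `mvtnorm`, here is a self-contained miniature for three tests sharing a common correlation of .5. The one-factor construction below is an illustrative stand-in for `rmvnorm`, not part of the script above:

```r
set.seed(1)
n <- 200000  # simulated examinees
r <- 0.5     # common correlation among the three tests
k <- 3

# Equicorrelated standard normal scores built from a shared factor:
# z = sqrt(r) * g + sqrt(1 - r) * e has unit variance and correlation r
g <- rnorm(n)
e <- matrix(rnorm(n * k), n, k)
z <- sqrt(r) * g + sqrt(1 - r) * e

d <- 10 + 3 * z               # put scores on the subtest metric
nAbnormal <- rowSums(d <= 4)  # low scores per simulated examinee

table(nAbnormal) / n                    # proportion with 0, 1, 2, 3 low scores
rev(cumsum(rev(table(nAbnormal) / n)))  # proportion with at least that many
```

The marginal proportion for any single test should land near `pnorm(-2)`, and the cumulative column answers questions like "How common is at least one low score?"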

# Using the multivariate normal distribution

The simulation method works well, especially when the sample size is very large. An alternative method that gives more precise numbers is to compute how much of the multivariate normal distribution lies within certain bounds. That is, we find all of the regions of the multivariate normal distribution in which one and only one test is below the threshold and add up their probabilities. The process is then repeated for the regions in which two and only two tests are below the threshold, then three tests, four tests, and so on. This is tedious to do by hand but takes only a few lines of code to do automatically.
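For just two tests, the idea can be illustrated in base R, using one-dimensional integration in place of `pmvnorm` to get the bivariate normal rectangle probabilities. This stand-alone sketch (with an illustrative correlation of .5) is for intuition only; the function below handles the general case:

```r
# P(X <= a and Y <= b) for a standard bivariate normal with correlation r,
# integrating the conditional normal CDF over the marginal density
bvn_cdf <- function(a, b, r) {
  integrate(function(x) dnorm(x) * pnorm((b - r * x) / sqrt(1 - r^2)),
            -Inf, a)$value
}

r <- 0.5  # illustrative correlation between the two tests

# Region where one and only one test is at or below z = -2:
# P(X <= -2) minus P(both <= -2), doubled for the symmetric region
exactly_one <- 2 * (pnorm(-2) - bvn_cdf(-2, -2, r))
exactly_two <- bvn_cdf(-2, -2, r)
exactly_one
exactly_two
```

The three regions (neither, exactly one, both tests low) partition the distribution, so their probabilities sum to 1.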

```r
AbnormalPrevalence <- function(Cor, Mean = 0, SD = 1, Threshold) {
  require(mvtnorm)
  k <- nrow(Cor)
  p <- rep(0, k)
  zThreshold <- (Threshold - Mean) / SD
  for (n in 1:k) {
    # All ways to choose which n of the k tests are below the threshold
    combos <- combn(1:k, n)
    ncombos <- ncol(combos)
    for (i in 1:ncombos) {
      # The selected tests are at or below the threshold...
      u <- rep(Inf, k)
      u[combos[, i]] <- zThreshold
      # ...and all of the other tests are above it
      l <- rep(-Inf, k)
      l[seq(1, k)[-combos[, i]]] <- zThreshold
      p[n] <- p[n] + pmvnorm(lower = l, upper = u,
                             mean = rep(0, k), sigma = Cor)
    }
  }
  # Probability of zero abnormal scores is whatever remains
  p <- c(1 - sum(p), p)
  names(p) <- 0:k

  barplot(p, axes = FALSE, las = 1,
          xlim = c(0, length(p) * 1.2), ylim = c(0, 1),
          col = "royalblue2",
          xlab = bquote("Number of scores less than or equal to " * .(Threshold)),
          ylab = "Proportion")
  axis(2, at = seq(0, 1, 0.1), las = 1)
  p
}
```

For the WAIS-IV example above, the exact proportions are obtained with `AbnormalPrevalence(WAISCor, Mean = 10, SD = 3, Threshold = 4)`.