In psychological assessment, it is common to specify a threshold at which a score is considered unusual (e.g., 2 standard deviations above or below the mean). If we can assume that the scores are roughly normal, it is easy to estimate the proportion of people with scores below the threshold we have set. If the threshold is 2 standard deviations below the mean, then the Excel function `NORMSDIST`

will tell us the answer:

=NORMSDIST(-2) =0.023

In R, the `pnorm`

function gives the same answer:

pnorm(-2)

How unusual is it to have multiple scores below the threshold? The answer depends on how correlated the scores are. If we can assume that the scores are multivariate normal, Crawford and colleagues (2007) show us how to obtain reasonable estimates using simulated data. Here is a script in R that depends on the `mvtnorm`

package. Suppose that the 10 subtests of the WAIS-IV have correlations as depicted below. Because the subtests have a mean of 10 and a standard deviation of 3, the scores are unusually low if 4 or lower.

#WAIS-IV subtest names WAISSubtests <- c("BD", "SI", "DS", "MR", "VO", "AR", "SS", "VP", "IN", "CD") # WAIS-IV correlations WAISCor <- rbind( c(1.00,0.49,0.45,0.54,0.45,0.50,0.41,0.64,0.44,0.40), #BD c(0.49,1.00,0.48,0.51,0.74,0.54,0.35,0.44,0.64,0.41), #SI c(0.45,0.48,1.00,0.47,0.50,0.60,0.40,0.40,0.43,0.45), #DS c(0.54,0.51,0.47,1.00,0.51,0.52,0.39,0.53,0.49,0.45), #MR c(0.45,0.74,0.50,0.51,1.00,0.57,0.34,0.42,0.73,0.41), #VO c(0.50,0.54,0.60,0.52,0.57,1.00,0.37,0.48,0.57,0.43), #AR c(0.41,0.35,0.40,0.39,0.34,0.37,1.00,0.38,0.34,0.65), #SS c(0.64,0.44,0.40,0.53,0.42,0.48,0.38,1.00,0.43,0.37), #VP c(0.44,0.64,0.43,0.49,0.73,0.57,0.34,0.43,1.00,0.34), #IN c(0.40,0.41,0.45,0.45,0.41,0.43,0.65,0.37,0.34,1.00)) #CD rownames(WAISCor) <- colnames(WAISCor) <- WAISSubtests #Means WAISMeans<-rep(10,length(WAISSubtests)) #Standard deviations WAISSD<-rep(3,length(WAISSubtests)) #Covariance Matrix WAISCov<-WAISCor*WAISSD%*%t(WAISSD) #Sample size SampleSize<-1000000 #Load mvtnorm package library(mvtnorm) #Make simulated data d<-rmvnorm(n=SampleSize,mean=WAISMeans,sigma=WAISCov) #To make this more realistic, you can round all scores to the nearest integer (d<-round(d)) #Threshold for abnormality Threshold<-4 #Which scores are less than or equal to threshold Abnormal<- d<=Threshold #Number of scores less than or equal to threshold nAbnormal<-rowSums(Abnormal) #Frequency distribution table p<-c(table(nAbnormal)/SampleSize) #Plot barplot(p,axes=F,las=1, xlim=c(0,length(p)*1.2),ylim=c(0,1), bty="n",pch=16,col="royalblue2", xlab="Number of WAIS-IV subtest scores less than or equal to 4", ylab="Proportion") axis(2,at=seq(0,1,0.1),las=1) text(x=0.7+0:10*1.2,y=p,labels=formatC(p,digits=2),cex=0.7,pos=3,adj=0.5)

# Using the multivariate normal distribution

The simulation method works very well, especially if the sample size is very large. An alternate method that gives more precise numbers is to estimate how much of the multivariate normal distribution is within certain bounds. That is, we find all of the regions of the multivariate normal distribution in which one and only one test is below a threshold and then add up all the probabilities. The process is repeated to find all regions in which two and only two tests are below a threshold. Repeat the process, with 3 tests, 4 tests, and so on. This is tedious to do by hand but only takes a few lines of code do automatically.

AbnormalPrevalance<-function(Cor,Mean=0,SD=1,Threshold){ require(mvtnorm) k<-nrow(Cor) p<-rep(0,k) zThreshold<-(Threshold-Mean)/SD for (n in 1:k){ combos<-combn(1:k,n) ncombos<-ncol(combos) for (i in 1:ncombos){ u<-rep(Inf,k) u[combos[,i]]<-zThreshold l<-rep(-Inf,k) l[seq(1,k)[-combos[,i]]]<-zThreshold p[n]<-p[n]+pmvnorm(lower=l,upper=u,mean=rep(0,k),sigma=Cor)[1] } } p<-c(1-sum(p),p) names(p)<-0:k barplot(p,axes=F,las=1,xlim=c(0,length(p)*1.2),ylim=c(0,1), bty="n",pch=16,col="royalblue2", xlab=bquote("Number of scores less than or equal to " * .(Threshold)), ylab="Proportion") axis(2,at=seq(0,1,0.1),las=1) text(x=0.7+0:10*1.2,y=p,labels=formatC(p,digits=2),cex=0.7,pos=3,adj=0.5) return(p) } Proportions<-AbnormalPrevalance(Cor=WAISCor,Mean=10,SD=3,Threshold=4)

Using this method, the results are nearly the same but slightly more accurate. If the number of tests is large, the code can take a long time to run.

Pingback: How common is it to have no academic weaknesses? | Assessing Psyche, Engaging Gauss, Seeking Sophia