# How common is it to have no academic weaknesses?

I’m afraid that the question posed by the title does not have a single answer. It depends on how we define and measure academic performance.

Let’s sidestep some difficult questions about what exactly an “academic deficit” is and for the sake of convenience pretend that it is a score at least 1 standard deviation below the mean on a well normed test administered by a competent psychologist with good clinical skills.

Suppose that we start with the 9 core WJ III achievement tests (the answers will not be all that different with the new WJ IV):

What is the percentage of the population that does not have any score below 85? If we can assume that the scores are multivariate normal, the answer can be found using data simulation or via the cumulative density function of the multivariate normal distribution. I gave examples of both methods in the previous post. If we use the correlation matrix for the 6 to 9 age group of the WJ III NU, about 47% of the population has no academic scores below 85.

Using the same methods we can estimate what percent of the population has no academic scores below various thresholds. Subtracting these numbers from 100%, we can see that fairly large proportions have at least one low score.

# What proportion of people with average cognitive scores have no academic weaknesses?

The numbers in the table above include people with very low cognitive ability. It would be more informative if we could control for a person’s measured cognitive abilities.

Suppose that an individual has index scores of exactly 100 for all 14 subtests that are used to calculate the WJ III GIA Extended. We can calculate the means and the covariance matrix of the achievement tests for all people with this particular cognitive profile. We will make use of the conditional multivariate normal distribution. As explained here (or here), we partition the academic tests $(\mathbf{X}_1)$ and the cognitive predictor tests $(\mathbf{X}_2)$ like so:

$\begin{pmatrix}\mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix}\sim\mathcal{N}\left(\begin{pmatrix}\boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2\end{pmatrix},\begin{pmatrix}\mathbf{\Sigma}_{11} & \mathbf{\Sigma}_{12} \\ \mathbf{\Sigma}_{21} & \mathbf{\Sigma}_{22}\end{pmatrix}\right)$

• $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ are the mean vectors for the academic and cognitive variables, respectively.
• $\mathbf{\Sigma}_{11}$ and $\mathbf{\Sigma}_{22}$ are the covariances matrices of academic and cognitive variables, respectively.
• $\mathbf{\Sigma}_{12}$ is the matrix of covariances between the academic and cognitive variables.

If the cognitive variables have the vector of particular values $\mathbf{x}_2$, then the conditional mean vector of the academic variables $(\boldsymbol{\mu}_{1|2})$ is:

$\boldsymbol{\mu}_{1|2}=\boldsymbol{\mu}_1+\mathbf{\Sigma}_{12}\mathbf{\Sigma}^{-1}_{22}(\mathbf{x}_2-\boldsymbol{\mu}_2)$

The conditional covariance matrix:
$\mathbf{\Sigma}_{1|2}=\mathbf{\Sigma}_{11}-\mathbf{\Sigma}_{12}\mathbf{\Sigma}^{-1}_{22}\mathbf{\Sigma}_{21}$

If we can assume multivariate normality, we can use these equations, to estimate the proportion of people with no scores below any threshold on any set of scores conditioned on any set of predictor scores. In this example, about 51% of people with scores of exactly 100 on all 14 cognitive predictors have no scores below 85 on the 9 academic tests. About 96% of people with this cognitive profile have no scores below 70.

Because there is an extremely large number of possible cognitive profiles, I cannot show what would happen with all of them. Instead, I will show what happens with all of the perfectly flat profiles from all 14 cognitive scores equal to 70 to all 14 cognitive scores equal to 130.

Here is what happens with the same procedure when the threshold is 70 for the academic scores:

Here is the R code I used to perform the calculations. You can adapt it to other situations fairly easily (different tests, thresholds, and profiles).

library(mvtnorm)
WJ <- matrix(c(
1,0.49,0.31,0.46,0.57,0.28,0.37,0.77,0.36,0.15,0.24,0.49,0.25,0.39,0.61,0.6,0.53,0.53,0.5,0.41,0.43,0.57,0.28, #Verbal Comprehension
0.49,1,0.27,0.32,0.47,0.26,0.32,0.42,0.25,0.21,0.2,0.41,0.21,0.28,0.38,0.43,0.31,0.36,0.33,0.25,0.29,0.4,0.18, #Visual-Auditory Learning
0.31,0.27,1,0.25,0.33,0.18,0.21,0.28,0.13,0.16,0.1,0.33,0.13,0.17,0.25,0.22,0.18,0.21,0.19,0.13,0.25,0.31,0.11, #Spatial Relations
0.46,0.32,0.25,1,0.36,0.17,0.26,0.44,0.19,0.13,0.26,0.31,0.18,0.36,0.4,0.36,0.32,0.29,0.31,0.27,0.22,0.33,0.2, #Sound Blending
0.57,0.47,0.33,0.36,1,0.29,0.37,0.49,0.28,0.16,0.23,0.57,0.24,0.35,0.4,0.44,0.36,0.38,0.4,0.34,0.39,0.53,0.27, #Concept Formation
0.28,0.26,0.18,0.17,0.29,1,0.35,0.25,0.36,0.17,0.27,0.29,0.53,0.22,0.37,0.32,0.52,0.42,0.32,0.49,0.42,0.37,0.61, #Visual Matching
0.37,0.32,0.21,0.26,0.37,0.35,1,0.3,0.24,0.13,0.22,0.33,0.21,0.35,0.39,0.34,0.38,0.38,0.36,0.33,0.38,0.43,0.36, #Numbers Reversed
0.77,0.42,0.28,0.44,0.49,0.25,0.3,1,0.37,0.15,0.23,0.43,0.23,0.37,0.56,0.55,0.51,0.47,0.47,0.39,0.36,0.51,0.26, #General Information
0.36,0.25,0.13,0.19,0.28,0.36,0.24,0.37,1,0.1,0.22,0.21,0.38,0.26,0.26,0.33,0.4,0.28,0.27,0.39,0.21,0.25,0.32, #Retrieval Fluency
0.15,0.21,0.16,0.13,0.16,0.17,0.13,0.15,0.1,1,0.06,0.16,0.17,0.09,0.11,0.09,0.13,0.1,0.12,0.13,0.07,0.12,0.07, #Picture Recognition
0.24,0.2,0.1,0.26,0.23,0.27,0.22,0.23,0.22,0.06,1,0.22,0.35,0.2,0.16,0.22,0.25,0.21,0.19,0.26,0.17,0.19,0.21, #Auditory Attention
0.49,0.41,0.33,0.31,0.57,0.29,0.33,0.43,0.21,0.16,0.22,1,0.2,0.3,0.33,0.38,0.29,0.31,0.3,0.25,0.42,0.47,0.25, #Analysis-Synthesis
0.25,0.21,0.13,0.18,0.24,0.53,0.21,0.23,0.38,0.17,0.35,0.2,1,0.15,0.19,0.22,0.37,0.21,0.2,0.4,0.23,0.19,0.37, #Decision Speed
0.39,0.28,0.17,0.36,0.35,0.22,0.35,0.37,0.26,0.09,0.2,0.3,0.15,1,0.39,0.36,0.32,0.3,0.3,0.3,0.25,0.33,0.23, #Memory for Words
0.61,0.38,0.25,0.4,0.4,0.37,0.39,0.56,0.26,0.11,0.16,0.33,0.19,0.39,1,0.58,0.59,0.64,0.5,0.48,0.46,0.52,0.42, #Letter-Word Identification
0.6,0.43,0.22,0.36,0.44,0.32,0.34,0.55,0.33,0.09,0.22,0.38,0.22,0.36,0.58,1,0.52,0.52,0.47,0.42,0.43,0.49,0.36, #Passage Comprehension
0.53,0.36,0.21,0.29,0.38,0.42,0.38,0.47,0.28,0.1,0.21,0.31,0.21,0.3,0.64,0.52,0.58,1,0.5,0.49,0.46,0.47,0.49, #Spelling
0.5,0.33,0.19,0.31,0.4,0.32,0.36,0.47,0.27,0.12,0.19,0.3,0.2,0.3,0.5,0.47,0.48,0.5,1,0.44,0.41,0.46,0.36, #Writing Samples
0.41,0.25,0.13,0.27,0.34,0.49,0.33,0.39,0.39,0.13,0.26,0.25,0.4,0.3,0.48,0.42,0.65,0.49,0.44,1,0.38,0.37,0.55, #Writing Fluency
0.43,0.29,0.25,0.22,0.39,0.42,0.38,0.36,0.21,0.07,0.17,0.42,0.23,0.25,0.46,0.43,0.42,0.46,0.41,0.38,1,0.57,0.51, #Calculation
0.57,0.4,0.31,0.33,0.53,0.37,0.43,0.51,0.25,0.12,0.19,0.47,0.19,0.33,0.52,0.49,0.43,0.47,0.46,0.37,0.57,1,0.46, #Applied Problems
0.28,0.18,0.11,0.2,0.27,0.61,0.36,0.26,0.32,0.07,0.21,0.25,0.37,0.23,0.42,0.36,0.59,0.49,0.36,0.55,0.51,0.46,1), nrow= 23, byrow=TRUE) #Math Fluency
WJNames <- c("Verbal Comprehension", "Visual-Auditory Learning", "Spatial Relations", "Sound Blending", "Concept Formation", "Visual Matching", "Numbers Reversed", "General Information", "Retrieval Fluency", "Picture Recognition", "Auditory Attention", "Analysis-Synthesis", "Decision Speed", "Memory for Words", "Letter-Word Identification", "Passage Comprehension", "Reading Fluency", "Spelling", "Writing Samples", "Writing Fluency", "Calculation", "Applied Problems", "Math Fluency")
rownames(WJ) <- colnames(WJ) <- WJNames

#Number of tests
k<-length(WJNames)

#Means and standard deviations of tests
mu<-rep(100,k)
sd<-rep(15,k)

#Covariance matrix
sigma<-diag(sd)%*%WJ%*%diag(sd)
colnames(sigma)<-rownames(sigma)<-WJNames

#Vector identifying predictors (WJ Cog)
p<-seq(1,14)

#Threshold for low scores
Threshold<-85

#Proportion of population who have no scores below the threshold
pmvnorm(lower=rep(Threshold,length(WJNames[-p])),upper=rep(Inf,length(WJNames[-p])),sigma=sigma[-p,-p],mean=mu[-p])[1]

#Predictor test scores for an individual
x<-rep(100,length(p))
names(x)<-WJNames[p]

#Condition means and covariance matrix
condMu<-c(mu[-p] + sigma[-p,p] %*% solve(sigma[p,p]) %*% (x-mu[p]))
condSigma<-sigma[-p,-p] - sigma[-p,p] %*% solve(sigma[p,p]) %*% sigma[p,-p]

#Proportion of people with the same predictor scores as this individual who have no scores below the threshold
pmvnorm(lower=rep(Threshold,length(WJNames[-p])),upper=rep(Inf,length(WJNames[-p])),sigma=condSigma,mean=condMu)[1]


Standard

# Conditional normal distributions provide useful information in psychological assessment

Conditional Normal Distribution

Conditional normal distributions are really useful in psychological assessment. We can use them to answer questions like:

• How unusual is it for someone with a vocabulary score of 120 to have a score of 90 or lower on reading comprehension?
• If that person also has a score of 80 on a test of working memory capacity, how much does the risk of scoring 90 or lower on reading comprehension increase?

What follows might be mathematically daunting. Just let it wash over you if it becomes confusing. At the end, there is a video in which I will show how to use a spreadsheet that will do all the calculations.

# Unconditional Normal Distributions

Suppose that variable Y represents reading comprehension test scores. Here is a fancy way of saying Y is normally distributed with a mean of 100 and a standard deviation of 15:

$Y\sim N(100,15^2)$

In this notation, “~” means is distributed as, and N means normally distributed with a particular mean (μ) and variance (σ2).

If we know literally nothing about a person from this population, our best guess is that the person’s reading comprehension score is at the population mean. One way to say this is that the person’s expected value on reading comprehension is the population mean:

$E(Y)=\mu_Y = 100$

The 95% confidence interval around this guess is :

$95\%\, \text{CI} = \mu_Y \pm z_{95\%} \sigma_Y$

$95\%\, \text{CI} \approx 100 \pm 1.96*15 = 70.6 \text{ to } 129.4$

Unconditional Normal Distribution with 95% CI

# Conditional Normal Distributions

## Simple Linear Regression

Now, suppose that we know one thing about the person: the person’s score on a vocabulary test. We can let X represent the vocabulary score and its distribution is the same as that of Y:

$X\sim N(100,15^2)$

If we know that this person scored 120 on vocabulary (X), what is our best guess as to what the person scored on reading comprehension (Y)? This guess is a conditional expected value. It is “conditional” in the sense that the expected value of Y depends on what value X has. The pipe symbol “|” is used to note a condition like so:

$E(Y|X=120)$

This means, “What is our best guess for Y if X is 120?”

What if we don’t want to be specific about the value of X but want to refer to any particular value of X? Oddly enough, it is traditional to use the lowercase x for that. So, X refers to the variable as a whole and x refers to any particular value of variable X. So if I know that variable X happens to be a particular value x, the expected value of Y is:

$E(Y|X=x)=\sigma_Y \rho_{XY}\dfrac{x-\mu_X}{\sigma_X}+\mu_Y$

where ρXY is the correlation between X and Y.

You might recognize that this is a linear regression formula and that:

$E(Y|X=x)=\hat{Y}$

where “Y-hat” (Ŷ) is the predicted value of Y when X is known.

Let’s assume that the relationship between X and Y is bivariate normal like in the image at the top of the post:

$\begin{bmatrix}X\\Y\end{bmatrix}\sim N\left(\begin{bmatrix}\mu_X\\ \mu_Y\end{bmatrix}\begin{matrix} \\,\end{matrix}\begin{bmatrix}\sigma_X^2&\rho_{XY}\sigma_X\sigma_Y\\ \rho_{XY}\sigma_X\sigma_Y&\sigma_X^2\end{bmatrix}\right)$

The first term in the parentheses is the vector of means and the second term (the square matrix in the brackets) is the covariance matrix of X and Y. It is not necessary to understand the notation. The main point is that X and Y are both normal, they have a linear relationship, and the conditional variance of Y at any value of X is the same.

The conditional standard deviation of Y at any particular value of X is:

$\sigma_{Y|X=x}=\sigma_Y\sqrt{1-\rho_{xy}^2}$

This is the standard deviation of the conditional normal distribution. In the language of regression, it is the standard error of the estimate (σe). It is the standard deviation of the residuals (errors). Residuals are simply the amount by which your guesses differ from the actual values.

$e = y - E(Y|X=x)=y-\hat{Y}$

So,

$\sigma_{Y|X=x}=\sigma_e$

So, putting all this together, we can answer our question:

How unusual is it for someone with a vocabulary score of 120 to have a score of 90 or lower on reading comprehension?

The expected value of Y (Ŷ) is:

$E(Y|X=120)=15\rho_{XY}\dfrac{120-100}{15}+100$

Suppose that the correlation is 0.5. Therefore,

$E(Y|X=120)=15*0.5\dfrac{120-100}{15}+100=110$

This means that among all the people with a vocabulary score of 120, the average is 110 on reading comprehension. Now, how far off from that is 90?

$e= y - \hat{Y}=90-110=-20$

What is the standard error of the estimate?

$\sigma_{Y|X=x}=\sigma_e=\sigma_Y\sqrt{1-\rho_{xy}^2}$

$\sigma_{Y|X=x}=\sigma_e=15\sqrt{1-0.5^2}\approx12.99$

Dividing the residual by the standard error of the estimate (the standard deviation of the conditional normal distribution) gives us a z-score. It represents how far from expectations this individual is in standard deviation units.

$z=\dfrac{e}{\sigma_e} \approx\dfrac{-20}{12.99}\approx -1.54$

Using the standard normal cumulative distribution function (Φ) gives us the proportion of people scoring 90 or less on reading comprehension (given a vocabulary score of 120).

$\Phi(z)\approx\Phi(-1.54)\approx 0.06$

In Microsoft Excel, the standard normal cumulative distribution function is NORMSDIST. Thus, entering this into any cell will give the answer:

=NORMSDIST(-1.54)

Conditional normal distribution when Vocabulary = 120

## Multiple Regression

What proportion of people score 90 or less on reading comprehension if their vocabulary is 120 but their working memory capacity is 80?

Let’s call vocabulary X1 and working memory capacity X2. Let’s suppose they correlated at 0.3. The correlation matrix among the predictors (RX):

$\mathbf{R_X}=\begin{bmatrix}1&\rho_{12}\\ \rho_{12}&1\end{bmatrix}=\begin{bmatrix}1&0.3\\ 0.3&1\end{bmatrix}$

The validity coefficients are the correlations of Y with both predictors (RXY):

$\mathbf{R}_{XY}=\begin{bmatrix}\rho_{Y1}\\ \rho_{Y2}\end{bmatrix}=\begin{bmatrix}0.5\\ 0.4\end{bmatrix}$

The standardized regression coefficients (β) are:

$\pmb{\mathbf{\beta}}=\mathbf{R_{X}}^{-1}\mathbf{R}_{XY}\approx\begin{bmatrix}0.418\\ 0.275\end{bmatrix}$

Unstandardized coefficients can be obtained by multiplying the standardized coefficients by the standard deviation of Y (σY) and dividing by the standard deviation of the predictors (σX):

$\mathbf{b}=\sigma_Y\pmb{\mathbf{\beta}}/\pmb{\mathbf{\sigma}}_X$

However, in this case all the variables have the same metric and thus the unstandardized and standardized coefficients are the same.

The vector of predictor means (μX) is used to calculate the intercept (b0):

$b_0=\mu_Y-\mathbf{b}' \pmb{\mathbf{\mu}}_X$

$b_0\approx 100-\begin{bmatrix}0.418\\ 0.275\end{bmatrix}^{'} \begin{bmatrix}100\\ 100\end{bmatrix}\approx 30.769$

The predicted score when vocabulary is 120 and working memory capacity is 80 is:

$\hat{Y}=b_0 + b_1 X_1 + b_2 X_2$

$\hat{Y}\approx 30.769+0.418*120+0.275*80\approx 102.9$

The error in this case is 90-102.9=-12.9:

The multiple R2 is calculated with the standardized regression coefficients and the validity coefficients.

$R^2 = \pmb{\mathbf{\beta}}'\pmb{\mathbf{R}}_{XY}\approx\begin{bmatrix}0.418\\ 0.275\end{bmatrix}^{'} \begin{bmatrix}0.5\\ 0.4\end{bmatrix}\approx0.319$

The standard error of the estimate is thus:

$\sigma_e=\sigma_Y\sqrt{1-R^2}\approx 15\sqrt{1-0.319^2}\approx 12.38$

The proportion of people with vocabulary = 120 and working memory capacity = 80 who score 90 or less is:

$\Phi\left(\dfrac{e}{\sigma_e}\right)\approx\Phi\left(\dfrac{-12.9}{12.38}\right)\approx 0.15$