Principles of assessment of aptitude and achievement, Tutorial

Predicted Achievement Using Simple Linear Regression

There are two ways to make an estimate of a person’s abilities. A point estimate (a single number) is precise but usually wrong, whereas an interval estimate (a range of numbers) is usually right but can be so wide that it is nearly useless. Confidence intervals combine both types of estimates in order to balance the weaknesses of one type of estimate with the strengths of the other. If I say that Suzie’s expected reading comprehension is 85 ± 11, the 85 is the point estimate (also known as the expected score, the predicted score, or just Ŷ). The ± 11 is called the margin of error. If the confidence level is left unspecified, by convention we mean the 95% margin of error. Subtracting 11 from 85 and adding 11 to 85 gives 74 and 96, the respective lower and upper bounds of the 95% confidence interval.

Calculating the Predicted Achievement Score

I will assume that both the IQ and achievement scores are index scores (μ = 100, σ = 15) to make things simple. The predicted achievement score is a point estimate. It represents the best guess we can make in the absence of other information. The equation below is called a regression equation.

\hat{Y}=\sigma_Y r_{XY} \frac{X-\mu_X}{\sigma_X}+\mu_Y

If X is IQ, Y is Achievement, and both scores are index scores (μ = 100, σ = 15), the regression equation simplifies to:

Predicted achievement = (Correlation between IQ and Achievement) (IQ – 100) + 100
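
As a minimal sketch, the regression equation above can be written as a small Python function. The function name, the example IQ of 80, and the correlation of 0.6 are illustrative assumptions, not values from the chapter.

```python
def predicted_achievement(iq, r_xy, mean_x=100.0, sd_x=15.0,
                          mean_y=100.0, sd_y=15.0):
    """Point estimate of achievement from IQ via simple linear regression.

    Defaults assume both scores are index scores (mean 100, SD 15).
    """
    z_x = (iq - mean_x) / sd_x          # IQ expressed as a z-score
    return sd_y * r_xy * z_x + mean_y   # Y-hat = sigma_Y * r * z_X + mu_Y


# Hypothetical example: IQ = 80, assumed IQ-achievement correlation of 0.6
print(round(predicted_achievement(iq=80, r_xy=0.6), 1))
```

With index scores the two standard deviations cancel, so the function returns the same value as the simplified form, r × (IQ – 100) + 100.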

Calculating the Confidence Interval for the Predicted Achievement Score

Whenever you make a prediction using regression, your estimate is rarely exactly right. It is expected to differ from the actual achievement score by a certain amount (on average). This amount is called the standard error of the estimate. It is the standard deviation of all the prediction errors. Thus, it is the standard to which all the errors in your estimates are compared. In general it equals the standard deviation of Y multiplied by the square root of 1 minus the squared correlation. When both scores are index scores (σ = 15), the formula is

\text{Standard error of the estimate}=15\sqrt{1-r^2_{XY}}

To calculate the margin of error, multiply the standard error of the estimate by the z-score that corresponds to the degree of confidence desired. In Microsoft Excel, the formula for the z-score corresponding to the 95% confidence interval is

=NORMSINV(1-(1-0.95)/2)

For the 95% confidence interval, multiply the standard error of the estimate by 1.96. The 95% confidence interval’s formula is

95% Confidence Interval = Predicted achievement ± 1.96 * Standard error of the estimate

This interval estimates the range that contains the achievement scores of about 95% of people with the same IQ as the child. About 2.5% will score below the lower bound and about 2.5% will score above the upper bound.
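
The steps above can be sketched in Python using only the standard library. This is an illustration under stated assumptions: the function names are hypothetical, the standard error is taken to be σ_Y√(1 − r²) (15√(1 − r²) for index scores), and the predicted score of 88 with a correlation of 0.6 is a made-up example.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_of_estimate(r_xy, sd_y=15.0):
    """Standard deviation of the prediction errors (index-score default)."""
    return sd_y * sqrt(1.0 - r_xy ** 2)


def confidence_interval(predicted, r_xy, confidence=0.95, sd_y=15.0):
    """Lower and upper bounds around a predicted achievement score."""
    # Two-tailed z-score for the requested confidence level (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * standard_error_of_estimate(r_xy, sd_y)
    return predicted - margin, predicted + margin


# Hypothetical example: predicted score of 88, assumed correlation of 0.6
lower, upper = confidence_interval(predicted=88.0, r_xy=0.6)
print(round(lower, 1), round(upper, 1))  # 64.5 111.5
```

Note how wide the interval is even with a fairly strong correlation: the standard error in this example is 12 points, so the margin of error is about 23.5 points.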

You can use Excel to estimate how unusual it is for an observed achievement score to differ from a predicted achievement score in a particular direction by using this formula:

=NORMSDIST(-1*ABS(Observed-Predicted)/(Standard error of the estimate))
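
For readers who do not use Excel, here is a rough Python equivalent of the formula above, using the standard library's `NormalDist` in place of `NORMSDIST`. The function name and the example scores (observed 70, predicted 88, standard error 12) are assumptions made for illustration.

```python
from statistics import NormalDist


def proportion_this_discrepant(observed, predicted, see):
    """One-tailed proportion of people whose observed score differs from
    the predicted score by at least this much in the same direction."""
    z = -abs(observed - predicted) / see
    return NormalDist().cdf(z)


# Hypothetical example: observed 70, predicted 88, standard error of 12
print(round(proportion_this_discrepant(70, 88, 12.0), 3))  # 0.067
```

In this example, roughly 7% of children with the same IQ would be expected to score 18 or more points below their predicted achievement score.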

If a child’s observed achievement score is unusually low, it does not automatically mean that the child has a learning disorder. Many other things need to be checked before that diagnosis can be considered valid. However, it does mean that an explanation for the unusually low achievement score should be sought.

This post is an excerpt from:

Schneider, W. J. (2013). Principles of assessment of aptitude and achievement. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), Oxford handbook of psychological assessment of children and adolescents (pp. 286–330). New York: Oxford University Press.

Potential Misconceptions about Potential

If you are a mono-g-ist, you can use the estimate of g (IQ) to get an idea of the typical range of achievement scores for a child with that IQ. Not every child with the same IQ will have the same achievement scores.[1] Not even mono-g-ists believe that. Also, it is simply not true that achievement cannot be higher than IQ. Equally false is the assumption that if achievement is higher than IQ, then the IQ is wrong. These misconceptions are based on two premises: one true, the other false. If potential is the range of all possible outcomes, it is logically true that people cannot exceed their potentials. The false premise is that IQ and achievement tests are measured on the “potential scale.” By analogy, suppose I say, “This thermometer reads -10 degrees. I know from my understanding of physics that Brownian motion never stops and thus no temperature dips below zero. Therefore, this thermometer is incorrect.” My premise is true if the thermometer is on the Kelvin scale. However, it is on the Celsius scale, and so there is no reason to believe that something is amiss. IQ and achievement simply are not measured on the “potential scale.” They are measured with standard scores, which are transformed deviations from a population mean. Because of this, about half of all people have academic achievement scores that are higher than their own IQ. There is nothing wrong with this.

[1] And not every child with the same achievement scores will have the same IQ.

How to Assess Aptitudes If You Are a Mono-g-ist

For the mono-g-ist, the assessment of aptitudes is rather simple: measure g and be done with it. Other abilities may have a little predictive validity beyond g, but not enough to make it worth all the additional effort needed (Glutting, Watkins, Konold, & McDermott, 2006). This advice is simple enough, but how does one measure g well?

The first step is to select a set of highly g-loaded tests. A highly g-loaded test is simply one that correlates strongly with statistical g. This raises an important question. If the existence of g is in doubt, how can we know if a test correlates with it? To the poly-G-ist, this might sound like studying the environmental impact of unicorn overpopulation. The problem is resolved by distinguishing between two different meanings of g. First, there is theoretical g, a hypothetical entity thought to have causal relationships with many aspects of daily functioning. This is the g that many doubt exists. Second, there is statistical g, which is not in question. It is typically defined by a statistical procedure called factor analysis (or a closely related procedure called principal components analysis). All scholars agree that statistical g can be extracted from a correlation matrix and that virtually all cognitive tests correlate positively with it to some degree. Thus, a g-hating poly-G-ist can talk about a g-loaded test without fear of self-contradiction. A highly g-loaded test, then, is by definition highly correlated with many other tests. This means that it is probably a good predictor of academic achievement tests, which are, for the most part, also highly g-loaded. A cognitive test with a low g-loading (e.g., WJ III Planning or WISC-IV Cancellation) does not correlate with much of anything except itself. Mono-g-ists avoid such tests whenever possible (but poly-G-ists love them, if they can be found to be uniquely predictive of an important outcome).

The second step to estimate g is to make sure that the highly g-loaded tests you have selected are as different from each other as possible in terms of item content and response format. To select highly similar tests (e.g., more than one vocabulary test) will contaminate the estimate of g with the influence of narrow abilities, which, to the mono-g-ist, are unimportant.

Fortunately, cognitive ability test publishers have saved us much trouble and have assembled such collections of subtests to create composite scales that can be used to estimate g. Such composite scores go by many different names,[1] but I will refer to them as IQ scores. These operational measures of g tend to correlate strongly with one another, mostly in the range of 0.70 to 0.80 but sometimes as low as 0.60 or as high as 0.90 (Kamphaus, 2005). Even so, they are not perfectly interchangeable. If both tests have the traditional mean of 100 and standard deviation of 15, the probability that the two scores will differ by a given amount or more can be found in the table below.[2] For example, for a person who takes two IQ tests that are correlated at 0.80, there is a 29% chance that the IQ scores will differ by 10 points or more.

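What is the probability that a person’s scores on two IQ tests will differ by the specified amount or more?

Probability if the IQ tests correlate at r =

Difference (points)    0.70      0.80      0.90
≥ 5                    66.7%     59.8%     45.6%
≥ 10                   38.9%     29.2%     13.6%
≥ 15                   19.7%     11.4%      2.5%
≥ 20                    8.5%      3.5%      0.3%
≥ 25                    3.1%      0.8%    < 0.1%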

If a person has two or more IQ scores that differ by a wide margin, it does not necessarily mean that something is wrong. To insist on perfect correlations between IQ tests is not realistic and not fair.[3] However, when a child has taken two IQ tests recently and the scores are different, it raises the question of which IQ is more accurate.
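
The calculation behind these probabilities can be sketched as follows. The sketch assumes, as the table note describes, that the difference between two equally scaled, correlated, normally distributed scores is itself normal with a standard deviation of 15√(2(1 − r)). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_scores_differ_by(points, r, sd=15.0):
    """Probability that two test scores (same mean and SD, correlation r)
    differ by at least `points` in either direction."""
    sd_diff = sd * sqrt(2.0 * (1.0 - r))   # SD of the difference score
    return 2.0 * (1.0 - NormalDist().cdf(points / sd_diff))


# Print one row per difference, one column per correlation
for points in (5, 10, 15, 20, 25):
    row = [prob_scores_differ_by(points, r) for r in (0.70, 0.80, 0.90)]
    print(f"{points:>3}", "  ".join(f"{p:6.1%}" for p in row))
```

For two tests correlated at 0.80, the function gives about 0.29 for a difference of 10 points or more, which matches the example in the text.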

[1] Full Scale IQ (WISC-IV, SB5, UNIT), Full Scale (Leiter-R, CAS), General Intellectual Ability (WJ III), General Conceptual Ability (DAS-II), Composite Intelligence Index (RIAS), Composite Intelligence Scale (KAIT), Fluid-Crystallized Index (KABC-II), and many others.

[2] This table was created by calculating the standard deviation of the difference between two correlated normally distributed variables and then applying the cumulative probability density function of the normal curve.

[3] “If I were to command a general to turn into a seagull, and if the general did not obey, that would not be the general’s fault. It would be mine.” – Antoine de Saint-Exupéry, The Little Prince

Aptitudes and Achievement: Definitions, Distinctions, and Difficulties

Achievement typically refers to knowledge and skills that are formally taught in academic settings. However, this definition of achievement can be broadened to include any ability that is valued and taught in a particular cultural setting (e.g., hunting, dancing, or computer programming). Aptitude refers to an individual’s characteristics that indicate the potential to develop a culturally valued ability, given the right circumstances. The difference between aptitudes and achievement at the definitional level is reasonably clear. However, at the measurement level, the distinction becomes rather murky.

Potential, which is latent within a person, is impossible to observe directly. It must be inferred by measuring characteristics that either are typically associated with an ability or are predictive of the future development of the ability. Most of the time, aptitude is assessed by measuring abilities that are considered to be necessary precursors of achievement. For example, children who understand speech have greater aptitude for reading comprehension than do children who do not understand speech. Such precursors may themselves be a form of achievement. For example, it is possible for researchers to consider students’ knowledge of history as an outcome variable that is intrinsically valuable. However, some researchers may measure knowledge of history as a predictor of being able to construct a well-reasoned essay on politics. Thus, aptitude and achievement tests are not distinguished by their content but by how they are used. If we use a test to measure current mastery of a culturally valued ability, it is an achievement test. If we use a test to explain or forecast mastery of a culturally valued ability, it is an aptitude test.

IQ tests are primarily used as aptitude tests. However, an inspection of the contents of most IQ tests reveals that many test items could be repurposed as items in an achievement test (e.g., vocabulary, general knowledge, and mental arithmetic items). Sometimes the normal roles of reading tests and IQ tests are reversed, such as when neuropsychologists estimate loss of function following a brain injury by comparing current IQ to performance on a word-reading test.

A simple method to distinguish between aptitude and achievement is to ask, “Do I care about whether a child has the ability measured by this test because it is inherently valuable or because it is associated with some other ability (the one that I actually care about)?” Most people want children to be able to comprehend what they read. Thus, reading tests are typically achievement tests. Most people are not particularly concerned about how well children can reorder numbers and letters in their heads. Thus, the WISC-IV Letter-Number Sequencing subtest is typically used as an aptitude test, presumably because the ability it measures is a necessary component of being able to master algebra, program computers, follow the chain of logic presented by debating candidates, and other skills that people in our culture care about.
