CHC Theory: Love, Marriage, and a Giant Baby Carriage

CHC Theory is the child of two titans: Carroll’s (1993) lumbering leviathan, the Three-Stratum Theory of Cognitive Abilities, and Cattell and Horn’s two-headed giant, Gf-Gc Theory (Horn & Cattell, 1964). Given that Horn was as staunchly anti-g as they come (Horn & Blankson, 2005) and that Carroll was a dedicated g-man (though not of the g-and-only-g variety; Carroll, 2003), it is surprising that these theories even had a courtship, much less a marriage.

From 1986 to the late 1990s, in a series of encounters initiated and chaperoned by test developer Richard Woodcock, Horn and Carroll discussed the intersections of their theories and eventually consented to have their names yoked together under a single framework (McGrew, 2005). Although the interfaith ceremony was officiated by Woodcock, the product of their union was midwifed primarily by McGrew (1997). Woodcock, McGrew, and colleagues’ ecumenical approach has created a space in which mono-g-ists and poly-G-ists can engage in civil dialogue or at least ignore one another politely. CHC Theory puts g atop a three-stratum hierarchy of cognitive abilities, but g’s role in the theory is such that poly-G-ists can ignore it to the degree that they see fit.

This post is an excerpt from:

Schneider, W. J. (2013). Principles of assessment of aptitude and achievement. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), Oxford handbook of psychological assessment of children and adolescents (pp. 286–330). New York: Oxford University Press.


Can’t Decide Which IQ Is Best? Make a Composite Score.

A man with a watch knows what time it is. A man with two watches is never sure.

-Segal’s Law

Suppose you have been asked to settle a matter with important implications for an evaluee.[1] A young girl was diagnosed with mental retardation [now called intellectual disability] three years ago. Along with low adaptive functioning, she obtained a Full Scale IQ of 68, two points under the traditional line used to diagnose [intellectual disability]. Upon re-evaluation two months ago, her IQ, derived from a different test, was 78. Worried that their daughter would no longer qualify for services, the family paid out of pocket to have her evaluated by another psychologist, and this time the IQ came out as 66. Because of your reputation for being fair-minded and knowledgeable, you have been asked to decide which, if any, is the real IQ. Of course, there is no such thing as a “real IQ,” but you understand what the referral question is.

You give a different battery of tests and the girl scores a 76. Now what should be done? It would be tempting to assume that “other psychologists are sloppy, whereas my results are free of error.” However, you are fair-minded. You know that all scores have measurement error, so you plot the scores and their 95% confidence intervals, as seen in Figure 1.

Figure 1

Recent IQ Scores and their 95% Confidence Intervals from the Same Individual


It is clear that Test C’s confidence interval does not overlap with those of Tests B and D. Is this kind of variability in scores unusual?[2] Two of the tests indicate an IQ in the high 60s and two indicate an IQ in the high 70s. Which pair of tests is correct? Should the poor girl be subjected to yet another test that might act as a tie breaker?

Perhaps the fairest solution is to treat each IQ test as a subtest of a much larger “Mega-IQ Test.” That is, perhaps the best that can be done is to combine the four IQ scores into a single score and then construct a confidence interval around it.

Where should the confidence interval be centered? Intuitively, it might seem reasonable to simply average all four IQ results and say that the IQ is 72. However, this is not quite right. Averaging scores gives a rough approximation of a composite score, but it is less accurate for low and high scorers than it is for scorers near the mean. An individual’s composite score is further from the population mean than the average of the individual’s subtest scores. About 3.1% of people score a 72 or lower on a single IQ test (assuming perfect normality). However, if we were to imagine a population of people who took all four IQ tests in question, only about 2% of them would have an average score of 72 or lower. That is, it is unusual to score a 72 on one IQ test, but it is even more unusual to average a score that low across several tests. Another way to think about this issue is to recognize that the mean score cannot be interpreted as an IQ score, because it has a smaller standard deviation than IQ scores have. To make it comparable to an IQ score, it must be rescaled so that it has a “standard” standard deviation of 15.
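If you want to verify these percentages yourself, here is a quick check in Python (a sketch assuming SciPy is available; it uses the four-test correlation matrix given below in Step 3, whose 16 elements sum to 13.18):

from scipy.stats import norm

# Probability of scoring 72 or lower on a single IQ test (mean 100, SD 15)
p_single = norm.cdf((72 - 100) / 15)

# The SD of the mean of four IQ scores is smaller than 15:
# SD(mean) = 15 * sqrt(sum of the correlation matrix) / (number of tests)
sd_mean = 15 * 13.18 ** 0.5 / 4

# Probability that the mean of the four scores is 72 or lower
p_mean = norm.cdf((72 - 100) / sd_mean)

print(f"{p_single:.3f} {p_mean:.3f}")  # 0.031 0.020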

Here is a good method for computing a composite score and its accompanying 95% confidence interval. It is not nearly as complicated as it might seem at first glance. This method assumes that you know the reliability coefficients of all the scores and you know all the correlations between the scores. All scores must be index scores (μ = 100, σ = 15). If they are not, they can be converted using this formula:

\text{Index Score} = 15(\dfrac{X-\mu}{\sigma})+100
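In code, the conversion is a one-line function (a minimal Python sketch; the function name is my own):

def to_index(x, mu, sigma):
    """Convert a score from a scale with mean mu and SD sigma
    to an index score (mean 100, SD 15)."""
    return 15 * (x - mu) / sigma + 100

print(to_index(12, mu=10, sigma=3))  # a scaled score of 12 becomes 110.0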

Computing a Composite Score

Step 1: Add up all of the scores.

In this case,

68 + 78 + 66 + 76 = 288

Step 2: Subtract the number of tests times 100.

In this case there are 4 tests. Thus,

288-4 * 100 = 288-400 = -112

Step 3: Divide by the square root of the sum of all the elements in the correlation matrix.

In this case, suppose that the four tests are correlated as such:

 

          Test A   Test B   Test C   Test D
Test A      1.00     0.80     0.75     0.85
Test B      0.80     1.00     0.70     0.71
Test C      0.75     0.70     1.00     0.78
Test D      0.85     0.71     0.78     1.00

The sum of all 16 elements, including the ones on the diagonal, is 13.18. The square root of 13.18 is about 3.63. Thus,

-112 / 3.63 = -30.85

Step 4: Complete the computation of the composite score by adding 100.

In this case,

-30.85 + 100 = 69.15

Given the four IQ scores available, assuming that there is no reason to favor one above the others, the best estimate is that her IQ is 69. Most of the time, there is no need for further calculation. However, we might like to know how precise this estimate is by constructing a 95% confidence interval around this score.
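Here is the whole four-step computation as a short Python sketch (NumPy and the function name are my own choices, not part of the original method):

import numpy as np

def composite_score(scores, R):
    """Steps 1-4: combine index scores (mean 100, SD 15) into a composite.
    R is the correlation matrix of the tests."""
    deviation = sum(scores) - 100 * len(scores)  # Steps 1 and 2
    return deviation / np.sqrt(R.sum()) + 100    # Steps 3 and 4

R = np.array([[1.00, 0.80, 0.75, 0.85],
              [0.80, 1.00, 0.70, 0.71],
              [0.75, 0.70, 1.00, 0.78],
              [0.85, 0.71, 0.78, 1.00]])

print(round(composite_score([68, 78, 66, 76], R), 2))  # 69.15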

Confidence Intervals of Composite Scores

Calculating a 95% confidence interval is more complicated than the calculations above but not overly so.

Step 1: Calculate the composite reliability.

Step 1a: Subtract the number of tests from the sum of the correlation matrix.

In this case, there are 4 tests. Therefore,

13.18-4 = 9.18

Step 1b: Add in all the test reliability coefficients.

In this case, suppose that the four reliability coefficients are 0.97, 0.96, 0.98, and 0.97. Therefore,

9.18 + 0.97 + 0.96 + 0.98 + 0.97 = 13.06

Step 1c: Divide by the original sum of the correlation matrix.

In this case,

13.06 / 13.18 \approx 0.9909

Therefore, in this case, the reliability coefficient of the composite score is higher than that of any single IQ score. This makes sense: given that we have four scores, we should know her IQ with greater precision than we would if we had only one score.

Step 2: Calculate the standard error of the estimate by subtracting the reliability coefficient squared from the reliability coefficient and taking the square root. Then, multiply by the standard deviation, 15.

In this case,

15\sqrt{0.9909-0.9909^2}\approx 1.4247

Step 3: Calculate the 95% margin of error by multiplying the standard error of the estimate by 1.96.

In this case,

1.96 * 1.4247 \approx 2.79

The value 1.96 is the approximate z-score associated with the 95% confidence interval. If you want the z-score associated with a different confidence level, then use the following Excel formula. Shown here is the calculation of the z-score for a 99% confidence interval:

=NORMSINV(1-(1-0.99)/2)

Step 4: Calculate the estimated true score by subtracting 100 from the composite score, multiplying by the reliability coefficient, and adding 100. That is,

\text{Estimated True Score} =\text{Reliability Coefficient} * (\text{Composite} - 100) + 100

In this case,

0.9909*(69.15-100)+100=69.43

Step 5: Calculate the upper and lower bounds of the 95% confidence interval by starting with the estimated true score and then adding and subtracting the margin of error.

In this case,

69.43 \pm 2.79 = 66.64 \text{ to } 72.22

This means that we are 95% sure that her IQ is between about 67 and 72. Assuming that other criteria for mental retardation [intellectual disability] are met, this is in the range to qualify for services in most states. It should be noted that this procedure can be used for any kind of composite score, not just for IQ tests.
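The confidence-interval steps translate to Python just as directly (again a sketch assuming NumPy and SciPy; it continues the composite_score example above):

import numpy as np
from scipy.stats import norm

def composite_confidence_interval(composite, R, reliabilities, confidence=0.95):
    """Steps 1-5: confidence interval for a composite of index scores.
    R is the tests' correlation matrix; reliabilities holds their
    reliability coefficients."""
    k = len(reliabilities)
    rxx = (R.sum() - k + sum(reliabilities)) / R.sum()  # composite reliability
    see = 15 * np.sqrt(rxx - rxx ** 2)                  # standard error of the estimate
    z = norm.ppf(1 - (1 - confidence) / 2)              # 1.96 for a 95% interval
    true_score = rxx * (composite - 100) + 100          # estimated true score
    return true_score - z * see, true_score + z * see

R = np.array([[1.00, 0.80, 0.75, 0.85],
              [0.80, 1.00, 0.70, 0.71],
              [0.75, 0.70, 1.00, 0.78],
              [0.85, 0.71, 0.78, 1.00]])

lo, hi = composite_confidence_interval(69.15, R, [0.97, 0.96, 0.98, 0.97])
print(round(lo, 2), round(hi, 2))  # 66.64 72.22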


[1] An Excel spreadsheet I wrote can calculate all of the statistics in this section and can be downloaded for free at http://my.ilstu.edu/~wjschne/AssessingPsyche/AssessingPsycheSoftware.html.

[2] This degree of profile variability is not at all unusual. In fact, it is quite typical. A statistic called the Mahalanobis Distance (Crawford & Allen, 1994) can be used to estimate how typical an individual profile of scores is compared to a particular population of score profiles. Using the given correlation matrix and assuming multivariate normality, this profile is at the 86th percentile in terms of profile unusualness…and almost all of the reason that it is unusual is that its overall elevation is unusually low (Mean = 72). If we consider only those profiles that have an average score of 72, this profile’s unusualness is at the 54th percentile (Schneider, in preparation). That is, the amount of variability in this profile is typical compared to other profiles with an average score of 72.

This post is an excerpt from:

Schneider, W. J. (2013). Principles of assessment of aptitude and achievement. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), Oxford handbook of psychological assessment of children and adolescents (pp. 286–330). New York: Oxford University Press.

Figure 1 was updated from the original to show more accurately the precision that IQ scores have.


Predicted Achievement Using Simple Linear Regression

There are two ways to make an estimate of a person’s abilities. A point estimate (a single number) is precise but usually wrong, whereas an interval estimate (a range of numbers) is usually right but can be so wide that it is nearly useless. Confidence intervals combine both types of estimates in order to balance the weaknesses of one type of estimate with the strengths of the other. If I say that Suzie’s expected reading comprehension is 85 ± 11, the 85 is the point estimate (also known as the expected score, the predicted score, or just Ŷ). The ± 11 is called the margin of error. If the confidence level is left unspecified, by convention we mean the 95% margin of error. If I subtract and add 11 to get a range from 74 to 96, I have the lower and upper bounds of the 95% confidence interval.

Calculating the Predicted Achievement Score

I will assume that both the IQ and achievement scores are index scores (μ = 100, σ = 15) to make things simple. The predicted achievement score is a point estimate. It represents the best guess we can make in the absence of other information. The equation below is called a regression equation.

\hat{Y}=\sigma_Y r_{XY} \frac{X-\mu_X}{\sigma_X}+\mu_Y

If X is IQ, Y is Achievement, and both scores are index scores (μ = 100, σ = 15), the regression equation simplifies to:

Predicted achievement = (Correlation between IQ and Achievement) (IQ – 100) + 100

Calculating the Confidence Interval for the Predicted Achievement Score

Whenever you make a prediction using regression, your estimate is rarely exactly right. It is expected to differ from the actual achievement score by a certain amount (on average). This amount is called the standard error of the estimate. It is the standard deviation of all the prediction errors. Thus, it is the standard to which all the errors in your estimates are compared. When both scores are index scores, the formula is

\text{Standard error of the estimate}=15\sqrt{1-r^2_{XY}}

To calculate the margin of error, multiply the standard error of the estimate by the z-score that corresponds to the degree of confidence desired. In Microsoft Excel the formula for the z-score corresponding to the 95% confidence interval is

=NORMSINV(1-(1-0.95)/2)

≈1.96

For the 95% confidence interval, multiply the standard error of the estimate by 1.96. The 95% confidence interval’s formula is

95% Confidence Interval = Predicted achievement ± 1.96 * Standard error of the estimate

This interval estimates the range of achievement scores for 95% of people with the same IQ as the child. About 2.5% will score below the interval and about 2.5% will score above it.

You can use Excel to estimate how unusual it is for an observed achievement score to differ from a predicted achievement score in a particular direction by using this formula:

=NORMSDIST(-1*ABS(Observed-Predicted)/(Standard error of the estimate))
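All three calculations in this section can be bundled into a short Python sketch (the function names are mine; SciPy supplies the normal-curve functions; this is an illustration under the index-score assumptions above, not code from the chapter):

from math import sqrt
from scipy.stats import norm

def predicted_achievement(iq, r):
    """Point estimate of achievement from IQ (both index scores)."""
    return r * (iq - 100) + 100

def prediction_interval(iq, r, confidence=0.95):
    """Confidence interval around the predicted achievement score."""
    see = 15 * sqrt(1 - r ** 2)             # standard error of the estimate
    z = norm.ppf(1 - (1 - confidence) / 2)  # 1.96 for a 95% interval
    y_hat = predicted_achievement(iq, r)
    return y_hat - z * see, y_hat + z * see

def tail_probability(observed, predicted, r):
    """How often a discrepancy this large occurs in one direction
    (the Python equivalent of the Excel formula above)."""
    see = 15 * sqrt(1 - r ** 2)
    return norm.cdf(-abs(observed - predicted) / see)

# Example: a child with an IQ of 80 when IQ and achievement correlate at 0.60
lo, hi = prediction_interval(80, 0.60)
print(predicted_achievement(80, 0.60), round(lo, 1), round(hi, 1))  # 88.0 64.5 111.5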

If a child’s observed achievement score is unusually low, it does not automatically mean that the child has a learning disorder. Many other things need to be checked before that diagnosis can be considered valid. However, it does mean that an explanation for the unusually low achievement score should be sought.

This post is an excerpt from:

Schneider, W. J. (2013). Principles of assessment of aptitude and achievement. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), Oxford handbook of psychological assessment of children and adolescents (pp. 286–330). New York: Oxford University Press.


Potential Misconceptions about Potential

If you are a mono-g-ist, you can use the estimate of g (IQ) to get an idea of the typical range of achievement scores for a child with that IQ. Not every child with the same IQ will have the same achievement scores.[1] Not even mono-g-ists believe that. Also, it is simply not true that achievement cannot be higher than IQ. Equally false is the assumption that if achievement is higher than IQ, then the IQ is wrong. These misconceptions are based on two premises: one true, the other false. If potential is the range of all possible outcomes, it is logically true that people cannot exceed their potentials. The false premise is that IQ and achievement tests are measured on the “potential scale.” By analogy, suppose I say, “This thermometer reads -10 degrees. I know from my understanding of physics that Brownian motion never stops and thus no temperature dips below zero. Therefore, this thermometer is incorrect.” My premise is true if the thermometer is on the Kelvin scale. However, it is on the Celsius scale, and so there is no reason to believe that something is amiss. IQ and achievement simply are not measured on the “potential scale.” They are measured with standard scores, which are transformed deviations from a population mean. Because of this, about half of all people have academic achievement scores that are higher than their own IQ. There is nothing wrong with this.


[1] And not every child with the same achievement scores will have the same IQ.

This post is an excerpt from:

Schneider, W. J. (2013). Principles of assessment of aptitude and achievement. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), Oxford handbook of psychological assessment of children and adolescents (pp. 286–330). New York: Oxford University Press.


How to Assess Aptitudes If You Are a Mono-g-ist

For the mono-g-ist, the assessment of aptitudes is rather simple: measure g and be done with it. Other abilities may have a little predictive validity beyond g, but not enough to make it worth all the additional effort needed (Glutting, Watkins, Konold, & McDermott, 2006). This advice is simple enough, but how does one measure g well?

The first step is to select a set of highly g-loaded tests. The term highly g-loaded simply means that a test correlates strongly with statistical g. This raises an important question: if the existence of g is in doubt, how can we know whether a test correlates with it? To the poly-G-ist, this might sound like studying the environmental impact of unicorn overpopulation. The problem is resolved by distinguishing between two different meanings of g. First, there is theoretical g, a hypothetical entity thought to have causal relationships with many aspects of daily functioning. This is the g that many doubt exists. Second, there is statistical g, which is not in question. It is typically defined by a statistical procedure called factor analysis (or a closely related procedure called principal components analysis). All scholars agree that statistical g can be extracted from a correlation matrix and that virtually all cognitive tests correlate positively with it to some degree. Thus, a g-hating poly-G-ist can talk about a g-loaded test without fear of self-contradiction. A highly g-loaded test, then, is by definition highly correlated with many other tests. This means that it is probably a good predictor of academic achievement tests, which are, for the most part, also highly g-loaded. A cognitive test with a low g-loading (e.g., WJ III Planning or WISC-IV Cancellation) does not correlate with much of anything except itself. Mono-g-ists avoid such tests whenever possible (but poly-G-ists love them—if they can be found to be uniquely predictive of an important outcome).

The second step in estimating g is to make sure that the highly g-loaded tests you have selected are as different from each other as possible in terms of item content and response format. Selecting highly similar tests (e.g., more than one vocabulary test) will contaminate the estimate of g with the influence of narrow abilities, which, to the mono-g-ist, are unimportant.

Fortunately, cognitive ability test publishers have saved us much trouble and have assembled such collections of subtests to create composite scales that can be used to estimate g. Such composite scores go by many different names[1] but I will refer to them as IQ scores. These operational measures of g tend to correlate strongly with one another, mostly in the range of 0.70 to 0.80, but sometimes as low as 0.60 or as high as 0.90 (Kamphaus, 2005). Even so, they are not perfectly interchangeable. If both tests have the traditional mean of 100 and standard deviation of 15, the probability that the two scores will differ by a given amount or more can be found in the table below.[2] For example, for a person who takes two IQ tests that are correlated at 0.80, there is a 29% chance that the IQ scores will differ by 10 points or more.

What is the probability that a person’s scores on two IQ tests will differ by the specified amount or more?

Difference    r = 0.60    r = 0.70    r = 0.80    r = 0.90
> 5             0.71        0.67        0.60        0.46
> 10            0.46        0.39        0.29        0.14
> 15            0.26        0.20        0.11        0.03
> 20            0.14        0.09        0.03        0.003
> 25            0.06        0.03        0.01        0.0002
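Each entry in the table can be reproduced from the standard deviation of the difference between two correlated index scores (see footnote 2). A minimal Python sketch, assuming SciPy is available (the function name is mine):

from math import sqrt
from scipy.stats import norm

def p_difference(d, r):
    """Probability that two IQ scores correlated at r differ by d points or more.
    The difference between two index scores has SD = 15 * sqrt(2 - 2r)."""
    sd_diff = 15 * sqrt(2 - 2 * r)
    return 2 * norm.cdf(-d / sd_diff)

print(round(p_difference(10, 0.80), 2))  # 0.29, matching the table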

 

If a person has two or more IQ scores that differ by a wide margin, it does not necessarily mean that something is wrong. To insist on perfect correlations between IQ tests is not realistic and not fair.[3] However, when a child has taken two IQ tests recently and the scores are different, it raises the question of which IQ is more accurate.


[1] Full Scale IQ (WISC-IV, SB5, UNIT), Full Scale (Leiter-R, CAS), General Intellectual Ability (WJ III), General Conceptual Ability (DAS-II), Composite Intelligence Index (RIAS), Composite Intelligence Scale (KAIT), Fluid-Crystallized Index (KABC-II), and many others.

[2] This table was created by calculating the standard deviation of the difference between two correlated normally distributed variables and then applying the cumulative distribution function of the normal curve.

[3] “If I were to command a general to turn into a seagull, and if the general did not obey, that would not be the general’s fault. It would be mine.” – Antoine de Saint-Exupéry, The Little Prince

This post is an excerpt from:

Schneider, W. J. (2013). Principles of assessment of aptitude and achievement. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), Oxford handbook of psychological assessment of children and adolescents (pp. 286–330). New York: Oxford University Press.


Mean-spirited Mono-g-ists vs. Muddleheaded Poly-G-ists

I hate the impudence of a claim that in fifty minutes you can judge and classify a human being’s predestined fitness in life. I hate the pretentiousness of the claim. I hate the abuse of scientific method which it involves. I hate the sense of superiority which it creates, and the sense of inferiority which it imposes.

– Walter Lippmann, in a 1923 essay on Lewis Terman and the IQ testers

Most of us have uncritically taken it for granted that children who attend school eight or ten years without passing the fourth grade or surmounting long division, are probably stupider than children who lead their classes into high school at twelve years and into college at sixteen. Mr. Lippmann contends that we can’t tell anything about how intelligent either one of these children is until he has lived out his life. Therefore, for a lifetime at least, Mr. Lippmann considers his position impregnable!

– Lewis Terman, in response to Walter Lippmann

Spearman’s (1904) little g caused a big stir when it was first proposed and has, for over a century now, been disrupting the natural state of harmony that would otherwise prevail amongst academics. Many a collegial tie has been severed, many a friendship has soured, perhaps even engagements broken off and marriages turned into dismal, loveless unions because of the rancor this topic provokes. I have seen otherwise mild-mannered professors in tweed jackets come to blows in bars over disagreements about g.1

It all began when Spearman observed that the mental abilities he measured were all positively correlated. This observation has been replicated by thousands of studies. No one who is familiar with this gigantic body of evidence doubts that essentially all cognitive abilities are positively correlated. This statistical regularity is typically referred to as the positive manifold.2 You could become an academic superstar (i.e., admired by six or seven other academics) if you were to find a pair of cognitive abilities that are negatively correlated with each other. So far, no one has.3 Thus, everyone in the know agrees with Spearman on this point. What some people hate is his explanation for it.

Spearman believed (and invented some very fancy statistical procedures to support his argument)4 that abilities are correlated because all abilities are influenced by a common cause, g (general intelligence). Spearman was careful to note that he did not know for certain what g was but was not shy about speculating about its nature. He thought that it might be a kind of mental energy and that some people had a lot of it and some had very little.

The essential points of contention in the byzantine quarrels between Spearmanian mono-g-ists and anti-Spearmanian poly-G-ists5 have not changed much over the decades. There is some diversity within both groups but the lines between them are fairly clear. Not only do the mono-g-ists insist that g be acknowledged as an ability, but they believe that it should be esteemed above all others. Some appear to believe that no ability other than g even matters. Some poly-G-ists will grant that g exists but deem it inconsequential compared to the myriad other abilities that influence the course of a human life. Other poly-G-ists deny that g exists and are disgusted by the very idea of it.

It turns out that these two groups are not merely on opposite sides of an intellectual debate—they are members of different tribes. They speak different dialects, vote for different candidates, and pray to different gods. Their heroic tales emphasize different virtues and their foundation myths offer radically different but still internally consistent explanations of how the world works. If you think that the matter will be settled by accumulating more data, you have not been paying attention for the last hundred years.

Poly-G-ists do not merely believe that mono-g-ists are mistaken but that they are mean-spirited, perhaps evil, or at the very least, Republicans. In their view, the course of human history can be summed up in this manner:

Since the dawn of time up to the beginning of the twentieth century, humans lived in a paradise of loving harmony and high self-esteem. Then Spearman invented g and ruined everything. Previously, Live White Males (for back then they were not yet dead) had been content to be equal to everyone else and were really rather decent fellows. However, many of them were corrupted by Spearman’s flattery and convinced themselves they had more g than other people. The deceived began to call themselves Fascists and went around disempowering people with nasty labels. Though eventually defeated by George Lincoln King, Jr. in the Civil Liberties War, Fascists still wield influence via college aptitude tests. If we rid the world of all standardized tests, people will no longer label one another, low self-esteem will be eradicated, and a new Utopia will be established.

On the other side, mono-g-ists know that poly-G-ists have seen the same data and read the same studies as they have. They believe that the poly-G-ists are simply too muddleheaded to understand the data, too blinded by their ideological wishes to see the world as it is, or too fearful of social consequences to proclaim publicly that the emperor has no clothes. In the short epic tragedy, the Spearmaniad, mono-g-ists find this account of how things came to be:

In the dark mists of prehistory, life was nasty, brutish, and short. Worse, it was almost impossible to tell the common folk from their betters and some very mediocre presidents were elected. When the goddess of mathematics looked upon the chaos of the world, she cried crystal tears of pure correlation coefficients. Now Spearman was a mighty statistician and he gathered the correlations up and arranged them in matrices. From these matrices he invented factor analysis, from which flowed new knowledge: first IQ tests, then writing, then the wheel. All that was done with factor analysis was beautiful, virtuous, and true. But the brief flowering of civilization that followed was ended when a cabal of ignorant do-gooders objected to the use of IQ tests, presumably because they (or their ugly, talentless children) performed poorly on them. We now stand on the brink of disaster. Giving up IQ tests will be followed immediately by a rapid descent into barbarism. College aptitude tests may postpone or soften the impact of this catastrophe for a little while but cannot avert it entirely.

The theoretical status of g will not cease to be controversial until something extraordinary happens to the field. I do not pretend to know what this might be. Maybe a breakthrough from biology will resolve the matter. Maybe divine intervention. Until then, I feel no need to join either tribe. I will remain agnostic and I will not get too excited the next time really smart people eagerly announce that finally, once and for all, they have proof that the other side is wrong. This has happened too many times before.

1 Okay…not really…but I have seen some very sarcastic emails exchanged on professional listservs!

2 The term positive manifold once had a precise mathematical meaning: that the correlation matrix of all tests was nearly a rank 1 matrix (i.e., a correlation matrix implied by Spearman’s Two-Factor Theory). As it is currently used, the term simply means that the correlation matrix consists entirely of positive correlations.

3 Not, at least, in more than one large representative sample. From time to time, someone gets an occasional negative correlation but other researchers have trouble replicating the finding.

4 Hence, the intense controversy is not merely a result of lies and damned lies.

5 So named because in perhaps the most important theory to deny the existence of g (Horn & Cattell, 1964), the most important abilities all have names that begin with a capital letter G (Gf, Gc, Gv, and so forth).

This post is an excerpt from:

Schneider, W. J. (2013). Principles of assessment of aptitude and achievement. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), Oxford handbook of psychological assessment of children and adolescents (pp. 286–330). New York: Oxford University Press.


Aptitudes and Achievement: Definitions, Distinctions, and Difficulties

Achievement typically refers to knowledge and skills that are formally taught in academic settings. However, this definition of achievement can be broadened to include any ability that is valued and taught in a particular cultural setting (e.g., hunting, dancing, or computer programming). Aptitude refers to an individual’s characteristics that indicate the potential to develop a culturally valued ability, given the right circumstances. The difference between aptitudes and achievement at the definitional level is reasonably clear. However, at the measurement level, the distinction becomes rather murky.

Potential, which is latent within a person, is impossible to observe directly. It must be inferred by measuring characteristics that either are typically associated with an ability or are predictive of the future development of the ability. Most of the time, aptitude is assessed by measuring abilities that are considered to be necessary precursors of achievement. For example, children who understand speech have greater aptitude for reading comprehension than do children who do not understand speech. Such precursors may themselves be a form of achievement. For example, it is possible for researchers to consider students’ knowledge of history as an outcome variable that is intrinsically valuable. However, some researchers may measure knowledge of history as a predictor of being able to construct a well-reasoned essay on politics. Thus, aptitude and achievement tests are not distinguished by their content but by how they are used. If we use a test to measure current mastery of a culturally valued ability, it is an achievement test. If we use a test to explain or forecast mastery of a culturally valued ability, it is an aptitude test.

IQ tests are primarily used as aptitude tests. However, an inspection of the contents of most IQ tests reveals that many test items could be repurposed as items in an achievement test (e.g., vocabulary, general knowledge, and mental arithmetic items). Sometimes the normal roles of reading tests and IQ tests are reversed, such as when neuropsychologists estimate loss of function following a brain injury by comparing current IQ to performance on a word-reading test.

A simple method to distinguish between aptitude and achievement is to ask, “Do I care about whether a child has the ability measured by this test because it is inherently valuable or because it is associated with some other ability (the one that I actually care about)?” Most people want children to be able to comprehend what they read. Thus, reading tests are typically achievement tests. Most people are not particularly concerned about how well children can reorder numbers and letters in their heads. Thus, the WISC-IV Letter-Number Sequencing subtest is typically used as an aptitude test, presumably because the ability it measures is a necessary component of being able to master algebra, program computers, follow the chain of logic presented by debating candidates, and other skills that people in our culture care about.

This post is an excerpt from:

Schneider, W. J. (2013). Principles of assessment of aptitude and achievement. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), Oxford handbook of psychological assessment of children and adolescents (pp. 286–330). New York: Oxford University Press.
