Cognitive Assessment, Principles of assessment of aptitude and achievement

How to Assess Aptitudes If You Are a Mono-g-ist

For the mono-g-ist, the assessment of aptitudes is rather simple: measure g and be done with it. Other abilities may have a little predictive validity beyond g, but not enough to make it worth all the additional effort needed (Glutting, Watkins, Konold, & McDermott, 2006). This advice is simple enough, but how does one measure g well?

The first step is to select a set of highly g-loaded tests. A test is highly g-loaded if it correlates strongly with statistical g. This raises an important question: if the existence of g is in doubt, how can we know whether a test correlates with it? To the poly-G-ist, this might sound like studying the environmental impact of unicorn overpopulation. The problem is resolved by distinguishing between two different meanings of g. First, there is theoretical g, a hypothetical entity thought to have causal relationships with many aspects of daily functioning. This is the g that many doubt exists. Second, there is statistical g, which is not in question. It is typically defined by a statistical procedure called factor analysis (or a closely related procedure called principal components analysis). All scholars agree that statistical g can be extracted from a correlation matrix and that virtually all cognitive tests correlate positively with it to some degree. Thus, a g-hating poly-G-ist can talk about a g-loaded test without fear of self-contradiction.

A highly g-loaded test, then, is by definition highly correlated with many other tests. This means that it is probably a good predictor of academic achievement tests, which are, for the most part, also highly g-loaded. A cognitive test with a low g-loading (e.g., WJ III Planning or WISC-IV Cancellation) does not correlate with much of anything except itself. Mono-g-ists avoid such tests whenever possible (but poly-G-ists love them, if they can be found to be uniquely predictive of an important outcome).
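To make the idea of a g-loading concrete, here is a minimal sketch of extracting a first principal component from a correlation matrix. The four test names and all of the correlations are made up for illustration, and power iteration stands in for whatever routine a statistics package would actually use; a proper factor analysis would give somewhat different numbers.

```python
import math

# Hypothetical correlations among four tests (illustrative values only,
# not taken from any test manual).
TESTS = ["Vocabulary", "Matrix reasoning", "Arithmetic", "Cancellation"]
R = [
    [1.00, 0.60, 0.55, 0.15],
    [0.60, 1.00, 0.50, 0.20],
    [0.55, 0.50, 1.00, 0.10],
    [0.15, 0.20, 0.10, 1.00],
]

def first_component_loadings(corr, iterations=500):
    """Loadings on the first principal component, found by power iteration."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n  # start from an all-positive unit vector
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply the current vector by the correlation matrix...
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        # ...and rescale to unit length; the length converges to the
        # largest eigenvalue.
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    # A loading is the correlation between a test and the component.
    return [x * math.sqrt(eigenvalue) for x in v]

loadings = first_component_loadings(R)
for name, loading in zip(TESTS, loadings):
    print(f"{name:18s} {loading:.2f}")
```

In this made-up matrix the test that correlates weakly with everything else ends up with the lowest loading, which is the pattern the passage describes for low g-loaded tests.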

The second step to estimate g is to make sure that the highly g-loaded tests you have selected are as different from each other as possible in terms of item content and response format. To select highly similar tests (e.g., more than one vocabulary test) will contaminate the estimate of g with the influence of narrow abilities, which, to the mono-g-ist, are unimportant.

Fortunately, cognitive ability test publishers have saved us much trouble by assembling such collections of subtests into composite scales that can be used to estimate g. Such composite scores go by many different names,[1] but I will refer to them as IQ scores. These operational measures of g tend to correlate strongly with one another, mostly in the range of 0.70 to 0.80 but sometimes as low as 0.60 or as high as 0.90 (Kamphaus, 2005). Even so, they are not perfectly interchangeable. If both tests have the traditional mean of 100 and standard deviation of 15, the probability that the two scores will be within a certain range of each other can be found in the table below.[2] For example, for a person who takes two IQ tests that are correlated at 0.80, there is a 29% chance that the IQ scores will differ by 10 points or more.

What is the probability that a person’s scores on two IQ tests will differ by the specified amount or more?

[Table: rows give score differences of > 5, > 10, > 15, > 20, and > 25 points; columns give the probability if the IQ tests correlate at r = 0.70, 0.80, or 0.90.]

If a person has two or more IQ scores that differ by a wide margin, it does not necessarily mean that something is wrong. To insist on perfect correlations between IQ tests is not realistic and not fair.[3] However, when a child has taken two IQ tests recently and the scores are different, it raises the question of which IQ is more accurate.
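The calculation described in footnote 2 can be sketched in a few lines. This assumes both scores are normally distributed with the same mean and a standard deviation of 15; the function name is mine, and the loop simply evaluates the row and column values that label the table.

```python
from math import sqrt
from statistics import NormalDist

def prob_scores_differ(points, r, sd=15.0):
    """P(|X - Y| >= points) for two normal scores with equal mean and SD
    that correlate at r (r must be below 1)."""
    # SD of the difference between two correlated scores with the same SD.
    sd_difference = sd * sqrt(2.0 - 2.0 * r)
    # Two-tailed area beyond the standardized difference.
    return 2.0 * (1.0 - NormalDist().cdf(points / sd_difference))

for points in (5, 10, 15, 20, 25):
    row = "  ".join(
        f"{prob_scores_differ(points, r):6.1%}" for r in (0.70, 0.80, 0.90)
    )
    print(f"{points:>3} or more: {row}")
```

For a 10-point difference at r = 0.80, this gives a value that rounds to the 29% quoted in the text.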

[1] Full Scale IQ (WISC-IV, SB5, UNIT), Full Scale (Leiter-R, CAS), General Intellectual Ability (WJ III), General Conceptual Ability (DAS-II), Composite Intelligence Index (RIAS), Composite Intelligence Scale (KAIT), Fluid-Crystallized Index (KABC-II), and many others.

[2] This table was created by calculating the standard deviation of the difference between two correlated, normally distributed variables and then applying the cumulative distribution function of the normal curve.

[3] “If I were to command a general to turn into a seagull, and if the general did not obey, that would not be the general’s fault. It would be mine.” – Antoine de Saint-Exupéry, The Little Prince

This post is an excerpt from:

Schneider, W. J. (2013). Principles of assessment of aptitude and achievement. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), Oxford handbook of psychological assessment of children and adolescents (pp. 286–330). New York: Oxford University Press.


Mean-spirited Mono-g-ists vs. Muddleheaded Poly-G-ists

I hate the impudence of a claim that in fifty minutes you can judge and classify a human being’s predestined fitness in life. I hate the pretentiousness of the claim. I hate the abuse of scientific method which it involves. I hate the sense of superiority which it creates, and the sense of inferiority which it imposes.

– Walter Lippmann, in a 1923 essay on Lewis Terman and the IQ testers

Most of us have uncritically taken it for granted that children who attend school eight or ten years without passing the fourth grade or surmounting long division, are probably stupider than children who lead their classes into high school at twelve years and into college at sixteen. Mr. Lippmann contends that we can’t tell anything about how intelligent either one of these children is until he has lived out his life. Therefore, for a lifetime at least, Mr. Lippmann considers his position impregnable!

– Lewis Terman, in response to Walter Lippmann

Spearman’s (1904) little g caused a big stir when it was first proposed and has, for over a century now, been disrupting the natural state of harmony that would otherwise prevail amongst academics. Many a collegial tie has been severed, many a friendship has soured, perhaps even engagements broken off and marriages turned into dismal, loveless unions because of the rancor this topic provokes. I have seen otherwise mild-mannered professors in tweed jackets come to blows in bars over disagreements about g.[1]

It all began when Spearman observed that mental abilities that he measured were all positively correlated. This observation has been replicated by thousands of studies. No one who is familiar with this gigantic body of evidence doubts that essentially all cognitive abilities are positively correlated. This statistical regularity is typically referred to as the positive manifold.[2] You could become an academic superstar (i.e., admired by six or seven other academics) if you were to find a pair of cognitive abilities that are negatively correlated with each other. So far, no one has.[3] Thus, everyone in the know agrees with Spearman on this point. What some people hate is his explanation for it.

Spearman believed (and invented some very fancy statistical procedures to support his argument)[4] that abilities are correlated because all abilities are influenced by a common cause, g (general intelligence). Spearman was careful to note that he did not know for certain what g was but was not shy about speculating about its nature. He thought that it might be a kind of mental energy and that some people had a lot of it and some had very little.

The essential points of contention in the byzantine quarrels between Spearmanian mono-g-ists and anti-Spearmanian poly-G-ists[5] have not changed much over the decades. There is some diversity within both groups but the lines between them are fairly clear. Not only do the mono-g-ists insist that g be acknowledged as an ability, but they believe that it should be esteemed above all others. Some appear to believe that no ability other than g even matters. Some poly-G-ists will grant that g exists but deem it inconsequential compared to the myriad other abilities that influence the course of a human life. Other poly-G-ists deny that g exists and are disgusted by the very idea of it.

It turns out that these two groups are not merely on opposite sides of an intellectual debate—they are members of different tribes. They speak different dialects, vote for different candidates, and pray to different gods. Their heroic tales emphasize different virtues and their foundation myths offer radically different but still internally consistent explanations of how the world works. If you think that the matter will be settled by accumulating more data, you have not been paying attention for the last hundred years.

Poly-G-ists do not merely believe that mono-g-ists are mistaken but that they are mean-spirited, perhaps evil, or at the very least, Republicans. In their view, the course of human history can be summed up in this manner:

Since the dawn of time up to the beginning of the twentieth century, humans lived in a paradise of loving harmony and high self-esteem. Then Spearman invented g and ruined everything. Previously, Live White Males (for back then they were not yet dead) had been content to be equal to everyone else and were really rather decent fellows. However, many of them were corrupted by Spearman’s flattery and convinced themselves they had more g than other people. The deceived began to call themselves Fascists and went around disempowering people with nasty labels. Though eventually defeated by George Lincoln King, Jr. in the Civil Liberties War, Fascists still wield influence via college aptitude tests. If we rid the world of all standardized tests, people will no longer label one another, low self-esteem will be eradicated, and a new Utopia will be established.

On the other side, mono-g-ists know that poly-G-ists have seen the same data and read the same studies as they have. They believe that the poly-G-ists are simply too muddleheaded to understand the data, too blinded by their ideological wishes to see the world as it is, or too fearful of social consequences to proclaim publicly that the emperor has no clothes. In the short epic tragedy, the Spearmaniad, mono-g-ists find this account of how things came to be:

In the dark mists of prehistory, life was nasty, brutish, and short. Worse, it was almost impossible to tell the common folk from their betters and some very mediocre presidents were elected. When the goddess of mathematics looked upon the chaos of the world, she cried crystal tears of pure correlation coefficients. Now Spearman was a mighty statistician and he gathered the correlations up and arranged them in matrices. From these matrices he invented factor analysis, from which flowed new knowledge: first IQ tests, then writing, then the wheel. All that was done with factor analysis was beautiful, virtuous, and true. But the brief flowering of civilization that followed was ended when a cabal of ignorant do-gooders objected to the use of IQ tests, presumably because they (or their ugly, talentless children) performed poorly on them. We now stand on the brink of disaster. Giving up IQ tests will be followed immediately by a rapid descent into barbarism. College aptitude tests may postpone or soften the impact of this catastrophe for a little while but cannot avert it entirely.

The theoretical status of g will not cease to be controversial until something extraordinary happens to the field. I do not pretend to know what this might be. Maybe a breakthrough from biology will resolve the matter. Maybe divine intervention. Until then, I feel no need to join either tribe. I will remain agnostic and I will not get too excited the next time really smart people eagerly announce that finally, once and for all, they have proof that the other side is wrong. This has happened too many times before.

[1] Okay…not really…but I have seen some very sarcastic emails exchanged on professional listservs!

[2] The term positive manifold at one time had a precise meaning drawn from mathematics, meaning that the correlation matrix of all tests was nearly a rank 1 matrix (i.e., a correlation matrix implied by Spearman’s Two-Factor Theory). As it is currently used, the term simply means that the correlation matrix consists entirely of positive correlations.

[3] Not, at least, in more than one large representative sample. From time to time, someone gets an occasional negative correlation but other researchers have trouble replicating the finding.

[4] Hence, the intense controversy is not merely a result of lies and damned lies.

[5] So named because in perhaps the most important theory to deny the existence of g (Horn & Cattell, 1964), the most important abilities all have names that begin with a capital letter G (Gf, Gc, Gv, and so forth).


Aptitudes and Achievement: Definitions, Distinctions, and Difficulties

Achievement typically refers to knowledge and skills that are formally taught in academic settings. However, this definition of achievement can be broadened to include any ability that is valued and taught in a particular cultural setting (e.g., hunting, dancing, or computer programming). Aptitude refers to an individual’s characteristics that indicate the potential to develop a culturally valued ability, given the right circumstances. The difference between aptitudes and achievement at the definitional level is reasonably clear. However, at the measurement level, the distinction becomes rather murky.

Potential, which is latent within a person, is impossible to observe directly. It must be inferred by measuring characteristics that either are typically associated with an ability or are predictive of the future development of the ability. Most of the time, aptitude is assessed by measuring abilities that are considered to be necessary precursors of achievement. For example, children who understand speech have greater aptitude for reading comprehension than do children who do not understand speech. Such precursors may themselves be a form of achievement. For example, it is possible for researchers to consider students’ knowledge of history as an outcome variable that is intrinsically valuable. However, some researchers may measure knowledge of history as a predictor of being able to construct a well-reasoned essay on politics. Thus, aptitude and achievement tests are not distinguished by their content but by how they are used. If we use a test to measure current mastery of a culturally valued ability, it is an achievement test. If we use a test to explain or forecast mastery of a culturally valued ability, it is an aptitude test.

IQ tests are primarily used as aptitude tests. However, an inspection of the contents of most IQ tests reveals that many test items could be repurposed as items in an achievement test (e.g., vocabulary, general knowledge, and mental arithmetic items). Sometimes the normal roles of reading tests and IQ tests are reversed, such as when neuropsychologists estimate loss of function following a brain injury by comparing current IQ to performance on a word-reading test.

A simple method to distinguish between aptitude and achievement is to ask, “Do I care about whether a child has the ability measured by this test because it is inherently valuable or because it is associated with some other ability (the one that I actually care about)?” Most people want children to be able to comprehend what they read. Thus, reading tests are typically achievement tests. Most people are not particularly concerned about how well children can reorder numbers and letters in their heads. Thus, the WISC-IV Letter-Number Sequencing subtest is typically used as an aptitude test, presumably because the ability it measures is a necessary component of being able to master algebra, program computers, follow the chain of logic presented by debating candidates, and other skills that people in our culture care about.


Cognitive Assessment, Death Penalty

Execution by Miscalculation

Some people facing execution have had their IQ estimated multiple times. If an ability is measured more than once, there are established procedures for combining all of the available information. These procedures are used in every psychological test ever published.

Some people who should not be eligible for execution are deemed eligible because psychologists are not combining IQ scores properly. The current standard of averaging the scores (or taking the median) is wrong. The correct procedure is the one we use in every other domain of psychological measurement, including in the computation of a single IQ score. When a person has been tested multiple times, this same procedure should be used to estimate a person’s IQ.
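The post does not spell out the formula, so the following is only a rough sketch. It assumes the procedure meant is the standard composite-score calculation: convert each score to a z-score, sum the z-scores, and divide by the standard deviation of that sum, which is the square root of the sum of every entry in the correlation matrix. The scores and the 0.80 correlations below are hypothetical, and the function name is mine.

```python
from math import sqrt

def composite_score(scores, correlations, mean=100.0, sd=15.0):
    """Rescale the sum of correlated standard scores to the original metric.

    correlations is the full correlation matrix (ones on the diagonal),
    so the sum of all its entries is the variance of the summed z-scores.
    """
    z_sum = sum((score - mean) / sd for score in scores)
    sd_of_sum = sqrt(sum(sum(row) for row in correlations))
    return mean + sd * z_sum / sd_of_sum

# Hypothetical case: three IQ scores, every pair assumed to correlate at 0.80.
scores = [70, 72, 74]
correlations = [
    [1.0, 0.8, 0.8],
    [0.8, 1.0, 0.8],
    [0.8, 0.8, 1.0],
]
simple_average = sum(scores) / len(scores)
composite = composite_score(scores, correlations)
print(f"average = {simple_average:.1f}, composite = {composite:.1f}")
```

Under these assumptions the simple average is 72, while the composite is a little under 70. When several positively correlated scores all fall below the mean, the composite lands further from the mean than their average does, because an average of imperfectly correlated scores has a smaller standard deviation than any one score.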

Those seeking detailed explanations of the correct calculation procedures will find them in this paper and in the second half of this video.

The correct calculation procedures are also explained on pages 289–291 of this chapter:

Schneider, W. J. (2013). Principles of assessment of aptitude and achievement. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), Oxford handbook of psychological assessment of children and adolescents (pp. 286–330). New York: Oxford University Press.

Cognitive Assessment, Death Penalty, Psychometrics, Statistics, Tutorial, Uncategorized, Video

Video Tutorial: Misunderstanding Regression to the Mean

One of the most widely misunderstood statistical concepts is regression to the mean. In this video tutorial, I address common false beliefs about regression to the mean and answer the following questions:

  1. What is regression to the mean?
  2. Do variables become less variable each time they are measured?
  3. Does regression to the mean happen all the time or just in certain situations?
  4. Does repeated testing cause people to come closer and closer to the mean?
  5. How is regression to the mean relevant in death penalty cases?
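Several of these questions can be explored with a small simulation. The sketch below is not taken from the video; it assumes two testings on the IQ metric (mean 100, SD 15) that correlate at 0.80, and every name in it is mine.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def expected_retest(score, r, population_mean=100.0):
    """Expected second score given the first, when both testings share
    the same mean and SD and correlate at r."""
    return population_mean + r * (score - population_mean)

def simulate_two_testings(n, r, seed=1):
    """Draw n pairs of IQ-metric scores that correlate at about r."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        shared = rng.gauss(0, 1)  # part common to both testings
        z1 = sqrt(r) * shared + sqrt(1 - r) * rng.gauss(0, 1)
        z2 = sqrt(r) * shared + sqrt(1 - r) * rng.gauss(0, 1)
        pairs.append((100 + 15 * z1, 100 + 15 * z2))
    return pairs

pairs = simulate_two_testings(20000, r=0.80)
low = [(a, b) for a, b in pairs if a <= 70]  # selected for a low first score
low_first = mean([a for a, _ in low])
low_second = mean([b for _, b in low])
sd_first = pstdev([a for a, _ in pairs])
sd_second = pstdev([b for _, b in pairs])

print(f"expected retest after a first score of 70: {expected_retest(70, 0.80):.1f}")
print(f"low scorers: first mean {low_first:.1f}, second mean {low_second:.1f}")
print(f"SD of all first scores {sd_first:.1f}, all second scores {sd_second:.1f}")
```

People selected because their first score was low tend to score closer to the mean on the second testing, yet the second testing as a whole is about as variable as the first. Regression to the mean describes what to expect for selected cases; it does not mean that scores shrink toward the mean with each testing.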

Cognitive Assessment, My Software & Spreadsheets, Psychometrics, Psychometrics from the Ground Up, Tutorial, Uncategorized, Video

Psychometrics from the Ground Up 9: Standard Scores and Why We Need Them

In this video tutorial, I explain why we have standard scores, why there are so many different kinds of standard scores, and how to convert between any two types of standard scores.

Here is my Excel spreadsheet that converts any type of standard score to any other type.
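The conversion itself is a two-step linear rescaling: turn the score into a z-score on its own scale, then place that z-score on the target scale. The sketch below is not the spreadsheet's implementation; the function name and the small lookup of common scales are mine.

```python
# Common standard-score scales as (mean, SD).
SCALES = {
    "z": (0.0, 1.0),
    "T": (50.0, 10.0),
    "index": (100.0, 15.0),  # IQ-style index scores
    "scaled": (10.0, 3.0),   # subtest scaled scores
}

def convert_standard_score(score, from_scale, to_scale):
    """Convert a score between two scales listed in SCALES."""
    from_mean, from_sd = SCALES[from_scale]
    to_mean, to_sd = SCALES[to_scale]
    z = (score - from_mean) / from_sd  # distance from the mean in SD units
    return to_mean + to_sd * z

print(convert_standard_score(115, "index", "T"))       # one SD above the mean
print(convert_standard_score(7, "scaled", "index"))    # one SD below the mean
```

Adding another scale only requires a new entry with its mean and standard deviation.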

Cognitive Assessment, Research Link

Meta-analysis confirms association between premorbid IQ loss and schizophrenia onset

In this meta-analysis of adolescents and young adults given IQ tests multiple times, the risk of schizophrenia increases about 55% for each standard deviation of IQ lost. This finding should be framed appropriately, however. The risk of schizophrenia is low and remains low, even when IQ drops. Most people whose IQ drops do not go on to develop schizophrenia. However, the risk of developing schizophrenia is higher among people whose IQ drops significantly.