Assessing Psyche, Engaging Gauss, Seeking Sophia

Mental Factors of No Importance

W. Joel Schneider — Tue, 09 Jun 2020 05:33:01 +0000

There are few things I enjoy more in a scientific document than a particularly punchy patch of purple prose. In 1939, Truman Kelley had an important insight to share, and I love it that he got his point across by going all in and over the top:

The first earmark of an unimportant factor is its Venus-like birth from the meditations of a Spranger, the libido of a Freud, the hunch of an enthusiastic employment officer, or the dial of a differential analyzer. The second earmark is also Venus-like—the factor stands in virgin purity, untrammeled by clothes of doubt, untouched by considerations of probable error, unqualified by tentative endorsement. The third earmark is almost a consequent of the other two—the factor has never been put to work, it has never served the needs of man in school, in business, in social adaptation. Truly none of these earmarks detract from the beauty of the picture and possibly this Venus will scrub floors as well as win acclaim for harmony of proportion and then, indeed, we shall be blessed. But initially let us call a factor with these earmarks just a Venus-factor and not load it down or trust it with heavy work.
What is the heavy work that is to be done? We have eight million unemployed, another many millions feeling thwarted and believing that could they but find a channel of expression more fitted to their talents, their life would be richer and society of which they are a part would be the better. The heavy work resting squarely upon the shoulders of the typologist, of the mental factorist, of the character analyst, and upon the comprehensiveness, accuracy, and analytical nature of the mental measures that he uses, is to render aid in the mental and social adjustments necessary to alleviate the thwartings mentioned.
This heavy work must not be encumbered with trivial mental factors. Let me suggest a Factor-of-no-importance that might be derived by the method of statistical analysis of test data. If tests of taste sensitivity to para-ethoxy-phenol-thio-carbamide were included in a series of tests to be factorized by matrix methods I have no doubt that a taste trait would evolve as an independent factor. Certainly it is real, psychologically real, perhaps genetically real, but still in comparison with those mental, sensory, and motor things that underlie the adequate adjustment of individuals in the society in which we live it certainly is a Factor-of-no-importance. I can, by a vigorous stretch of the imagination, conceive of a society in which such a factor would be important. I do not object to students spending their time upon such factors with the hope of discovering something of genetic or other importance, but I do insist that evidence of existence of a factor be not cited as evidence that it is important in the meeting of pressing guidance and social problems. (pp. 140–141)
Kelley, T. L. (1939). Mental factors of no importance. Journal of Educational Psychology, 30(2), 139–142. https://doi.org/10.1037/h0056336

In his 1993 masterwork, Human Cognitive Abilities: A Survey of Factor-Analytic Studies, John Carroll occasionally noted that some of the factors he identified had no known utility at the time of writing. To prevent endless proliferation of useless ability constructs, Kevin McGrew and I (Schneider & McGrew, 2018) proposed that new ability constructs should meet the following criteria:

The content domain of a new ability must be laid out clearly.
The new ability must be measurable with performance tests using multiple test paradigms.
Measures of the new ability must demonstrate convergent and discriminant validity when measured alongside other abilities.
Measures of the new ability must demonstrate incremental validity over measures of other more established abilities when predicting important outcomes.
The new ability construct should be linked plausibly to specific neurological functions.
The new ability construct should be linked plausibly to functions that evolved to help humans survive and reproduce.

These criteria were inspired by similar proposals by Raymond Cattell, Howard Gardner, and John Mayer and his colleagues.

A systematic and thorough review of the evidence of each ability construct in CHC theory is sorely needed. Some of the broad CHC abilities have been vetted thoroughly enough such that they clearly meet all these criteria, but others have not. Most narrow abilities have long languished in the preliminary stages of construct validation. So much work to do…

Defining Intelligence: “It’s a trap!”

W. Joel Schneider — Sun, 19 Apr 2020 19:29:18 +0000

It is a favourite debating ploy in discussions about human intelligence to ask for a definition of the construct. One meaning of ‘define’ in the Oxford English Dictionary is ‘give the exact meaning of’. If differential psychologists are daft enough to attempt this, they will find they have been tricked into delivering a hostage to fortune, the premature issue will be rent by inquisitors. (p. 1)
Deary, I.J. (2000). Looking down on human intelligence: From psychometrics to the brain. Oxford University Press.

Deary advises intelligence researchers to instead try to mark the boundaries of the concept of intelligence. Some scholars in other disciplines also have ambivalent relationships with exact definitions. Mike Brown, who discovered several “dwarf planets” and many other large objects in our solar system beyond Pluto, explained why he does not take seriously the International Astronomical Union’s official definition of the term planet that excluded Pluto:

In the entire field of astronomy, there is no word other than planet that has a precise, lawyerly definition, in which certain criteria are specifically enumerated. Why does planet have such a definition but star, galaxy, and giant molecular cloud do not? Because in astronomy, as in most sciences, scientists work by concepts rather than by definitions. The concept of a star is clear; a star is a collection of gas with fusion reactions in the interior giving off energy. A galaxy is a large, bound collection of stars. A giant molecular cloud is a giant cloud of molecules. The concept of a planet—in the eight-planet solar system—is equally simple to state. A planet is one of a small number of bodies that dominate a planetary system. That is a concept, not a definition. How would you write that down in a precise definition?
I wouldn’t. Once you write down a definition with lawyerly precision, you get the lawyers involved in deciding whether or not your objects are planets. Astronomers work in concepts. We rarely call in the attorneys for adjudication. (p. 242)
Brown, M. (2012). How I killed Pluto and why it had it coming. Spiegel & Grau.

Assessment Report Template

W. Joel Schneider — Tue, 26 Feb 2019 18:06:38 +0000

Today at NASP 2019, I gave two workshops on psychological assessment report writing. Here are links to two complete reports:

Liam

Renaldo

Caveat Observator: On the Seductive Validity of Behavioral Observations

W. Joel Schneider — Thu, 20 Dec 2018 13:08:20 +0000

Seeing is believing, but first impressions do not tell the whole truth.

Well, who you gonna believe, me or your own eyes?
Chico Marx in Duck Soup (1933)

The sirens of observed behavior do not seduce us to a watery grave; they sing of truths so satisfying that we cease to sail. At port, we tell tales of whole oceans after having seen a single cove just outside the harbor.

It might seem like direct observation would be the final authority that trumps all other forms of evidence. However, there are reliability and validity concerns about direct observation that are every bit as serious as those associated with ability tests, rating scales, and interviews (Meier, 1994). It is not that observed behavior gives false information, but the true information it provides is so vivid that other truths are ignored, and our interpretation is incomplete.

Even though we know that behavior can vary considerably from day to day, it is rare for examiners to observe examinees for more than an hour or two in naturalistic settings (e.g., classrooms, playgrounds, and group homes). Worse, most direct observation occurs in the unnaturalistic setting of the testing environment. The testing environment pulls for particular sets of temporary behaviors that are easily mistaken for persistent personality traits. Even those of us who intellectually appreciate the allure of the fundamental attribution error (Ross, 1977) find it hard to resist the urge to overgeneralize that which we have observed with our own eyes.

We have reason to reserve judgment when an examinee does something unusual in the testing environment because the testing environment is itself unusual. The testing environment differs from most other environments, in part because the interaction is most often one-to-one and thus more personal and focused than group interactions. The intense, unfailing attention of the typical examiner is a rather unusual experience for most people. Being assessed is a break from the examinee’s normal routine, which most examinees find to be quite interesting until the novelty wears off. In addition, the environment is carefully controlled to maximize the examinee’s attention and performance. In other words, the testing is designed to elicit the person’s optimal performance. Therefore, the observed behaviors may not be representative of a person’s typical behaviors in another setting, such as a chaotic home, a noisy classroom, or a competitive work environment.

If you believe that the observed test behaviors are indeed similar to those in the home, school, or workplace, you must confirm that this is the case with supplementary evidence. Direct observation is indispensable, but our best hope for accuracy is in a disciplined, systematic integration of all the available evidence.

Excerpt from pp. 103–104 of Schneider, W. J., Lichtenberger, E. O., Mather, N., & Kaufman, N. L. (2018). Essentials of Assessment Report Writing (2nd ed.). Hoboken, NJ: Wiley.

Habitual Hedging Is Unnecessary, Unattractive, and Annoying

W. Joel Schneider — Mon, 10 Dec 2018 12:55:22 +0000

To escape criticism—do nothing, say nothing, be nothing.
Elbert Green Hubbard (1909, p. 38)

If you want to be a stickler about it, you can remind people in every statement you make of the deep-seated uncertainty of mortal existence. However, in everyday communication we only introduce doubt when there is reasonable doubt. If you ask a stranger for the time, and he tells you that it is 3:15, you thank him and move along. If he says, “It might be 3:15,” you still thank him, but you look around for someone else with a watch.

In much academic writing, clarity runs a poor second to invulnerability.
Richard Hugo (1992, p. 11)

Expressions of doubt exist for a reason. Suppose someone tells you that Shelby is angry with you. You must decide what to do with that information. Now suppose that someone tells you that Shelby might be angry with you. This information might lead to a different course of action. If the person is quite sure about Shelby’s anger but added “might” because of her philosophical stance that everything is uncertain, she is correct in what she said but incorrect in what she communicated. We rely on social conventions to communicate much that is unstated. If the public is not accustomed to the ways in which we introduce doubt into our sentences, we are miscommunicating. Suppose you write,

Her mother reported that Julia has a “severe peanut allergy.”

You might think the subtext of this sentence is “See how careful I am? I am telling you where I got all my information. Also, I’m not an allergist so it is not my place to say how severe the allergy is. Therefore, I am using Julia’s mother’s words instead of my own.” Many readers will understand that this is all we mean. However, to some readers, we might as well have written,

The “woman” who claims to be Julia’s mother asserted, without evidence, that Julia (if that is indeed her name) has a so-called peanut allergy, which, for reasons unspecified, was described as “severe.”

Why do we write reports with hyper-precise language? We want to be right … and to be respectful. We also want not to be wrong, not to be challenged, and, if we are wrong, not to be responsible. You never know when someone might sue you for saying that an allergy is severe when in fact it is only moderately severe. Steven Pinker (2014) observed,

Writers acquire the hedge habit to conform to the bureaucratic imperative that’s abbreviated as CYA, which I’ll spell out as Cover Your Anatomy. They hope it will get them off the hook, or at least allow them to plead guilty to a lesser charge, should a critic ever try to prove them wrong. …A classic writer counts on the common sense and ordinary charity of his readers, just as in everyday conversation we know when a speaker means “in general” or “all else being equal.” If someone tells you that Liz wants to move out of Seattle because it’s a rainy city, you don’t interpret him as claiming that it rains there twenty-four hours a day seven days a week just because he didn’t qualify his statement with relatively rainy or somewhat rainy. … An adversary who is unscrupulous enough to give the least charitable reading to an unhedged statement will find an opening to attack the writer in a thicket of hedged ones anyway. … It’s not that good writers never hedge their claims. It’s that their hedging is a choice, not a tic. (pp. 44–45)

Let’s start with an excessively hedged statement and then explore some alternatives:

Julia’s mother’s CBCL Externalizing score of 78 suggests that Julia may engage in antisocial behavior more often than her peers.

Suggests? May? These words were no doubt intended as a sign of respect for the uncertainty inherent in the assessment process, but they also reveal an assessment in limbo and only half completed. If the evaluator has no other information about Julia, then, yes, the CBCL Externalizing score does no more than suggest the presence of problems Julia may have. But to stop there means that the evaluator does not understand what rating scales are for.

Rating scales are tools for collecting information efficiently and can focus our investigation on areas of particular concern. However, nothing rating scales can tell us is trustworthy enough to mention in a report—unless it has been corroborated. Once her parents, her teachers, and Julia herself have told us that she has a long history of truancy, shoplifting, and fistfights, the score is beside the point. We base our interpretation on the totality of evidence, not on a particular score. A corroborated score might still tell us something about the rarity of the problem, but to insist on words like suggest bespeaks a perversely cautious epistemology.

The information, interpretations, and conclusions in a classically written report have been thoroughly vetted by the examiner and are verifiable—at least in theory—by anyone. For this reason, they are stated simply, directly, and without hedging. Opinions, predictions, and preferences are clearly labeled as such when necessary, but without compulsive hand-wringing. In this way, the writer shows respect for the reader’s competence in recognizing an opinion for what it is.

Remove Unnecessary Qualifications and Excessive Sourcing

Statement	Reason for Edit
~~If Julia’s mother’s recollection is accurate,~~ Julia was born 6 weeks premature.	If anyone is going to be accurate about such a matter, it is going to be Julia’s mother.
~~According to~~ Julia’s teacher~~, he~~ gives her extra incentives to stay focused on her seatwork.	There is no reason to doubt Julia’s teacher’s words here. The original wording suggests that Julia’s teacher might have lied, or at best, is confused.
~~The BASC-3 Self-Report of Personality indicates that~~ Julia ~~possibly~~ has high levels of anxiety.	Rating scales do not have enough authority to stand on their own. Your judgment cannot be outsourced to them. Once the interpretation has been properly confirmed, the reference to the rating scale as a source is superfluous.
~~Exposure therapy may help Julia manage her debilitating fear of dogs, but it is impossible to know for certain.~~ I recommend exposure therapy to help Julia manage her debilitating fear of dogs.	Almost anything may help Julia. What is your recommendation? There is no need to undermine confidence in your suggestions. It is widely understood that a recommendation is not a guarantee. If you are not ready to make a suggestion you can stand by, your assessment is not yet finished.

At first, the classic style seems overly bold, as if the writers present their opinions as immutable laws. There is legitimate cause for concern here, but the worry is overstated. It is easy to spot the difference between the clear, disinterested pronouncements of classic prose and the bloviation and bluster of pompous windbags. If there is anything that we social creatures are good at, it is recognizing self-promotion, especially when the self-promoter’s interests do not align with our own. Furthermore, there is no set of writing guidelines in the world that will stop pompous windbags from engaging in pompous windbaggery. Therefore, we might as well design our rules of decorum for sensible people of good will.

When there are lingering doubts about the accuracy of a statement in a report, you should gather more evidence until you can say something more definite. No one benefits from words parsed so carefully they are watered down to meaninglessness with mushy maybes, could be sometimes, and possibly some days. These doubt-inducing words are indispensable tools, to be sure, but they are to be used with skill and judgment instead of mechanically inserted in every statement.

Writing in the classic style gives the writer certain license to be clear and direct, but no license for high-handedness. This freedom to be direct in writing is paid for by scrupulous scientific modesty and soul-searching doubt during the assessment phase. Assessment is not a parlor trick in which we guess from minimal information all of the person’s deepest secrets. Rather, we work collaboratively with the person and then verify with all relevant parties whether a possible interpretation is true. Thus, a properly vetted interpretation will come as no surprise when it appears in a report. If despite best efforts, the report is found to have an interpretive error, the report can be amended.

Obviously, hedging is warranted if you expect the report to be included in a lawsuit. If you wish to adopt the classic style, eliminating unnecessary qualification and hedging, but you still want to play it safe, you can include in your report a blanket disclaimer in which you acknowledge the possibility of error and that your observations, conclusions, and recommendations are simply your best guesses rather than claims of absolute certainty.

Excerpt from pp. 37–40 of Schneider, W. J., Lichtenberger, E. O., Mather, N., & Kaufman, N. L. (2018). Essentials of Assessment Report Writing (2nd ed.). Hoboken, NJ: Wiley.

Classic Prose Is Simple, Not Simplistic

W. Joel Schneider — Mon, 03 Dec 2018 08:31:02 +0000

Simple words, carefully arranged, stick in the memory and influence action long after they have been read. Let us consider three pithy one-liners written by masters of the classic style.

Marie de Rabutin-Chantal, Madame de Sévigné (1626–1696)

I fear nothing so much as a man who is witty all day long.

Here Madame de Sévigné jolts us into delightful awareness of a truth we have always felt but never articulated. Furthermore, she has shown us the great honor of trusting us to apply the appropriate scope to her generalization about the dangers of too much wit. To challenge her on her wording—that chronically witty men could not possibly frighten her more than ferocious beasts, incurable disease, and invading soldiers—breaks the spell of her obvious hyperbole and displeases the Madame.

François VI, Duc de La Rochefoucauld (1613–1680)

The refusal of praise is but the wish to be praised twice.

With maximum efficiency and minimum effort, La Rochefoucauld performs verbal jujitsu on the excessively modest. Stop making yourself the center of attention, he says. Don’t be so awkward about letting people be nice to you. Just thank the person and be done with it.

Blaise Pascal (1623–1662)

I have made this letter longer than usual because I lack the time to make it shorter.

Pascal’s oft-quoted apology could have been utterly forgettable (e.g., “Sorry about the long letter, but I did not have enough time to edit it properly.”). It achieved immortality because Pascal has skillfully led us to expect one thing and then surprises us with another. In this manner, a rather mundane observation—that editing for brevity is hard—feels fresh and insightful.

These examples of classic prose have a style of humor that does not belong in assessment reports, but they are nevertheless instructive. The three writers have noticed that even qualities that seem unambiguously positive—wit, modesty, and brevity—have hidden dangers, shortcomings, and costs. Assessment professionals, too, see the downsides of certain virtues and the hidden sense in what appear to be self-defeating behaviors. Similar to these masters of classic style, assessment professionals can make messages memorable with surprise, irony, and contrast:

Daniel is never comfortable, except when he is worrying. Worry helps him plan. Worry keeps him safe. To ask Daniel to stop worrying is to ask him to invite catastrophe.
Art and Lannie love each other so fiercely that 20 years of quarreling could not tear them apart.
Although Jackson intimidates other children, he is in some ways more afraid than they are. No one fears the bully more than the bully himself.
If Gina were more frightened of germs, she would not wash her hands so often. Her skin, rubbed raw from years of constant scrubbing, no longer protects her from infections.
For many years, procrastination has helped Karla be the productive person she is today. Procrastination may have its downsides, but it has been her partner in combating a worse problem: perfectionism. Her motto is “The task expands to fit the time allotted.” Only looming deadlines have had the power to focus her mind and reshuffle her priorities to work efficiently. Recently, however, this strategy has backfired dramatically …

It would strike the wrong tone if the entire report were ironic in this way, but a few memorable sentences might change a person’s life.

Excerpt from pp. 35–37 of Schneider, W. J., Lichtenberger, E. O., Mather, N., & Kaufman, N. L. (2018). Essentials of Assessment Report Writing (2nd ed.). Hoboken, NJ: Wiley.

Why Do Assessment Reports Exist at All?

W. Joel Schneider — Mon, 26 Nov 2018 08:00:23 +0000

Think of the time and effort we could save if we simply did our assessments, gathered the relevant parties, and then had an engaging conversation about our findings. Why not let an automated transcript of the conversation serve as the permanent record of the assessment? Abandon all hope, ye who enter here. Even if the practice were feasible, it fundamentally misunderstands the nature of an assessment report.

What a hammer does for the fist, what pliers do for the grip, what a telescope does for the eye, writing does for the mind. Unaided, the mind can contemplate solutions to complex problems, but attention wanders and memories fade. Writing not only preserves our thoughts but also sharpens our thinking. By sequencing sound on durable paper, we can contemplate the products of our own minds from a higher vantage— and with a steady gaze. Our words, now external objects, can be revised, reshaped, refined, reorganized, and most important, revisited. As Susan Sontag (2000) observed, “what I write is smarter than I am. Because I can rewrite it.”

Think of writing not as a way to transmit a message but as a way to grow and cook a message. Writing is a way to end up thinking something you couldn’t have started out thinking. —Peter Elbow (1998, p. 15)

Excerpt from p. 30 of Schneider, W. J., Lichtenberger, E. O., Mather, N., & Kaufman, N. L. (2018). Essentials of Assessment Report Writing (2nd ed.). Hoboken, NJ: Wiley.

Making Spirographs in R

W. Joel Schneider — Sat, 06 Oct 2018 15:21:10 +0000

I made spiro, an R package for creating animated spirographs. Check it out. I would appreciate any suggestions for improving it.

Spirograph Tutorial

My First Book! Essentials of Assessment Report Writing 2e

W. Joel Schneider — Wed, 19 Sep 2018 16:31:06 +0000

It’s official! The second edition of Essentials of Assessment Report Writing has been published. My co-authors and I worked hard to make sure every sentence was worth reading. We hope that our work helps professionals write reports that restore hope and inspire change in the lives of people who have found themselves overwhelmed by circumstance.

I am grateful to Alan and Nadeen Kaufman for the invitation to update and expand upon the first edition and to Liz Lichtenberger, Nancy Mather, and Nadeen Kaufman for welcoming me into their writing team. John Willis and Rita McCleary each contributed a chapter brimming with insight. We selected first-rate scholars and practitioners to contribute examples of great report writing along with annotations that let readers listen in on their report-writing process. Thank you Lisa King Chalukian, Robert Lichtenstein, Linda M. Fishman, Donna Goetz, Elaine Fletcher-Janzen, Christopher J. Nicholls, John M. Garruto, Alison Wilkinson-Smith, Jennie Kaufman Singer, and Susan Engi Raiford.

I wish I were one of those people who write with ease, but for me, every sentence is a wrestling match. I would still be pinned to the mat, with fading hopes of escape, if my spouse, Renée Tobin, had not repeatedly made sacrifices in her own full-to-brim schedule to give me the gift of uninterrupted time and solitude. I am forever in awe of her.

A Review of the Receptive, Expressive & Social Communication Assessment—Elementary

W. Joel Schneider — Wed, 26 Oct 2016 17:18:21 +0000

The Receptive, Expressive & Social Communication Assessment–Elementary (RESCA-E) is a new and innovative measure of oral language abilities. When its publisher, ATP Assessments, asked me to classify the subtests of the RESCA-E according to their likely loadings on CHC Theory abilities, I billed them for the time it took me to do so. However, I was so impressed with the instrument, I thought that it merited a short review and some statistical investigations of its structure. The review was born of pure enthusiasm–I did not bill ATP Assessments for the many additional hours I spent performing statistical analyses, creating plots, and writing up the results. The review contains several features that cannot be displayed on this blogging platform (e.g., interactive 3D plots), but it can be seen in its entirety here.

The Composite Score Extremity Effect

W. Joel Schneider — Wed, 17 Feb 2016 18:13:04 +0000

When a person scores exactly 2 standard deviations below the mean on several tests, it is intuitive that the composite score that summarizes these scores should also be exactly 2 standard deviations below the mean. Our intuitions let us down here: the composite score is actually more than 2 standard deviations below the mean. I attempt to make this “composite score extremity effect” a little more intuitive in an Assessment Service Bulletin for the Woodcock-Johnson IV.

Schneider, W. J. (2016). Why Are WJ IV Cluster Scores More Extreme Than the Average of Their Parts? A Gentle Explanation of the Composite Score Extremity Effect (Woodcock-Johnson IV Assessment Service Bulletin No. 7). Itasca, IL: Houghton Mifflin Harcourt.

I thank Mark Ledbetter for the invitation to write the paper and support in the writing process, Erica LaForte for patiently editing a complex first draft down to a much more readable version, and Kevin McGrew for additional thoughtful comments and suggestions for improvement on the first draft.

The bulk of the paper is not mathematical. However, the first draft had a few bells and whistles like the animated graph below that shows how the composite score extremity effect is larger as the average correlation among the tests decreases and the number of tests in the composite increases.
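The arithmetic behind that pattern can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the original animated graph. It assumes equally weighted tests on a z-score metric, and the names `composite_z` and `mean_r` are hypothetical, chosen here for the example.

```python
import math


def composite_z(z_scores, mean_r):
    """z-score of an equally weighted composite of standardized tests.

    z_scores: the person's z-scores on each test.
    mean_r:   the average correlation among the tests (an assumed input).
    """
    k = len(z_scores)
    if k < 1:
        raise ValueError("need at least one test score")
    # Variance of a sum of k standardized scores:
    # k unit variances plus k*(k-1) off-diagonal correlations.
    sd_of_sum = math.sqrt(k + k * (k - 1) * mean_r)
    return sum(z_scores) / sd_of_sum


if __name__ == "__main__":
    # Every test is exactly 2 SD below the mean; vary k and the mean correlation.
    for k in (2, 4, 8):
        for mean_r in (0.7, 0.5, 0.3):
            z = composite_z([-2.0] * k, mean_r)
            print(f"tests={k}  mean r={mean_r:.1f}  composite z={z:.2f}")
```

Unless the tests are perfectly correlated, the sum of the scores has a smaller standard deviation than the sum of the individual standard deviations, so the same total is more unusual on the composite's own scale.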

Another plot that was originally animated shows our best guess of a latent variable X when we have two indicators X₁ and X₂ that are both exactly 2 standard deviations below the mean. X₁ and X₂ correlate with each other at 0.64 and with X at 0.8. If we only know that X₁ = −2, our best guess is that X is −1.60. If we know that both X₁ and X₂ are −2, our best guess is that X is −1.95. Thus, our estimate is lower with two scores (−1.95) than with one score (−1.60).
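Those two estimates can be reproduced with a small regression calculation. The sketch below assumes a single-factor model in which every indicator has the same loading on X, so any two indicators correlate at the loading squared (0.8² = 0.64, as above). The function name `latent_estimate` is hypothetical.

```python
def latent_estimate(indicator_z, loading):
    """Regression estimate of a latent z-score from parallel indicators.

    Assumes every indicator has the same loading on the latent variable,
    so each pair of indicators correlates at loading ** 2.
    """
    k = len(indicator_z)
    if k < 1:
        raise ValueError("need at least one indicator")
    # With equal loadings, every indicator gets the same regression weight:
    # loading / (1 + (k - 1) * loading ** 2).
    weight = loading / (1 + (k - 1) * loading ** 2)
    return weight * sum(indicator_z)


if __name__ == "__main__":
    print(f"{latent_estimate([-2.0], 0.8):.2f}")        # -1.60
    print(f"{latent_estimate([-2.0, -2.0], 0.8):.2f}")  # -1.95
```

Each additional indicator at −2 pulls the estimate further from the mean, but by less each time, because the indicators partly repeat one another's information.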

WISC-V Expanded Composite Scores

W. Joel Schneider — Wed, 09 Sep 2015 01:47:59 +0000

I have long complained that making custom composite scores should not be difficult. The ability to combine scores as one wishes should be a feature of every scoring program for every cognitive battery.

No matter which tests I have given, I would like to be able to combine them into theoretically valid composite scores. For example, on the WISC-V, the Verbal Comprehension Index (VCI) consists of two subtest scores, Vocabulary and Similarities. However, the Information and Comprehension subtests measure verbal knowledge just as well as the other two tests. We should be able to combine them with the two VCI subtests to make a more valid estimate of verbal knowledge.

The good news is that the WISC-V now allows us to do just that: It now has two expanded composite scores:

Verbal Expanded Crystallized Index (VECI)
- Similarities
- Vocabulary
- Information
- Comprehension
Expanded Fluid Index (EFI)
- Matrix Reasoning
- Picture Concepts
- Figure Weights
- Arithmetic

At the risk of sounding greedy, I would like to have an expanded working memory index (Digit Span, Picture Span, and Letter-Number Sequencing) and an expanded processing speed index (Coding, Symbol Search, and Cancellation). Even so, I am grateful for this improvement in the WISC-V.

Is the structure of intelligence different for people with learning disorders? Let’s hope so!

W. Joel Schneider — Mon, 10 Aug 2015 15:59:44 +0000

I do not relish criticizing published studies. However, if a paper uses flawed reasoning to arrive at counterproductive recommendations for our field, I believe that it is proper to respectfully point out why the paper’s conclusions should be ignored. This study warrants such a response:

Giofrè, D., & Cornoldi, C. (2015). The structure of intelligence in children with specific learning disabilities is different as compared to typically developing children. Intelligence, 52, 36–43.

The authors of this study ask whether children with learning disorders have the same structure of intelligence as children in the general population. This might seem like an important question, but it is not—if the difference in structure is embedded in the very definition of learning disorders.

An Analogously Flawed Study

Imagine that a highly respected medical journal published a study titled Tall People Are Significantly Greater in Height than People in the General Population. Puzzled and intrigued, you decide to investigate. You find that the authors solicited medical records from physicians who labelled their patients as tall. The primary finding is that such patients have, on average, greater height than people in the general population. The authors speculate that the instruments used to measure height may be less accurate for tall people and suggest alternative measures of height for them.

This imaginary study is clearly ridiculous. No researcher would publish such a “finding” because it is not a finding. People who are tall have greater height than average by definition. There is no reason to suppose that the instruments used were inaccurate.

Things That Are True By Definition Are Not Empirical Findings.

It is not so easy to recognize that Giofrè and Cornoldi applied the same flawed logic to children with learning disorders and the structure of intelligence. Their primary finding is that in a sample of Italian children with clinical diagnoses of specific learning disorder, the four index scores of the WISC-IV have lower g-loadings than they do in the general population in Italy. The authors believe that this result implies that alternative measures of intelligence might be more appropriate than the WISC-IV for children with specific learning disorders.

What is the problem with this logic? The problem is that the WISC-IV was one of the tools used to diagnose the children in the first place. Having unusual patterns somewhere in one’s cognitive profile is part of the traditional definition of learning disorders. If the structure of intelligence were the same in this group, we would wonder if the children had been properly diagnosed. This is not a “finding” but an inevitable consequence of the traditional definition of learning disorders. Had the same study been conducted with any other cognitive ability battery, the same results would have been found.

People with Learning Disorders Have Unusual Cognitive Profiles.

A diagnosis of a learning disorder is often given when a child of broadly average intelligence has low academic achievement due to specific cognitive processing deficits. To have specific cognitive processing deficits, there must be one or more specific cognitive abilities that are low compared to the population and also to the child’s other abilities. For example, in the profile below, the WISC-IV Processing Speed Index of 68 is much lower than the other three WISC-IV index scores, which are broadly average. Furthermore, the low processing speed score is a possible explanation of the low Reading Fluency score.

The profile above is unusual. The Processing Speed (PS) score is unexpectedly low compared to the other three index scores. This is just one of many unusual score patterns that clinicians look for when they diagnose specific learning disorders. When we gather together all the unusual WISC-IV profiles in which at least one score is low but others are average or better, it comes as no surprise that the structure of the scores in the sample is unusual. Because the scores are unusually scattered, they are less correlated, which implies lower g-loadings.
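The link between scatter and lower correlations can be seen in a stripped-down simulation. The sketch below is only an illustration: it assumes two scores correlated at 0.60 and keeps the cases in which they differ by more than 15 points. Both numbers are arbitrary choices, not values taken from the WISC-IV.

```r
set.seed(42)
n <- 50000
rho <- 0.6  # assumed population correlation (arbitrary)

# Two standard-score variables (mean 100, SD 15) correlated at rho
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
score_a <- 100 + 15 * z1
score_b <- 100 + 15 * z2

# Keep only the "scattered" profiles: the two scores differ by more than 15
scattered <- abs(score_a - score_b) > 15

r_all <- cor(score_a, score_b)
r_scattered <- cor(score_a[scattered], score_b[scattered])
print(round(c(whole = r_all, scattered = r_scattered), 2))
```

Selecting on scatter alone is enough to pull the correlation well below its population value, even though nothing about the scores themselves has changed.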

A Demonstration That Selecting Unusual Cases Can Alter Structural Coefficients

Suppose that the WISC-IV index scores have the correlations below (taken from the U.S. standardization sample, age 14).

	VC	PR	WM	PS
VC	1.00	0.59	0.59	0.37
PR	0.59	1.00	0.48	0.45
WM	0.59	0.48	1.00	0.39
PS	0.37	0.45	0.39	1.00

Now suppose that we select from the general population an “LD” sample consisting of all score profiles in which:

At least one score is less than 90.
The remaining scores are greater than 90.
The average of the three highest scores is at least 15 points higher than the lowest score.

Obviously, LD diagnosis is more complex than this. The point is that we are selecting from the general population a group of people with unusual profiles and observing that the correlation matrix is different in the selected group. Using the R code at the end of the post, we see that the correlation matrix is:

	VC	PR	WM	PS
VC	1.00	0.15	0.18	−0.30
PR	0.15	1.00	0.10	−0.07
WM	0.18	0.10	1.00	−0.20
PS	−0.30	−0.07	−0.20	1.00

A single-factor confirmatory factor analysis of the two correlation matrices reveals dramatically lower g-loadings in the “LD” sample.

	Whole Sample	“LD” Sample
VC	0.80	0.60
PR	0.73	0.16
WM	0.71	0.32
PS	0.53	−0.51

Because the PS factor has the lowest g-loading in the whole sample, it is most frequently the score that is out of sync with the others and thus is negatively correlated with the other tests in the “LD” sample.
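As a rough cross-check that does not require lavaan, the two correlation matrices printed above can be passed to base R's `factanal`, which fits a one-factor model by maximum likelihood. This is a sketch under different assumptions than the code at the end of the post: it works from the rounded correlations rather than the simulated raw data, and `one_factor_loadings` is a hypothetical helper name. The loadings should come out close to the table, though not necessarily identical.

```r
vars <- c("VC", "PR", "WM", "PS")

# Whole-sample correlations as printed in the post
whole <- matrix(c(
  1.00, 0.59, 0.59, 0.37,
  0.59, 1.00, 0.48, 0.45,
  0.59, 0.48, 1.00, 0.39,
  0.37, 0.45, 0.39, 1.00),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# "LD"-sample correlations as printed in the post
ld <- matrix(c(
   1.00,  0.15,  0.18, -0.30,
   0.15,  1.00,  0.10, -0.07,
   0.18,  0.10,  1.00, -0.20,
  -0.30, -0.07, -0.20,  1.00),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Hypothetical helper: one-factor ML loadings from a correlation matrix
one_factor_loadings <- function(r) {
  fit <- factanal(covmat = r, factors = 1)
  lambda <- unclass(fit$loadings)[, 1]
  # Orient the factor so that the loadings sum to a positive number
  if (sum(lambda) < 0) lambda <- -lambda
  lambda
}

loadings_whole <- one_factor_loadings(whole)
loadings_ld <- one_factor_loadings(ld)
print(round(cbind(Whole = loadings_whole, LD = loadings_ld), 2))
```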

In the paper referenced above, the reduction in g-loadings was not nearly as severe as in this demonstration, most likely because clinicians frequently observe specific processing deficits in tests outside the WISC. Thus many people with learning disorders have perfectly normal-looking WISC profiles; their deficits lie elsewhere. A mixture of ordinary and unusual WISC profiles can easily produce the moderately lowered g-loadings observed in the paper.

Conclusion

In general, one cannot select a sample based on a particular measure and then report as an empirical finding that the sample differs from the population on that same measure. I understand that in this case it was not immediately obvious that the selection procedure would inevitably alter the correlations among the WISC-IV index scores. It is clear that the authors of the paper submitted their research in good faith. However, I wish that the reviewers had noticed the problem and informed the authors that the paper was fundamentally flawed. As it stands, this study offers no valid evidence that casts doubt on the appropriateness of the WISC-IV for children with learning disorders. The same results would have occurred with any cognitive battery, including those recommended by the authors as alternatives to the WISC-IV.

R code used for the demonstration

# Correlation matrix from U.S. Standardization sample, age 14
WISC <- matrix(c(
  1,0.59,0.59,0.37, #VC
  0.59,1,0.48,0.45, #PR
  0.59,0.48,1,0.39, #WM
  0.37,0.45,0.39,1), #PS
  nrow= 4, byrow=TRUE)
colnames(WISC) <- rownames(WISC) <- c("VC", "PR", "WM", "PS")

#Set randomization seed to obtain consistent results
set.seed(1)

# Generate data
x <- as.data.frame(mvtnorm::rmvnorm(100000,sigma=WISC)*15+100)
colnames(x) <- colnames(WISC)

# Lowest score in profile
minSS <- apply(x,1,min)

# Mean of remaining scores
meanSS <- (apply(x,1,sum) - minSS) / 3

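# LD sample
# Note: the filter uses the mean of the other three scores (meanSS > 90),
# a looser condition than requiring each of those scores to exceed 90.
xLD <- x[(meanSS > 90) & (minSS < 90) & (meanSS - minSS > 15),]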

# Correlation matrix of LD sample
rhoLD <- cor(xLD)

# Load package for CFA analyses
library(lavaan)

# Model for CFA
m <- "g =~ VC + PR + WM + PS"

# CFA for whole sample
summary(sem(m,x),standardized=TRUE)

# CFA for LD sample
summary(sem(m,xLD),standardized=TRUE)

No, the WISC-IV doesn’t underestimate the intelligence of children with autism.

W. Joel Schneider — Tue, 09 Dec 2014 21:14:10 +0000

The title of a new study asks “Does WISC-IV underestimate the intelligence of autistic children?” The authors’ answer is that it probably does. I believe that the reasoning behind this conclusion is faulty.

This study gives the unwarranted impression that it is a disservice to children with autism to use the WISC-IV. Let me be clear—I want to be helpful to children with autism. I certainly do not wish to do anything that hurts anyone. A naive reading of this article leads us to believe that there is an easy way to avoid causing harm (i.e., use the Raven’s Progressive Matrices test instead of the WISC-IV). In my opinion, acting on this advice does no favors to children with autism and may even result in harm.

Based on the evidence presented in the study, the average score difference between children with and without autism is smaller on Raven’s Progressive Matrices (RPM) and larger on the WISC-IV. The rhetoric of the introduction leaves the reader with the impression that the RPM is a better test of intelligence than the WISC-IV. Once we accept this, it is easy to discount the results of the WISC-IV and focus primarily on the RPM.

There is a seductive undercurrent to the argument: If you advocate for children with autism, don’t you want to show that they are more intelligent rather than less intelligent? Yes, of course! Doesn’t it seem harmful to give a test that will show that children with autism are less intelligent? It certainly seems so!

Such rhetoric reveals a fundamental misunderstanding of what individual intelligence tests like the WISC-IV are designed to do. In the vast majority of settings, they are not for certifying how intelligent a person is (whatever that means!). Their primary purpose is to help psychologists understand what a person can and cannot do. They are designed to help explain what is easy and what is difficult for a person so that appropriate interventions can be selected.

The WISC-IV provides a Full Scale IQ, which gives an overall summary of cognitive functions. However, it also gives more detailed information about various aspects of ability. Here is a graph I constructed from Figure 1 in the paper. In my graph, I converted percentiles to index scores and rearranged the order of the scores to facilitate interpretation.

Average Raven’s Progressive Matrices (RPM) and WISC-IV scores for children with and without autism

It is clear that the difference between the two groups of children is small for the RPM. It is equally clear that the difference is small for the WISC-IV Perceptual Reasoning Index (PRI). Why is this? The RPM and the PRI are both nonverbal measures of logical reasoning (AKA fluid intelligence). Both the WISC-IV and the RPM tell us that, on average, children with autism perform relatively well in this domain. The RPM is a great test, but it has no more to tell us. In contrast, the WISC-IV not only tells us what children with autism, on average, do relatively well, but also what they typically have difficulty with.

It is no surprise that the largest difference is in the Verbal Comprehension Index (VCI), a measure of verbal knowledge and language comprehension. Communication problems are a major component of the definition of autism. If children with autism had performed equally well on the VCI, we would wonder whether the VCI was really measuring what it was supposed to measure. Note that I am not saying that a low score on VCI is a requirement for the diagnosis of autism or that the VCI is the best measure of the kinds of language problems that are characteristic of autism. Rather, I am saying that children with autism, on average, have difficulties with language comprehension and that this difference is manifest to some degree in the WISC-IV scores.

The WISC-IV scores also suggest that, on average, children with autism not only have lower scores in verbal knowledge and comprehension but are also more likely to have other cognitive deficits, including in verbal working memory (as measured by the WMI) and information processing speed (as measured by the PSI).

Thus, as a clinical instrument, the WISC-IV performs its purpose reasonably well. Compared to the RPM, it gives a more complete picture of the kinds of cognitive strengths and weaknesses that are common in children with autism.

If the researchers wish to demonstrate that the WISC-IV truly underestimates the intelligence of children with autism, they would need to show that it underpredicts important life outcomes among this population. For example, suppose we compare children with and without autism who score similarly low on the WISC-IV. If the WISC-IV underestimated the intelligence of children with autism, they would be expected to do better in school than the low-scoring children without autism. Obviously, a sophisticated analysis of this matter would involve a more complex research design, but in principle this is the kind of result that would be needed to show that the WISC-IV is a poor measure of cognitive abilities for children with autism.
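The logic of such a check can be sketched as a regression of an outcome on the test score plus a group indicator. Everything below is simulated: the sample sizes, slopes, and the built-in `true_gap` are invented for illustration, `estimate_group_gap` is a hypothetical helper, and no real data from children with or without autism are involved. If a test underpredicted outcomes for the focal group, the group coefficient would be reliably positive.

```r
set.seed(2014)

# Hypothetical helper: simulate two groups with the same score distribution
# and estimate how much better the focal group does than its score predicts.
# true_gap is the made-up amount of underprediction built into the simulation.
estimate_group_gap <- function(true_gap, n_per_group = 2000) {
  n <- 2 * n_per_group
  group <- rep(c(0, 1), each = n_per_group)  # 1 = focal group
  score <- rnorm(n, mean = 100, sd = 15)
  outcome <- 50 + 0.5 * (score - 100) + true_gap * group + rnorm(n, sd = 8)
  fit <- lm(outcome ~ score + group)
  unname(coef(fit)["group"])
}

gap_none <- estimate_group_gap(true_gap = 0)  # test predicts equally well
gap_five <- estimate_group_gap(true_gap = 5)  # test underpredicts by 5 points
print(round(c(no_underprediction = gap_none, underprediction = gap_five), 2))
```

A real study would need actual outcome data and a more careful design, as noted above; the sketch only shows what kind of coefficient would count as evidence.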

Intelligence and the Modern World of Work: A Special Issue of Human Resource Management Review

W. Joel Schneider — Mon, 08 Dec 2014 20:09:51 +0000

Charles Scherbaum and Harold Goldstein took an innovative approach to editing a special issue of Human Resource Management Review. They asked prominent I/O psychologists to collaborate with scholars from other disciplines to explore how advances in intelligence research might be incorporated into our understanding of the role of intelligence in the workplace.

It was an honor to be invited to participate, and it was a pleasure to be paired to work with Daniel Newman of the University of Illinois at Urbana-Champaign. Together we wrote an I/O psychology-friendly introduction to current psychometric theories of cognitive abilities, emphasizing Kevin McGrew’s CHC theory. Before that could be done, we had to articulate compelling reasons I/O psychologists should care about assessing multiple cognitive abilities. This was a harder sell than I had anticipated.

Formal cognitive testing is not a part of most hiring decisions, though I imagine that employers typically have at least a vague sense of how bright job applicants are. When the hiring process does include formal cognitive testing, typically only general ability tests are used. Robust relationships between various aspects of job performance and general ability test scores have been established.

In comparison, the idea that multiple abilities should be measured and used in personnel selection decisions has not fared well in the marketplace of ideas. To explain this, there is no need to appeal to some conspiracy of test developers. I’m sure that they would love to develop and sell large, expensive, and complex test batteries to businesses. There is also no need to suppose that I/O psychology is peculiarly infected with a particularly virulent strain of g zealotry and that proponents of multiple ability theories have been unfairly excluded.

To the contrary, specific ability assessment has been given quite a bit of attention in the I/O psychology literature, mostly from researchers sympathetic to the idea of going beyond the assessment of general ability. Dozens (if not hundreds) of high-quality studies were conducted to test whether using specific ability measures added useful information beyond general ability measures. In general, specific ability measures provide only modest amounts of additional information beyond what can be had from general ability scores (ΔR² ≈ 0.02–0.06). In most cases, this incremental validity was not large enough to justify the added time, effort, and expense needed to measure multiple specific abilities. Thus it makes sense that relatively short measures of general ability have been preferred to longer, more complex measures of multiple abilities.
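Incremental validity of this kind is usually expressed as the gain in R² when a specific ability score is added to a model that already contains general ability. A minimal sketch with simulated data follows; every coefficient is an invented assumption chosen only so that the gain lands in a modest range, and none of it comes from the studies mentioned above.

```r
set.seed(2015)
n <- 5000

# All values below are invented for illustration
g <- rnorm(n)                # general ability
specific_unique <- rnorm(n)  # part of a specific ability unrelated to g
specific <- 0.7 * g + sqrt(1 - 0.7^2) * specific_unique
performance <- 0.5 * g + 0.2 * specific_unique + rnorm(n, sd = sqrt(0.71))

# R-squared with general ability alone, then with the specific ability added
r2_general <- summary(lm(performance ~ g))$r.squared
r2_both <- summary(lm(performance ~ g + specific))$r.squared
delta_r2 <- r2_both - r2_general
print(round(c(general = r2_general, both = r2_both, delta = delta_r2), 3))
```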

However, there are several reasons that the omission of specific ability tests in hiring decisions should be reexamined:

Since the time that those high quality studies were conducted, multidimensional theories of intelligence have advanced, and we have a better sense of which specific abilities might be important for specific tasks (e.g., working memory capacity for air traffic controllers). The tests measuring these specific abilities have also improved considerably.
With computerized administration, scoring, and interpretation, the cost of assessment and interpretation of multiple abilities is potentially far lower than it was in the past. Organizations that make use of the admittedly modest incremental validity of specific ability assessments would likely have a small but substantial advantage over organizations that do not. Over the long run, small advantages often accumulate into large advantages.
Measurement of specific abilities opens up degrees of freedom in balancing the need to maintain the predictive validity of cognitive ability assessments and the need to reduce the adverse impact on applicants from disadvantaged minority groups that can occur when using such assessments. Thus, organizations can benefit from using cognitive ability assessments in hiring decisions without sacrificing the benefits of diversity.

The publishers of Human Resource Management Review have made our paper available to download for free until January 25th, 2015.

Broad Abilities in CHC Theory