For fun, I made spiro, an R package for creating animated spirographs. Check it out. I would appreciate any suggestions for improving it.
It’s official! The second edition of Essentials of Assessment Report Writing has been published. My co-authors and I worked hard to make sure every sentence was worth reading. We hope that our work helps professionals write reports that restore hope and inspire change in the lives of people who have found themselves overwhelmed by circumstance.
I am grateful to Alan and Nadeen Kaufman for the invitation to update and expand upon the first edition and to Liz Lichtenberger, Nancy Mather, and Nadeen Kaufman for welcoming me into their writing team. John Willis and Rita McCleary each contributed a chapter brimming with insight. We selected first-rate scholars and practitioners to contribute examples of great report writing along with annotations that let readers listen in on their report-writing process. Thank you Lisa King Chalukian, Robert Lichtenstein, Linda M. Fishman, Donna Goetz, Elaine Fletcher-Janzen, Christopher J. Nicholls, John M. Garruto, Alison Wilkinson-Smith, Jennie Kaufman Singer, and Susan Engi Raiford.
I wish I were one of those people who write with ease, but for me, every sentence is a wrestling match. I would still be pinned to the mat, with fading hopes of escape, if my spouse, Renée Tobin, had not repeatedly made sacrifices in her own full-to-the-brim schedule to give me the gift of uninterrupted time and solitude. I am forever in awe of her.
The Receptive, Expressive & Social Communication Assessment–Elementary (RESCA-E) is a new and innovative measure of oral language abilities. When its publisher, ATP Assessments, asked me to classify the subtests of the RESCA-E according to their likely loadings on CHC Theory abilities, I billed them for the time it took me to do so. However, I was so impressed with the instrument that I thought it merited a short review and some statistical investigations of its structure. The review was born of pure enthusiasm: I did not bill ATP Assessments for the many additional hours I spent performing statistical analyses, creating plots, and writing up the results. The review contains several features that cannot be displayed on this blogging platform (e.g., interactive 3D plots), but it can be seen in its entirety here.
When a person scores exactly 2 standard deviations below the mean on several tests, it is intuitive that the composite score that summarizes these scores should also be exactly 2 standard deviations below the mean. Our intuitions let us down in this case: the composite score is actually more than 2 standard deviations below the mean. I attempt to make this “composite score extremity effect” a little more intuitive in an Assessment Service Bulletin for the Woodcock-Johnson IV.
Schneider, W. J. (2016). Why Are WJ IV Cluster Scores More Extreme Than the Average of Their Parts? A Gentle Explanation of the Composite Score Extremity Effect (Woodcock-Johnson IV Assessment Service Bulletin No. 7). Itasca, IL: Houghton Mifflin Harcourt.
I thank Mark Ledbetter for the invitation to write the paper and support in the writing process, Erica LaForte for patiently editing a complex first draft down to a much more readable version, and Kevin McGrew for additional thoughtful comments and suggestions for improvement on the first draft.
The bulk of the paper is not mathematical. However, the first draft had a few bells and whistles, like the animated graph below, which shows how the composite score extremity effect grows larger as the average correlation among the tests decreases and as the number of tests in the composite increases.
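The relationship the animation illustrates can be sketched in a few lines of R. If each of k tests is z standard deviations from the mean and the tests have an average intercorrelation of r, the composite z-score is z·k divided by the standard deviation of the sum of the z-scores:

```r
# Composite z-score when all k tests are z standard deviations from the mean
# and the average intercorrelation among the tests is r
composite_z <- function(z, k, r) {
  z * k / sqrt(k + k * (k - 1) * r)  # sum of z-scores / SD of the sum
}

# With a single test there is no extremity effect
composite_z(z = -2, k = 1, r = 0.6)  # -2

# More tests and lower correlations make the composite more extreme
composite_z(z = -2, k = 4, r = 0.6)  # about -2.39
composite_z(z = -2, k = 4, r = 0.3)  # about -2.90
```

The function and its inputs are mine, for illustration; the formula is just the standard variance of a sum of equally correlated standardized variables.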
Another plot that was originally animated shows our best guess of a latent variable X if we have two indicators, X1 and X2, that are both exactly 2 standard deviations below the mean. X1 and X2 correlate with each other at 0.64 and with X at 0.8. If we only know that X1 = −2, our best guess is that X is −1.60. If we know that both X1 and X2 are −2, our best guess is that X is −1.95. Thus, our estimate is lower with two scores (−1.95) than with one score (−1.60).
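These two estimates can be reproduced with ordinary regression weights, using the correlations given above (0.8 with X, 0.64 between the indicators):

```r
# Best linear estimate of latent X from standardized indicators
R <- matrix(c(1, 0.64,
              0.64, 1), nrow = 2)  # correlations among X1 and X2
r <- c(0.8, 0.8)                   # correlations of X1 and X2 with X

# One indicator: E(X | X1 = -2) = 0.8 * -2
0.8 * -2                           # -1.60

# Two indicators: regression weights b = solve(R) %*% r
b <- solve(R, r)
sum(b * c(-2, -2))                 # about -1.95
```

Nothing here is new statistics; it is the same multiple-regression arithmetic that underlies factor-score estimation.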
I have long complained that making custom composite scores should not be difficult. The ability to combine scores as one wishes should be a feature of every scoring program for every cognitive battery.
No matter which tests I have given, I would like to be able to combine them into theoretically valid composite scores. For example, on the WISC-V, the Verbal Comprehension Index (VCI) consists of two subtest scores, Vocabulary and Similarities. However, the Information and Comprehension subtests measure verbal knowledge just as well as the other two tests. We should be able to combine them with the two VCI subtests to make a more valid estimate of verbal knowledge.
The good news is that the WISC-V now allows us to do just that: it now offers two expanded composite scores.
At the risk of sounding greedy, I would like to have an expanded working memory index (Digit Span, Picture Span, and Letter-Number Sequencing) and an expanded processing speed index (Coding, Symbol Search, and Cancellation). Even so, I am grateful for this improvement in the WISC-V.
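As a sketch of what such a custom-composite feature would need to compute: given a set of subtest scaled scores and their average intercorrelation, a unit-weighted composite index score follows from the same sum-of-correlated-variables logic as the extremity effect. The scores and the correlation of 0.7 below are made up for illustration:

```r
# Convert scaled scores (mean 10, SD 3) to a composite index score
# (mean 100, SD 15), given the average intercorrelation r among them.
composite_index <- function(scaled_scores, r) {
  z <- (scaled_scores - 10) / 3  # standardize each subtest
  k <- length(z)
  z_composite <- sum(z) / sqrt(k + k * (k - 1) * r)
  round(100 + 15 * z_composite)
}

# Hypothetical scores on Similarities, Vocabulary, Information, Comprehension
composite_index(c(12, 13, 11, 12), r = 0.7)  # 111
```

A real scoring program would of course use the actual subtest intercorrelations from the norming sample rather than a single assumed value.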
I do not relish criticizing published studies. However, if a paper uses flawed reasoning to arrive at counterproductive recommendations for our field, I believe that it is proper to respectfully point out why the paper’s conclusions should be ignored. This study warrants such a response:
The authors of this study ask whether children with learning disorders have the same structure of intelligence as children in the general population. This might seem like an important question, but it is not—if the difference in structure is embedded in the very definition of learning disorders.
Imagine that a highly respected medical journal published a study titled Tall People Are Significantly Greater in Height than People in the General Population. Puzzled and intrigued, you decide to investigate. You find that the authors solicited medical records from physicians who labelled their patients as tall. The primary finding is that such patients have, on average, greater height than people in the general population. The authors speculate that the instruments used to measure height may be less accurate for tall people and suggest alternative measures of height for them.
This imaginary study is clearly ridiculous. No researcher would publish such a “finding” because it is not a finding. People who are tall have greater height than average by definition. There is no reason to suppose that the instruments used were inaccurate.
It is not so easy to recognize that Giofrè and Cornoldi applied the same flawed logic to children with learning disorders and the structure of intelligence. Their primary finding is that in a sample of Italian children with clinical diagnoses of specific learning disorder, the four index scores of the WISC-IV have lower g-loadings than they do in the general population in Italy. The authors believe that this result implies that alternative measures of intelligence might be more appropriate than the WISC-IV for children with specific learning disorders.
What is the problem with this logic? The problem is that the WISC-IV was one of the tools used to diagnose the children in the first place. Having unusual patterns somewhere in one’s cognitive profile is part of the traditional definition of learning disorders. If the structure of intelligence were the same in this group, we would wonder if the children had been properly diagnosed. This is not a “finding” but an inevitable consequence of the traditional definition of learning disorders. Had the same study been conducted with any other cognitive ability battery, the same results would have been found.
A diagnosis of a learning disorder is often given when a child of broadly average intelligence has low academic achievement due to specific cognitive processing deficits. To have specific cognitive processing deficits, there must be one or more specific cognitive abilities that are low compared both to the population and to the child’s other abilities. For example, in the profile below, the WISC-IV Processing Speed Index of 68 is much lower than the other three WISC-IV index scores, which are broadly average. Furthermore, the low processing speed score is a possible explanation of the low Reading Fluency score.
The profile above is unusual. The Processing Speed (PS) score is unexpectedly low compared to the other three index scores. This is just one of many unusual score patterns that clinicians look for when they diagnose specific learning disorders. When we gather together all the unusual WISC-IV profiles in which at least one score is low but others are average or better, it comes as no surprise that the structure of the scores in the sample is unusual. Because the scores are unusually scattered, they are less correlated, which implies lower g-loadings.
Suppose that the WISC-IV index scores have the correlations below (taken from the U.S. standardization sample, age 14).
Now suppose that we select an “LD” sample from the general population, keeping every profile in which the lowest index score is below 90, the mean of the remaining three scores is above 90, and the gap between that mean and the lowest score is greater than 15 points.
Obviously, LD diagnosis is more complex than this. The point is that we are selecting from the general population a group of people with unusual profiles and observing that the correlation matrix is different in the selected group. Using the R code at the end of the post, we see that the correlation matrix is:
A single-factor confirmatory factor analysis of the two correlation matrices reveals dramatically lower g-loadings in the “LD” sample.
[Table: standardized g-loadings of the four index scores in the whole sample and in the “LD” sample]
Because the PS factor has the lowest g-loading in the whole sample, it is most frequently the score that is out of sync with the others and thus is negatively correlated with the other tests in the “LD” sample.
In the paper referenced above, the reduction in g-loadings was not nearly as severe as in this demonstration, most likely because clinicians frequently observe specific processing deficits in tests outside the WISC. Thus many people with learning disorders have perfectly normal-looking WISC profiles; their deficits lie elsewhere. A mixture of ordinary and unusual WISC profiles can easily produce the moderately lowered g-loadings observed in the paper.
In general, one cannot select a sample based on a particular measure and then report as an empirical finding that the sample differs from the population on that same measure. I understand that in this case it was not immediately obvious that the selection procedure would inevitably alter the correlations among the WISC-IV factors. It is clear that the authors of the paper submitted their research in good faith. However, I wish that the reviewers had noticed the problem and informed the authors that the paper was fundamentally flawed. Therefore, this study offers no valid evidence that casts doubt on the appropriateness of the WISC-IV for children with learning disorders. The same results would have occurred with any cognitive battery, including those recommended by the authors as alternatives to the WISC-IV.
# Correlation matrix from U.S. standardization sample, age 14
WISC <- matrix(c(
  1, 0.59, 0.59, 0.37,    # VC
  0.59, 1, 0.48, 0.45,    # PR
  0.59, 0.48, 1, 0.39,    # WM
  0.37, 0.45, 0.39, 1),   # PS
  nrow = 4, byrow = TRUE)
colnames(WISC) <- rownames(WISC) <- c("VC", "PR", "WM", "PS")

# Set randomization seed to obtain consistent results
set.seed(1)

# Generate data
x <- as.data.frame(mvtnorm::rmvnorm(100000, sigma = WISC) * 15 + 100)
colnames(x) <- colnames(WISC)

# Lowest score in profile
minSS <- apply(x, 1, min)

# Mean of remaining scores
meanSS <- (apply(x, 1, sum) - minSS) / 3

# LD sample
xLD <- x[(meanSS > 90) & (minSS < 90) & (meanSS - minSS > 15), ]

# Correlation matrix of LD sample
rhoLD <- cor(xLD)

# Load package for CFA analyses
library(lavaan)

# Model for CFA
m <- "g =~ VC + PR + WM + PS"

# CFA for whole sample
summary(sem(m, x), standardized = TRUE)

# CFA for LD sample
summary(sem(m, xLD), standardized = TRUE)
The title of a new study asks “Does WISC-IV underestimate the intelligence of autistic children?” The authors’ answer is that it probably does. I believe that the reasoning behind this conclusion is faulty.
This study gives the unwarranted impression that it is a disservice to children with autism to use the WISC-IV. Let me be clear—I want to be helpful to children with autism. I certainly do not wish to do anything that hurts anyone. A naive reading of this article leads us to believe that there is an easy way to avoid causing harm (i.e., use the Raven’s Progressive Matrices test instead of the WISC-IV). In my opinion, acting on this advice does no favors to children with autism and may even result in harm.
Based on the evidence presented in the study, the average score difference between children with and without autism is smaller on Raven’s Progressive Matrices (RPM) than on the WISC-IV. The rhetoric of the introduction leaves the reader with the impression that the RPM is a better test of intelligence than the WISC-IV. Once we accept this, it is easy to discount the results of the WISC-IV and focus primarily on the RPM.
There is a seductive undercurrent to the argument: If you advocate for children with autism, don’t you want to show that they are more intelligent rather than less intelligent? Yes, of course! Doesn’t it seem harmful to give a test that will show that children with autism are less intelligent? It certainly seems so!
Such rhetoric reveals a fundamental misunderstanding of what individual intelligence tests like the WISC-IV are designed to do. In the vast majority of settings, they are not for certifying how intelligent a person is (whatever that means!). Their primary purpose is to help psychologists understand what a person can and cannot do. They are designed to help explain what is easy and what is difficult for a person so that appropriate interventions can be selected.
The WISC-IV provides a Full Scale IQ, which gives an overall summary of cognitive functions. However, it also gives more detailed information about various aspects of ability. Here is a graph I constructed from Figure 1 in the paper. In my graph, I converted percentiles to index scores and rearranged the order of the scores to facilitate interpretation.
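The percentile-to-index conversion I used is simply the inverse-normal transform, assuming index scores are normally distributed with a mean of 100 and a standard deviation of 15:

```r
# Convert a percentile rank to an index score (mean 100, SD 15),
# assuming a normal distribution of scores
percentile_to_index <- function(p) {
  round(100 + 15 * qnorm(p / 100))
}

percentile_to_index(50)  # 100
percentile_to_index(2)   # 69
```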
It is clear that the difference between the two groups of children is small for the RPM. It is also clear that the difference is small for the WISC-IV Perceptual Reasoning Index (PRI). Why is this? The RPM and the PRI are both nonverbal measures of logical reasoning (AKA fluid intelligence). Both the WISC-IV and the RPM tell us that, on average, children with autism perform relatively well in this domain. The RPM is a great test, but it has no more to tell us. In contrast, the WISC-IV not only tells us what children with autism, on average, do relatively well, but also what they typically find difficult.
It is no surprise that the largest difference is in the Verbal Comprehension Index (VCI), a measure of verbal knowledge and language comprehension. Communication problems are a major component of the definition of autism. If children with autism had performed equally well on the VCI, we would wonder whether the VCI was really measuring what it was supposed to measure. Note that I am not saying that a low score on VCI is a requirement for the diagnosis of autism or that the VCI is the best measure of the kinds of language problems that are characteristic of autism. Rather, I am saying that children with autism, on average, have difficulties with language comprehension and that this difference is manifest to some degree in the WISC-IV scores.
The WISC-IV scores also suggest that, on average, children with autism not only have lower scores in verbal knowledge and comprehension but are also more likely to have other cognitive deficits, including deficits in verbal working memory (as measured by the WMI) and information processing speed (as measured by the PSI).
Thus, as a clinical instrument, the WISC-IV performs its purpose reasonably well. Compared to the RPM, it gives a more complete picture of the kinds of cognitive strengths and weaknesses that are common in children with autism.
If the researchers wish to demonstrate that the WISC-IV truly underestimates the intelligence of children with autism, they would need to show that it underpredicts important life outcomes among this population. For example, suppose we compare children with and without autism who score similarly low on the WISC-IV. If the WISC-IV underestimated the intelligence of children with autism, they would be expected to do better in school than the low-scoring children without autism. Obviously, a sophisticated analysis of this matter would involve a more complex research design, but in principle this is the kind of result that would be needed to show that the WISC-IV is a poor measure of cognitive abilities for children with autism.
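In its simplest form, such an underprediction analysis asks whether diagnostic group adds predictive information beyond the WISC-IV score. The data below are simulated purely for illustration, under a scenario in which the WISC-IV predicts achievement equally well in both groups:

```r
set.seed(1)
n <- 200
group <- rep(c(0, 1), each = n)  # 0 = no autism, 1 = autism (simulated)
iq <- rnorm(2 * n, mean = 85, sd = 10)

# Achievement depends only on IQ here, so there is no underprediction
achievement <- 100 + 0.5 * (iq - 100) + rnorm(2 * n, sd = 8)

# If the WISC-IV underestimated the intelligence of children with autism,
# the group coefficient would be reliably positive: autistic children would
# outperform non-autistic children with the same WISC-IV score.
fit <- lm(achievement ~ iq + group)
summary(fit)
```

In these simulated data the group coefficient hovers near zero, which is what the absence of underprediction looks like; a real analysis would, as noted above, require a far more sophisticated design.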