
# Why composite scores are more extreme than the average of their parts

Suppose that two tests have a correlation of 0.6. On both tests an individual obtained an index score of 130, which is 2 standard deviations above the mean. If both tests are combined, what is the composite score?

Our intuition is that if both tests are 130, the composite score is also 130. Unfortunately, taking the average is incorrect. In this example, the composite score is actually 134. How is it possible that the composite is higher than both of the scores?

If I measure the length of a board twice or if I take the temperature of a sick child twice, the average of the results is probably the best estimate of the quantity I am measuring. Why can’t I do this with standard scores?

Standard scores do not behave like many of our most familiar units of measurement. Degrees Celsius have meaning in reference to a standard, the temperature at which water freezes at sea level. In contrast, standard scores do not have meaning compared to some absolute standard. Instead, the meaning of a standard score derives from its position in the population distribution. One way to describe the position of a score is its distance from the population mean. The size of this distance is then compared to the standard deviation, which is how far scores typically are from the population mean (more precisely, the standard deviation is the square root of the average squared distance from the mean). Thus, the “standard” to which standard scores are compared consists of the mean and the standard deviation.

An index score of 130 is 2 standard deviations above the mean of 100.

The average of two imperfectly correlated index scores is not an index score. Its standard deviation is smaller than 15 and thus our sense of what index scores mean does not apply to the average of two index scores. To make sense of the composite score, we must convert it into an index score that has a standard deviation of 15.

$\dfrac{(130+130-2\times 100)}{\sqrt{2+2\times 0.6}}+100\approx 134$

How is this possible? It is unusual for someone to score 130. It is even more unusual for someone to score 130 on two tests that are imperfectly correlated. The less correlated the tests, the more unusual it is to score high on both tests.

Below is a geometric representation of this phenomenon. Correlated tests can be graphed with oblique axes (as is done in factor analyses with oblique rotations). The correlation between the tests is the cosine of the angle between the axes. As seen below, the lower the correlation, the more extreme the composite. As the correlation approaches 1, the composite approaches the average of the scores.

The lower the correlation, the more extreme the composite score.

If the scores are lower than the population mean, the composite score is lower than the average of the parts. For example, if the two scores are 71, and the correlation between the scores is 0.9, the composite score is 70.

When the subtest scores are below the mean, the composite score is lower than the average of the subtest scores.
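The calculation behind both examples can be sketched in a few lines of code. This is a minimal sketch of the two-test case; the function name and the mean-100, SD-15 defaults are mine, not part of the original post:

```python
import math

def composite_index(score1, score2, r, mean=100.0, sd=15.0):
    """Combine two correlated index scores into a composite index score.

    The sum of the two z-scores is rescaled by the standard deviation
    of that sum, sqrt(2 + 2r), so the result is again on the familiar
    mean-100, SD-15 metric.
    """
    z_sum = ((score1 - mean) + (score2 - mean)) / sd   # z1 + z2
    z_composite = z_sum / math.sqrt(2 + 2 * r)         # rescale to SD 1
    return mean + sd * z_composite

# Two scores of 130 with r = 0.6 yield a composite of about 134:
print(round(composite_index(130, 130, 0.6)))  # 134
# Below the mean, the composite is more extreme in the other direction:
print(round(composite_index(71, 71, 0.9)))    # 70
```

Note that as `r` approaches 1, the divisor approaches 2 and the composite approaches the simple average of the two scores, matching the geometric picture above.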

In a previous post, I presented this material in greater detail.


# After money, comfort, and love, Raymond Cattell had to make one more sacrifice…

In Cattell’s (1974) autobiography, we find not only warm gratitude for his mentor (Charles Spearman) but also a taste of the kinds of personal sacrifices many have chosen to make while contributing to our field. After an idyllic childhood and a romantic courtship of his first wife, Cattell was unable to find a permanent academic position in England. After years of near poverty (and neglect from an especially driven husband), Cattell’s wife, with whom he was still very much in love, left him for “more comfortable circumstances.” After the divorce and further failure in securing an academic position in Britain, Cattell (1974) considered leaving for America,

But England was deep in my bones…The personal crisis, well nigh of despair,…tested the truth of Scawen Blunt’s lines: “He who has once been happy is for aye, out of destruction’s reach.” The broken marriage and the bleak future could be met. But could I disloyally uproot myself from that which had created the fiber of my being? The die was cast one day when I received a persuasive letter from E. L. Thorndike, asking me to be a research associate with him for a year. Of course, I knew of Thorndike’s work and it seemed to me about the most imaginative and fundamental that I knew of in America…I was stirred by the privilege and the possibilities, and after three days of emotional struggle decided to go. After all, it was only for a year. It was characteristic of Thorndike’s perspective, and independence, that he had reached out to a stranger three thousand miles away, possessing no personal “pull.” He had reacted purely to what he had found in my publications. I have tried to do the same in my turn for oncoming psychologists, judging by performance, not the “old school associations.”

(p. 69)

After several temporary positions, Cattell took a position at the University of Illinois at Urbana-Champaign. He was extremely grateful to the taxpayers of Illinois that he was able to spend the next three decades pursuing any question that he deemed important to answer. He said that, for him, life began at 40. He spent his time productively, producing dozens of books and hundreds of articles:

For many years I rarely left the laboratory before 11 P.M., and then was generally so deep in thought or discussion that I could find my car only because it was the last still in the parking lot!

(p. 75)

Cattell, R. B. (1974). Raymond B. Cattell. In G. Lindzey, (Ed.) A history of psychology in autobiography (Vol. 6) (pp. 59–100). Englewood Cliffs, NJ: Prentice-Hall.


# Advice for Psychological Evaluation Reports: Write about people, not tests

At its best, the end product of a psychological assessment is that a child’s life is made better because something useful and true is communicated to people who can use that information to make better decisions. How is this information best communicated? I believe that it is by the skillful retelling of the story of the child’s struggle to cope with the difficulties that led to the testing referral.

Not only are humans storytelling creatures, we are also storylistening creatures. We are moved by drama, cleansed by tragedy, unified by cultural myths, and inspired by tales of heroic struggle. Most importantly, through stories we remember enormous amounts of information. Tabulated test results are inert until the evaluator weaves them together into a coherent narrative explanation that helps children and their caregivers construct a richer, more nuanced, and more organized understanding of the problem. Compare the following assessment results.

# Explanation 1

On a test in which Judy had to repeat words and segment them into individual phonemes, Judy earned a standard score of 78, which is in the Borderline Range. Only 7 percent of children performed at Judy’s level or lower on this test. This test is a good predictor of the ability to read single words isolated from contextual cues. On a test that measures this ability, Judy scored an 83, which is in the 13th percentile or in the Low Average Range. Reading single words is necessary to understand sentences and paragraphs. On a test that requires the evaluee to read a paragraph and then answer questions that test the evaluee’s understanding of the text, Judy scored an 84, which is in the Low Average Range. This is in the 14th percentile. An 84 in Reading Comprehension is 24 points lower than her Full Scale IQ of 110 (75th percentile, High Average Range). This is significant at the .01 level and only 3% of children in Judy’s age range have a 24-point discrepancy or larger between Reading Comprehension and Full Scale IQ. Thus, Judy meets criteria for Reading Disorder. More specifically, Judy appears to have phonological dyslexia. Phonological dyslexia refers to difficulties in reading single words because of the inability to hear individual phonemes distinctly. This difficulty in decoding single words makes reading narrative text difficult because the reading process is slow and error prone. Intensive remediation in phonics skills followed by reading fluency training is recommended.

# Explanation 2

For most 12-year-olds as bright as Judy is, reading is a skill that is so well developed and automatic that it becomes a pleasure. For Judy, however, reading is a chore. It takes sustained mental effort for her to read each word one by one. It then requires further concentration for her to go back and figure out what these individual words mean when they are strung together in complete sentences, paragraphs, and stories. It is a slow, laborious process that is often unpleasant for Judy.

Why did Judy, a bright and delightfully creative girl, fail to learn to read fluently? It is impossible to know with certainty. However, the problem that most likely first caused Judy to fall behind her peers is that she does not hear speech sounds as clearly as most people do. It is as if she needs glasses for her ears: The sounds are blurry. For example, although she can hear the whole word cat perfectly well, she might not recognize as easily as most children do that the word consists of three distinct sounds: |k|, |a|, and |t|. For this reason, she has to work harder to remember that these three sounds correspond to three separate letters: |k|=C, |a|=A, and |t|=T. With simple words like cat, Judy’s natural ability is more than sufficient to help her remember what the letters mean. However, learning to recognize and remember larger words, uncommonly used words, or words with irregular spellings is much more difficult for Judy than it is for most children.

Many children with the same difficulty in hearing speech sounds distinctly eventually learn to work around the problem and come to read reasonably well. However, Judy is a perceptive and sensitive girl. These traits are typically helpful but, unfortunately, they allowed her to be acutely aware, from very early on, that she did not read as well as her classmates. She clearly remembers that her friends and classmates giggled when she made reading errors that were, to them, inexplicable. For example, for a while she earned the nickname “Tornado Girl” when she was reading aloud in class and misread “volcano” as “tornado.” She came to dread reading aloud in class and felt growing levels of shame even when she read silently to herself. She began to avoid reading at all costs. She did not read for pleasure, even when the texts were easy enough for her to read because she felt, in her words, “dumb, dumb, and dumb.” Over the next several years, she fell further behind her peers. By avoiding reading, she never developed the smooth, automatic reading skills that are necessary to make reading a pleasurable and self-sustaining activity.

Although Judy’s ability to hear speech sounds distinctly is still low compared to her 12-year-old peers, this weakness is not what is holding her back now. Indeed, her current ability to hear speech sounds distinctly is actually better than that of most 6- and 7-year-olds, most of whom learn to read without difficulty. With extra help, Judy can learn to decode words phonetically. However, in order for her to develop her reading fluency and reading comprehension skills to the level of which she is capable, she will need to engage in sustained practice reading texts that are both interesting to Judy and at the correct level of difficulty. She is likely to be willing to read only if she is helped to manage the sense of shame she feels when she attempts to read a book. This may require the collaboration of a reading specialist and a behavior specialist with expertise in the cognitive-behavioral treatment of anxiety-related problems.

# Comparing Explanations

I am reasonably confident that most readers would find the second explanation to be much more useful than the first. The second explanation is not better than the first simply because it is more detailed. Explanation 1 could have been supplemented with more details if I had taken the time to fill it with even more information about test results. The second explanation is not better simply because it avoids statistical jargon that is difficult for parents and teachers to understand. Even if the jargon were removed from the first explanation and inserted into the second, the second explanation would still be better.

The second explanation is better because it is more about Judy than about her performance on tests. The narrative explanation of how her reading problem developed and how it was maintained is better because it leads to better treatment recommendations. More importantly, it leads to recommendations that will be understood and remembered by Judy’s parents and teachers. One of the problems with the first explanation is, ironically, that it is not difficult to understand if it is properly explained. Most parents and teachers will nod their heads as they hear it. However, they are likely to forget the explanation as soon as they leave the room. Most of us are not accustomed to thinking about people in terms of sets of continuous variables. Without a narrative structure to hold them together, assessment details slip through the cracks of our memories quickly. It is unfortunate that a forgotten explanation, no matter how accurate, no matter how brilliant, is as helpful as no explanation at all.

This post is an excerpt from:

Schneider, W. J. (2013). Principles of assessment of aptitude and achievement. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), Oxford handbook of psychological assessment of children and adolescents (pp. 286–330). New York: Oxford.


# Advice for psychological evaluation reports: Make every sentence worth reading

I have made this [letter] longer, because I have not had the time to make it shorter.

– Blaise Pascal, “Lettres provinciales”, letter 16, 1657

The secret of being a bore is to tell everything.

– Voltaire, “Sixième discours: sur la nature de l’homme,” Sept Discours en Vers sur l’Homme (1738)

A little inaccuracy sometimes saves tons of explanation.

– Saki, The Square Egg, 1924

When we get together, we psychologists often lament that we spend a lot of time writing psychological evaluation reports that no one reads, at least not in full. I have come to believe that this is mostly our fault. Much of what we write in our reports is boring (e.g., describing each test), canned (e.g., describing each test), confusing (e.g., describing each test), and irrelevant (e.g., describing each test). It would be an understatement to say that I am not the first to voice such opinions.

If we want people to read our reports carefully, we must write reports that are worth reading all the way through. If you insist on including boring, canned, confusing, and irrelevant content, consider tucking it away in an appendix.

# Explain what you know, not how you know

As students we are rewarded for “showing our work.” We are encouraged to state a position and then provide data and arguments that justify our claims. The resulting literary form (the student position paper) aligns well with the objectives of the course but it rarely aligns with the purpose of psychological evaluation reports. Reports should focus on communicating to the reader something that is useful and true about an individual. Presenting observations and data and then walking the reader through the steps in our diagnostic reasoning is rarely helpful to non-specialists. Most readers need the results of our assessment (our interpretations and suggestions), not an account of our process.

# My old reports are embarrassing

My earliest reports contained mini-tutorials on operant conditioning, attachment theory, psychometrics, and specific details about the tests I administered (e.g., the structure and format of WISC subtests). I naively thought that this information would be interesting and helpful to people. In retrospect, I think that writing these explanations may have helped me more than the reader. Bits and pieces of my newly acquired expertise were not fully integrated in my mind and writing everything out probably consolidated my understanding. Whatever the benefit for me, I cannot remember a time in which the inclusion of such details proved crucial to selecting the right interventions and I can remember times in which they were confusing or alienating to parents.

# Bad habits I let go

Over the years, I began a long, slow process of letting go of the report templates I was given in graduate school and unlearning bad habits of my own invention.

• I stopped talking about the names, content, and structure of tests and measures and focused on the constructs they measured. I stopped organizing my reports by test batteries and instead used a theoretical organization. If I learn something important about the evaluee’s personality during the academic achievement testing, I weave that information into the personality section (and I rarely explain how such information was obtained).
• I stopped talking about numbers (e.g., standard scores and percentiles). Instead I describe what a person can or cannot do and why it matters. I still make extensive use of numbers in the initial stages of case conceptualization but at some point they fade into the background of the overall narrative.
• I stopped talking about the details of my observations and simply stated the overall conclusions from my observations (combined with other data).
• I stopped including information that was true but uninformative (e.g., the teen is left-handed but plays guitar right-handed). My “Background Information” section became the “Relevant Background Information” section. I often re-read reports after I am finished and try to remove details that clutter the overall message of the report. Often this means bucking tradition. For example, I was trained to ask about a great many details, including allergies. If a child’s allergies are so severe that they interfere with the ability to concentrate in school, they are worth reporting. However, in most cases a person’s mild allergies are not worth reporting.
• I stopped merely reporting information (e.g., the scores may be underestimates of ability because sometimes the evaluee appeared to give up when frustrated by the hard items on a few tests) and instead focused on contextualizing and interpreting the information so that the implications are clear (e.g., outside of the testing in which situations and on which tasks is the evaluee likely to underperform and by how much?).
• I stopped explaining why certain scores might be misleading. For example, if the WAIS-IV Arithmetic was high but other measures of working memory capacity were low (after repeated follow-up testing), I no longer explain that follow-up testing was needed, nor that at some point in the assessment process I was unsure about the person’s working memory abilities. I just explain what working memory is and why the person’s weakness matters. I do not feel the need, in most cases, to explain that the WAIS-IV Working Memory Index is inflated because of a high score on Arithmetic.
• I stopped explaining why scores that measure the same thing are inconsistent. Non-professionals won’t understand the explanation and professionals don’t need it. If the inconsistency reveals something important (e.g., fluctuating attention), I just state what that something is and why it matters.
• I stopped treating questionnaire data as more important and precise than interview data. I came to treat all questionnaires, no matter how long, as screeners. In most cases, I do not treat questionnaire data as a “test” that provides information that is independent of what the person said in the interview. Interview data and questionnaire data come from the same source. If the questionnaire data and the interview data are inconsistent, I interview the person until the inconsistency is resolved.
• I stopped sourcing my data every time I made a statement. For example, I stopped writing, “On the MMPI-2 and in the interview, X reported high levels of depression. In an interview, X’s husband also reported that X had high levels of depression.” It does not usually matter where or how I obtained the information about the depression. What matters is whether the information is accurate and useful. In the narrative, I only report my final opinion of what is going on based on the totality of evidence, not the bits and pieces of information I collected along the way.
• I stopped sourcing interview data when I was quite sure that it was correct. For example, I no longer write: “Susie’s mother reported that Susie’s reading difficulties were first noticed when she was in the first grade.” If I have every reason to believe that this is true, I simply say, “Susie’s reading difficulties were first noticed when she was in the first grade.” However, if I am uncertain that something Susie’s mother said is true or if I am reporting Susie’s mother’s opinion, I attribute the statement to her.

# Cronbach: Factor analysis is more like photography than chemistry.

Lee Cronbach would later achieve immortality for his methodological contributions (e.g., coefficient α, construct validity, aptitude by treatment interactions, and generalizability theory). His first big splash, though, was his 1949 textbook, *Essentials of Psychological Testing*. Last week I was reading the 1960 edition of his textbook and found this skillfully worded comparison:

“Factor analysis is in no sense comparable to the chemist’s search for elements. There is only one answer to the question: What elements make up table salt? In factor analysis there are many answers, all equally true but not equally satisfactory (Guttman, 1955). The factor analyst may be compared to the photographer trying to picture a building as revealingly as possible. Wherever he sets his camera, he will lose some information, but by a skillful choice he will be able to show a large number of important features of the building.” p. 259


# Allowing yourself to be wrong allows you to be right…eventually

The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge.

– Stephen Hawking

It is wise to remember that you are one of those who can be fooled some of the time.

– Laurence J. Peter

We human beings are so good at pattern recognition that sometimes we find patterns that are not even there. I have never seen a cognitive profile, no matter how unusual and outlandish, that did not inspire a vivid interpretation that explained EVERYTHING about a child. In fact, the more outlandish, the better. On a few occasions, some of the anomalous scores that inspired the vivid interpretations turned out to be anomalous due to scoring errors. In these humbling experiences, I have learned something important. I noticed that in those cases, my interpretations seemed just as plausible to me as any other. If anything, I was more engaged with them because they were so interesting. Of course, there is nothing wrong with making sense of data and there is nothing wrong with doing so with a little creativity. Let your imagination soar! The danger is in taking yourself too seriously.

The scientific method is a system that saves us from our tendencies not to ask the hard questions after we have convinced ourselves of something. Put succinctly, the scientific method consists of not trusting any explanation until it survives your best efforts to kill it. There is much to be gained in reserving some time to imagine all the ways in which your interpretation might be wrong. The price of freedom is responsibility. The price of divergent thinking is prudence. It is better to be right in the end than to be right right now.

This post is an excerpt from:

Schneider, W. J. (2013). Principles of assessment of aptitude and achievement. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), Oxford handbook of psychological assessment of children and adolescents (pp. 286–330). New York: Oxford.


# Advice for psychological evaluation reports: Render abstruse jargon in the vernacular

PRIMUS DOCTOR: Most learned bachelor whom I esteem and honor, I would like to ask you the cause and reason why opium makes one sleep.

BACHELIERUS: ….The reason is that in opium resides a dormitive virtue, of which it is the nature to stupefy the senses.

—from Molière’s Le Malade Imaginaire (1673)

A man thinks that by mouthing hard words he understands hard things.

—Herman Melville

The veil of ignorance can be woven of many threads, but the one spun with the jangly jargon of a privileged profession produces a diaphanous fabric of alluring luster and bewitching beauty. Such jargon not only impresses outsiders but comforts them with what Brian Eno called the last illusion: the belief that someone out there knows what is going on. Too often, it is a two-way illusion. Like Molière’s medical student, we psychologists fail to grasp that our (invariably Latinate) technical terms typically do not actually explain anything. There is nothing wrong with technical terms, per se; indeed, it would be hard for professionals to function without them. However, with them, it is easy to fall into logical traps and never notice. For example, saying that a child does not read well because she has dyslexia is not an explanation. It is almost a tautology, unless the time is taken to specify which precursors to reading are absent and thus make dyslexia an informative label.

An additional and not insubstantial benefit of using ordinary language is that you are more likely to be understood. This is not to say that your communication should be dumbed down to the point that the point is lost. Rather, as allegedly advised by Albert Einstein, “Make everything as simple as possible, but not simpler.”

This post is an excerpt from:

Schneider, W. J. (2013). Principles of assessment of aptitude and achievement. In D. Saklofske, C. Reynolds, & V. Schwean (Eds.), Oxford handbook of psychological assessment of children and adolescents (pp. 286–330). New York: Oxford.
