Sunday, June 7, 2009

A Caveat on Tests and Scores

For many years I've included a caveat on tests and scores in my psychological evaluation reports, placed just before the test results. Here's the current version, taken from a report in progress.

Before the Test Results: A Caveat About Tests and Scores

Psychological tests are useful tools as long as we remember their limitations. They are sampling procedures whose goal is to elicit examples of behavior that represent how the person being tested functions in the abilities the tests seek to measure. Test makers try to select items that are representative of those abilities, and to name those abilities accurately. Tests are interpreted by comparing an individual's results with how a broad sample of comparable people responded to the same sets of items. There are thus several points in the process at which test results can be misleading: the test items might not be representative of the abilities being assessed, those abilities might not be accurately named, and the normative group on which the test was statistically standardized, and against which the individual's results are compared, might not be representative of the population to which the individual actually belongs. In simple language, tests might be more or less well constructed, accurately named, and meaningfully interpreted. All tests are approximate, and all are works in progress. Each has its own history of evolution through theory and practice, and each will continue to be modified and superseded.

“Intelligence tests,” for example, would be more realistically named “Samples of Some Aspects of Intelligence.” Measuring human ability is not like measuring a plank of wood, even though we call both “measurement” and express both as numbers. No one has ever seen “intelligence,” or the difference between one point and another on an I.Q. scale; intelligence is always inferred from samples of observed behavior.

Unlike the measurement of a plank of wood, the scores of psychological and psychoeducational tests nearly always show some variability. The scores of a person who takes the same test at different times, or equivalent forms of the same test, will probably not be quite the same. Similarly, a person who takes two different tests which are supposed to measure the same thing will nearly always receive somewhat different scores.

Test results presented as approximate grade or age levels can be affected by many factors, such as whether the child was available for instruction at the time the subject or skill being measured was taught, and the child’s functional abilities, then and now, in attention, receptive language, and working memory. Test results also may not reflect the potential benefit of repetition, study, or remediation, so they may not reflect overall ability. There is considerable controversy about whether it is even possible to have nationally accepted age- and grade-equivalent standards. In addition, a nationally balanced normative group, if one could ever be constituted, may not represent the level of academic achievement expected in, or typical of, a particular locality, so students might score higher or lower depending on the particular education they have received.

Thus, it is usually the overall pattern of scores that matters, not individual scores: how the scores relate to one another, and what they indicate about how the person’s performance compares with that of his or her peers.

It is also important to remember that scores might not always mean what we assume they do. Careful attention has to be given to the reasons why a child makes errors on particular items. For example, a child may miss an item involving phonological processing because of a genuine phonemic processing deficit, because of a syntactic rather than a phonemic mistake, because he or she did not understand what the item was asking, because of a lapse of attention, or because of some combination of these factors.

Furthermore, intelligence and personality are still being researched, reconstrued, and debated, and the tests we use to assess them are, of necessity, imperfect. Abilities such as practical intelligence, social intelligence, problem solving, creativity, and what might be called “self intelligence,” meaning the ability to manage one’s emotions and behavior and to improve through reflection on one’s experiences, are largely ignored by existing tests, yet they are vitally important for success in life. Wisdom, another crucial characteristic, is virtually unmeasurable by any existing test.

In summary, psychological tests are useful for developing an impression of how a particular person is functioning in some respects at a particular time in his or her life, as long as we keep their limitations firmly in mind.