Classification, Diagnosis, & Assessment

Testing in
Clinical Psychology
Dr. Kline
FSU-PC
What is a test?

What do you think????
I. Tests-definitions

“A test is a systematic procedure for observing and
describing a person’s behavior in a standard situation,”
(Cronbach, 1970; in Nietzel et al., 2003)

In theory tests should provide clinicians with an
“accurate” measure of an individual’s ability, skill,
talent, or knowledge base.

In Clinical Psychology, tests are extremely useful in the
assessment process. This is largely because tests are
more systematic and objective than clinical interviews.
II. Ways in which tests are distinct from
other assessment methods:


1. A test can be administered in a non-social setting (while
interviews are always conducted socially).
Note: Some individuals with psychopathology can “fake”
sanity during a clinical interview but are “detected” by tests like
the MMPI & SCID, where there is no room for “savvy” replies to
bias the results.

2. Standardized tests produce results that are compared with
“normed” populations. This helps ensure that bias does not
influence testing or the results.

3. Testing can be administered in groups (GRE) &
individually, so large numbers of people can be tested
simultaneously.
III. What do tests measure??





Tests can be grouped into 4 distinct categories:
1. Intellectual functioning**
2. Personality characteristics**
3. Attitudes, interests, preferences, & values
4. Ability

**Tests most commonly used by Clinicians (for
assessment, treatment & research purposes).

Because general level of intellectual functioning & personality
are often influenced by psychopathology (e.g., schizophrenia,
Bipolar disorder), tests assessing these constructs are of
significant interest to Clinical Psychologists.
IV. How do we construct tests?

There are three basic approaches to test
construction. Which method is appropriate
depends on a variety of factors.

1. Analytical Approach- Clinicians using
this approach determine the content they
think reflects the construct they want to
measure & derive test items based on all
facets of this construct.
Analytical contd.



What are the qualities I want to measure?
How do I define these qualities?
What kind of test items would make sense to use to measure
these qualities?
E.g., Using this method, how could we measure “motivation”?
1. We’d have to operationally define “motivation.”
2. How do we measure it? Actions, verbal responses, etc.
3. We’d need to generate test items that reflect what we know or
believe makes up motivational tendencies.
“True or False: I’m a self-starter.”
“True or False: I like to lead, not follow,” etc.

Problem with this method: The test items strongly reflect the
tester's view of what concepts should be examined. This
may be inaccurate.
2. Empirical approach:

Instead of deciding in advance which content is suitable to assess a given
construct, this method lets the data determine which items reflect the
construct.

For instance, instead of defining “motivation,” we could measure what people
who have already been identified as “highly motivated” or “not motivated”
do, feel, think, and so forth to see what items reflect motivation in people.

This way, the researcher isn’t using his/her bias in the concept of motivation
to determine how to measure the construct.

Problems with this method:
A. Is costly in terms of time & manpower to conduct it.
B. Requires sampling significantly more people to identify
“groups” of people who demonstrate high or low levels of the
trait of interest.
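
As a rough sketch of the empirical approach, the snippet below keeps only the items whose endorsement rates differ between two criterion groups. The item names, the responses, and the 0.30 cutoff are all invented for illustration; a real study would use far larger samples and a proper statistical test.

```python
def endorsement_rate(group, item):
    """Proportion of people in a group who answered True to an item."""
    return sum(person[item] for person in group) / len(group)


def select_items(high_group, low_group, items, min_gap=0.30):
    """Keep items whose endorsement rates differ by at least min_gap.

    min_gap is a hypothetical cutoff chosen only for this example.
    """
    kept = []
    for item in items:
        gap = endorsement_rate(high_group, item) - endorsement_rate(low_group, item)
        if abs(gap) >= min_gap:
            kept.append(item)
    return kept


# Hypothetical true/false answers from people already identified as
# "highly motivated" (high_group) or "not motivated" (low_group).
items = ["self_starter", "likes_to_lead", "likes_coffee"]
high_group = [
    {"self_starter": True, "likes_to_lead": True, "likes_coffee": True},
    {"self_starter": True, "likes_to_lead": False, "likes_coffee": False},
    {"self_starter": True, "likes_to_lead": True, "likes_coffee": True},
    {"self_starter": True, "likes_to_lead": True, "likes_coffee": False},
]
low_group = [
    {"self_starter": False, "likes_to_lead": False, "likes_coffee": True},
    {"self_starter": False, "likes_to_lead": True, "likes_coffee": False},
    {"self_starter": True, "likes_to_lead": False, "likes_coffee": True},
    {"self_starter": False, "likes_to_lead": False, "likes_coffee": False},
]

selected = select_items(high_group, low_group, items)
print(selected)
```

In this made-up data the "likes_coffee" item is endorsed equally often by both groups, so it is dropped even though nobody decided in advance that it was irrelevant.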


3. Sequential System Approach:

Combines both analytical & empirical approaches.
 Test items may be chosen based on analytical method,
but results may be statistically analyzed to see which
items are or are not correlated with one another, which
are too difficult or too easy, & so forth.

Or items may be generated empirically, with the decision about
which items to then test made analytically.
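
The statistical step described above can be sketched as a simple item analysis: the proportion of people passing each item (its difficulty) and the correlation of each item with the rest of the test. The score matrix is invented, and the helper names are assumptions made for this example rather than part of any published procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns None if either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return None
    return cov / sqrt(ss_x * ss_y)


def item_difficulty(scores, i):
    """Proportion of test takers who got item i correct (1 = correct)."""
    return sum(row[i] for row in scores) / len(scores)


def item_rest_correlation(scores, i):
    """Correlation between item i and the total of all other items."""
    item = [row[i] for row in scores]
    rest = [sum(row) - row[i] for row in scores]
    return pearson(item, rest)


# Hypothetical data: one row per person, one column per item.
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]

for i in range(4):
    print(i, item_difficulty(scores, i), item_rest_correlation(scores, i))
```

Here item 0 is passed by everyone, so it is too easy to tell people apart and has no correlation to compute, which is the kind of item this step would flag for removal.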
V. Tests of Intellectual Functioning:

While there is a long & dubious history regarding intelligence
testing, there is still no clearly adopted definition of what
“constitutes” intelligence.

“Mental testing” of intelligence, or the psychometric approach,
describes intelligence as a general characteristic (called g), as a
set of up to 150 specific intellectual functions (called s’s), or as
some hierarchical combination of both.

Clinicians use a variety of intelligence tests to measure specific
aspects of intellectual functioning and compare the results
with normed data. These tests are standardized in an attempt to
rule out systematic biases based on gender, age, race, culture, &
so forth.
Note: Biases may still influence results.

A. Binet Scales

Alfred Binet devised an intelligence test for children in 1905 that consisted of
30 items & tasks. The total score was the number of items correct.

Some of the tasks for Binet’s test included requiring children to:
unwrap a piece of candy
Track a moving light with their eyes
Compare objects of different weights
Repeat #s & words from memory





This primitive test was improved in 1908, when tasks in the test were graded based
on the age of the participants. This meant younger children were expected to get the
easier items correct, while older children were expected to pass the more difficult
items.

Binet & Simon examined over 200 children and determined that at certain ages,
children with normal functioning should be able to do certain things.

For instance, 3-year-olds should be able to identify their body parts (eyes, nose,
mouth), repeat 2-digit #’s, & 6-syllable sentences, as well as their name. Older
children like 7-year-olds should be able to identify missing parts of a drawing, copy
simple geometric shapes/figures, and identify coin denominations (Nietzel et al.,
2003).
Additional revisions to the 1908 Binet scale:

A Stanford Psychologist, Lewis Terman, revised Binet’s test so
that the mental & chronological age of the participant would be
examined.

The Stanford-Binet produced an intelligence quotient (IQ)
calculated by the following formula:
IQ = (mental age (MA) / chronological age (CA)) × 100
Therefore, if a 5-year-old produces an MA score of 7, the IQ for this child would be
(7 / 5) × 100 = 140.
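
The ratio formula above is simple enough to check directly. This is a minimal sketch; the function name is an assumption made for the example.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age / chronological_age


print(ratio_iq(7, 5))  # the 5-year-old with a mental age of 7
print(ratio_iq(5, 5))  # mental age equal to chronological age
print(ratio_iq(4, 5))  # mental age below chronological age
```

A child whose mental age matches his or her chronological age always scores 100 on this scale, which is why 100 is treated as average.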
Terman also designated certain labels based on IQ ranges to reflect different types
of general intellectual functioning.
These labels today are listed as: “very superior,” “superior,” “high average”,
“average,” “low average,” “borderline,” & “mentally retarded.”
The Stanford-Binet

The most popular intelligence test in the US, it was revised
several more times (in 1937, 1960, 1973, 1986).

In 1960, IQ was no longer computed, but determined based on
tables in which the formula’s results were corrected for variances
based on age. Norms have been established for
standardization.

The most recent edition of the test groups test items into 15
subtests.

Within each subtest, the items are arranged in ascending order
of difficulty. The subtests are organized into four areas of
functioning: verbal reasoning, abstract/visual reasoning,
quantitative reasoning, & short-term memory (STM).
Scoring for Stanford-Binet:

Standard age scores or SAS are obtained for each
subtest by using tables that convert raw scores to
normalized standard scores with a mean of 50 & a
standard deviation of 8.
 Therefore, an SAS of 58 means a child scored 1
standard deviation above the mean and did better than about
84% of his/her cohorts on the test.
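
The 84% figure in the example above (a score of 58 on a scale with a mean of 50 and a standard deviation of 8) follows from the normal curve. The sketch below assumes scores are normally distributed; the function names are made up for the example, and the real test uses published conversion tables rather than a formula.

```python
from math import erf, sqrt

SAS_MEAN = 50  # mean of the subtest scale, from the slide above
SAS_SD = 8     # standard deviation of the subtest scale


def z_score(score, mean, sd):
    """Number of standard deviations a score lies above the mean."""
    return (score - mean) / sd


def normal_percentile(z):
    """Percent of a normal distribution falling below a given z-score."""
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))


z = z_score(58, SAS_MEAN, SAS_SD)
pct = normal_percentile(z)
print(z, round(pct, 1))
```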
How suitable is the Stanford-Binet for assessing
children’s intelligence?

The Stanford-Binet appears to be very reliable for
assessing children’s intelligence.

It has high test-retest reliability (above .90) and
internal consistency.
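
Test-retest reliability is usually reported as the correlation between scores from two administrations of the same test to the same people. The sketch below computes a Pearson correlation on invented scores for six children; none of these numbers come from the Stanford-Binet norms.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for the same six children, tested twice.
first_testing = [95, 102, 110, 88, 120, 105]
second_testing = [97, 100, 112, 90, 118, 107]

reliability = pearson(first_testing, second_testing)
print(round(reliability, 2))
```

A value near 1 means the children keep roughly the same rank order from one testing to the next.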

The test is also highly correlated with other
measures of intelligence & appears to distinguish
samples of gifted, retarded, & learning-disabled
children (Nietzel et al., 2003).
B. Wechsler Scales

David Wechsler, a psychologist at Bellevue Psychiatric hospital in
New York (still famous today), developed an intelligence test for
adults in 1939.

This test, called the Wechsler-Bellevue (W-B) Intelligence Scale,
differed from the Stanford-Binet in several ways:

1. It was for adults aged 17+

2. It was on a point scale, in which credit is given for each correct
answer. Hence, IQ did not reflect the relationship between mental
& chronological age, but a comparison of points earned for the
individual tested to those earned by many individuals of equal age.
Wechsler Adult Intelligence Scale: WAIS

Wechsler revised the W-B in 1955 & restandardized it to
reflect ethnic populations.

The WAIS comprised 6 verbal and 5
performance subtests, which meant you could
calculate a Verbal IQ, Performance IQ, and
Full Scale IQ (a combination of verbal & performance).

The test was revised & restandardized in 1981 and again
in 1997 (the WAIS-III). This was done to make the test
more reliable given the diversity of ethnicity in our
population. In addition, because data were obtained on a
sample of 2,450 people ages ranging from 16 to 89, the
test can be administered to elderly individuals as well.
Types of test items on the WAIS-III:





Information (verbal): What is the capital of France?
Comprehension: Why do foreign cars cost more than domestic cars?
Arithmetic: If you have 4 apples & give 2 away, how many do you have left?
Similarities: Identify similar aspects of pairs like: hammer-screwdriver, dog-flower, portrait-short story
Digit Symbol/Coding: Copy designs that are associated with different #s as quickly as possible.
Digit Span: Repeat in forward & reverse order: 2- to 9-digit numbers.
Vocabulary: Define: chair, dime, lunch, valley, asylum, sanctuary
Picture Completion: Find missing objects in increasingly complex pictures.
Block Design: Arrange blocks to match increasingly complex standard patterns.
Picture Arrangement: Place increasing #s of pictures together to make increasingly complex stories.
Symbol Search: Visually scan & recognize a series of symbols.
Scoring for the WAIS-III

To get Full Scale, Verbal, & Performance IQ scores for
subjects, the individual’s total points correct for each
subtest are converted to standardized IQ scores with a
mean of 100 & a standard deviation of 15.
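
The idea behind this scale can be sketched as follows: a person's total is compared with the totals of same-age peers and re-expressed on a scale with a mean of 100 and a standard deviation of 15. Actual WAIS-III scoring uses published norm tables, so the formula, the function name, and the norm-group values below are assumptions for illustration only.

```python
IQ_MEAN = 100
IQ_SD = 15


def deviation_iq(total_points, norm_mean, norm_sd):
    """Re-express a total score relative to an age group's mean and SD."""
    z = (total_points - norm_mean) / norm_sd
    return IQ_MEAN + IQ_SD * z


# Hypothetical age-group norms: mean total of 110 points, SD of 25.
print(deviation_iq(135, norm_mean=110, norm_sd=25))  # one SD above peers
print(deviation_iq(110, norm_mean=110, norm_sd=25))  # exactly average
print(deviation_iq(85, norm_mean=110, norm_sd=25))   # one SD below peers
```

Because the comparison is with people of the same age, this kind of IQ does not depend on a mental age, which is what makes it usable with adults.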
Premorbid IQ & suitability for measuring
intelligence:

Clinicians with time constraints can get a quick measure of
the Full Scale IQ by combining the individual’s scores for the
Vocabulary and Block Design subtests.
This is the premorbid IQ. The scaling is the same: 100 is the
mean score, with a standard deviation of 15.
The WAIS-III is reliable and valid for measuring intelligence
in adults. Reliability for each subject and all subtests
combined is .93 and above across all age ranges.
It also correlates highly with other measures of intelligence.
Wechsler Intelligence Scale for Children (WISC-III):

The WISC, originally developed in 1949 to assess intelligence in
children, has been revised several times and is now the WISC-III
(1991). It is based on norms from 2,200 children aged 6-16 in the
US in 1988.

It is based on 13 subtests which examine verbal comprehension,
perceptual organization, freedom from distractibility (memory &
attention), and processing speed.

Reliability and validity of this test are high and it correlates
highly with the Stanford-Binet intelligence test.
VI. Personality Tests:

One of the most influential & widely used
personality tests is the Minnesota Multiphasic
Personality Inventory (MMPI).

Developed in the 1930s by Hathaway &
McKinley at the University of Minnesota, this
test is used to screen large groups of people for
psychological disorders.

This inventory is very useful in situations when
a clinical interview may not be conducted.
Construction of the MMPI:

Over 1,000 items from older personality tests & other
sources were converted into statements that individuals
could respond to with “true,” “false,” or “cannot say.” A
significant number of these items were then presented to
thousands of normal individuals as well as individuals
with diagnosed mental disorders.

When compared to normal individuals, several
patterns emerged for individuals diagnosed with
psychological disorders.

In all, 10 clinical scales were developed.
MMPI scales

Validity scales:

L (Lie scale)- 15 items of overtly good self-report, e.g., “I smile at everyone I meet.”
F (frequency or infrequency)
K (correction)- 30 items reflecting defensiveness in admitting problems, e.g., “I feel bad when others criticize me.”



Clinical Scales:

1 or Hs (Hypochondriasis)- 32 items of patients’ abnormal preoccupation with their health. E.g., “I have chest pain several times a week.” (true)
2 or D (Depression)- 57 items examining depressive symptoms.
3 or Hy (Conversion Hysteria)
4 or Pd (Psychopathic Deviate)- 50 items examining patients’ disregard for social and conventional customs.
5 or Mf (Masculinity-Femininity)- 56 items showing homoeroticism & items differentiating between men & women.
6 or Pa (Paranoia)- 40 items from patients showing abnormal suspiciousness.
7 or Pt (Psychasthenia)- 48 items based on neurotic patients showing obsessions, compulsions, phobias, guilt, & indecisiveness.
8 or Sc (Schizophrenia)- 78 items from patients showing bizarre or unusual thoughts or behavior.
9 or Ma (Hypomania)- 46 items from patients characterized by emotional excitement, overactivity, and flight of ideas.
0 or Si (Social Introversion)- 69 items from people showing shyness, little interest in others, and insecurity.









MMPI

Current version has over 500 items.

Validity scales are crucial to detecting test-taking attitudes and
response biases.

The Lie Scale is very important, as it is designed to catch
individuals trying to “fake good” on the inventory. These items,
if answered truthfully, would hint at negative information about
the person (e.g., a “false” answer in response to “I always read the
editorial every morning.”).

If a person answers “true” to a significant # of Lie Scale items,
then it indicates they are “impression managing,” which
suggests a problem with their performance on the inventory.
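
The logic of a lie scale can be sketched as counting how many "too good to be true" statements a person endorses and flagging the protocol when the count reaches a cutoff. The statements and the cutoff below are hypothetical stand-ins, not the actual MMPI items, scoring key, or interpretive rules.

```python
# Hypothetical key: statement -> the answer that counts toward the L score.
LIE_KEY = {
    "I smile at everyone I meet.": True,
    "I always read the editorial every morning.": True,
    "I never get angry.": True,
}
HYPOTHETICAL_CUTOFF = 2  # invented threshold, for illustration only


def lie_score(answers, key=LIE_KEY):
    """Count the answers that match the keyed 'fake good' direction."""
    return sum(1 for item, keyed in key.items() if answers.get(item) == keyed)


def flags_impression_management(answers, cutoff=HYPOTHETICAL_CUTOFF):
    """True when the lie score reaches the (hypothetical) cutoff."""
    return lie_score(answers) >= cutoff


candid = {
    "I smile at everyone I meet.": False,
    "I always read the editorial every morning.": False,
    "I never get angry.": False,
}
guarded = {
    "I smile at everyone I meet.": True,
    "I always read the editorial every morning.": True,
    "I never get angry.": True,
}

print(lie_score(candid), flags_impression_management(candid))
print(lie_score(guarded), flags_impression_management(guarded))
```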
MMPI-2

The MMPI was restandardized, revised, & made
available in 1989. It can be administered by
paper-and-pencil & by computer.

It compares patterns of responding in individuals to
determine if psychopathology is present and in what
form.

It is widely used in making diagnostic assessments of
individuals and remains the most important test used
by Clinicians.
California Psychological Inventory: CPI

Was designed to assess personality in the “normal” population.

Half of its items are taken from the MMPI, and the other half are
newly generated items.

Because the CPI was standardized on over 13,000 males & females
from all socioeconomic statuses & parts of the US, it provides a
very strong test of personality assessment.

It has been shown to be very reliable for predicting
delinquency, parole outcome, academic performance, and
likelihood of dropping out of school (Nietzel et al., 2003).

It’s also computerized, making administration easy to conduct.
VII. Projective Personality Tests

Standard stimuli are presented to the patient
(inkblots or drawings) ambiguous enough to
allow for variation in responses.

Patient’s responses should be based on
primarily unconscious processes & will
reveal his/her true feelings, thoughts,
motives.
Types of Projective tests:

A. Rorschach Inkblot test– patient presented with 10
inkblots.

Half of the inkblots are in black, white, & shades of gray.
Two have red splotches, and 3 are in pastel colors.

Patients report what they “see.”

The test was developed by the Swiss psychiatrist Hermann
Rorschach in the early 1900s. Beck, an American
psychology student, published a standardized
procedure for measuring responses on the Rorschach in
1937. Following this, other reports came out.
Rorschach scoring:

The client reports what he/she sees in each inkblot.

The Clinician writes down the individual’s answer
verbatim. When all the cards have been presented,
the Clinician goes through the set of cards and
conducts an inquiry about what characteristics of
each inkblot led to their answer.

Answers are coded. Scoring involves the location,
determinants, content, & popularity of the responses.
Scoring Dimensions:

1. Location—what area of the blot led the client to respond
(e.g., whole blot, an unusual detail, white space, a
combination of these aspects).

2. Determinants-characteristic of the blot that influenced a
response; this includes form, color, shading, & movement.

3. Content- the subject of the blot. That is what is perceived
(e.g., animals, figures, objects, sexual symbols, blood, etc.)

4. Popularity- refers to frequency of specific kinds of
responses made by many individuals.
B. Thematic Apperception Test
(TAT)

Patient is shown a series of black-and-white
pictures one by one and asked to tell a story
related to each.

What is the symbolic meaning underlying
the story the patient provides?
