Validity: Conceptual Issues Furr & Bacharach Chapter 8

Validity: Conceptual Issues
Furr & Bacharach
Chapter 8
Contrasting Reliability &
Validity


Both fundamental to a sophisticated
understanding of psychometrics
Must have a clear understanding of the
relationship between the two
Definitions – notice differences

Reliability

Degree to which differences in test scores
reflect differences among people in their
levels of the trait that affects those scores,
whatever that trait may be


Quantitative property of the test scores
Validity

Tied to interpretation of test score

Tied to theory and implication of scores
LINK

Validity requires reliability

Stable traits (Intelligence & IQ)

Measure at two points in time; scores should be stable
across time (test-retest reliability)
If not, the test cannot be a valid test of IQ

States (Depression & BDI)

If internal consistency is poor, the test can’t be valid
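Test-retest reliability is commonly summarized as the correlation between scores from the two administrations. The sketch below is a minimal illustration, assuming hypothetical IQ scores for six people; the data and the helper name `pearson_r` are illustrative and do not come from the chapter.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares around each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical IQ scores for the same six people at two points in time
time_1 = [95, 102, 110, 88, 121, 105]
time_2 = [97, 100, 112, 90, 118, 107]

retest_r = pearson_r(time_1, time_2)
print(f"test-retest r = {retest_r:.2f}")
```

A high correlation here would support reliability only; it says nothing yet about whether the scores are validly interpreted as IQ.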
Reliability does not imply validity

Stable Trait (Autism & AQ)

May have excellent test-retest reliability or good internal
consistency, but may not be interpreted in a valid
manner
Iowa story


Don’t want to hire people who might abuse
clients anymore!!!
Personality tests…



Is there a test that measures the construct?
Does it validly measure abusive personality?
Is there a test that was designed to predict the
likelihood that a particular individual will abuse
people?
What is validity?


Definition
Implications of the contemporary
definition of validity
Validity ----- Definition

Basic Definition


The degree to which a test measures what
it is supposed to measure
Contemporary Definition

“The degree to which evidence and theory
support the interpretations of test scores
entailed by the proposed uses” of the test
Implications of the
contemporary definition
Implication 1

Interpretation and use of test scores
Validity is about interpretation
& use of test scores
NEO-PI-R

Conscientiousness scale – 48 items

High scores reflect an “active process
of planning, organizing and
carrying out tasks,” and people with
high scores on this scale are
“purposeful, strong willed, and
determined”
NEO-PI-R
Conscientiousness Scale

What is the correct question about the scale’s
validity or invalidity?

Are the test items valid or invalid?

Are the test scores valid or invalid?

Is the interpretation of the test scores valid or
invalid?
Not “are items or scores valid
or invalid?”

The question is:

Are the authors’
interpretations of the
scores valid or invalid?

Are conscientiousness scores validly
interpreted in terms of planfulness,
organization, and determination?
Proposed use of scores…


Employers may use NEO-PI-R
Conscientiousness Scale to screen
potential employees
BELIEF: Differentiates potentially
better and worse employees?

Predictive power of conscientiousness
scale score?
Hammer is a useful tool if you
need to drive a nail…
What if you need to saw a
piece of wood?

Hammer is not a useful tool irrespective
of the need
Simplistic & inaccurate to
say…
“Conscientiousness scale is valid without
regard to the way in which it will be
interpreted and used”
Rather (what is accurate)



Scores can be interpreted validly as an
indicator of conscientiousness
Scale is not valid as a measure of intelligence or
extraversion
Not a valid predictor of successful
employment
Compare:
“Scores on the Conscientiousness scale
of the NEO-PI-R are validly interpreted
as a measure of conscientiousness.”
vs.
“The Conscientiousness scale of the
NEO-PI-R is valid.”
Implication 2

Validity is a matter of degree




Strong vs. weak
NOT valid vs. invalid
Select test if strong enough evidence
supporting intended interpretation and
use
http://www.wired.com/wired/archive/9.12/aqtest.html
Concern about the Autism
Spectrum Quotient…


Marginal internal consistency, so
reliability is already of concern
What about validity?

Is it valid to interpret a high score on the
test as reflecting a high degree of autism
traits?
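Internal consistency is often quantified with Cronbach's alpha. The sketch below is a minimal illustration on hypothetical item responses (three items, five respondents); the data and the function name `cronbach_alpha` are assumptions for illustration and are not AQ data.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha.

    item_scores: one list per item, each holding every respondent's
    score on that item, in the same respondent order.
    """
    k = len(item_scores)
    # Total score per respondent (sum across items)
    totals = [sum(person) for person in zip(*item_scores)]
    item_var_sum = sum(pvariance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))

# Hypothetical responses: 3 items x 5 respondents
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]

alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.2f}")
```

Even a high alpha would only address reliability; the validity question on this slide still has to be answered with separate evidence.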
Interpretation of AQ

Correlations with the Autism Spectrum Quotient

                      Magical    Physical    Perceptual   Social
                      Ideation   Anhedonia   Aberration   Anhedonia
Pearson Correlation   .371**     .231*       .230*        .573**
Sig. (2-tailed)       .000       .013        .014         .000
N                     114        114         114          114

                      SCID-II    SCID-II      SCID-II    SCL-90     SCL-90
                      Paranoid   Schizotypy   Schizoid   Paranoid   Psychoticism
Pearson Correlation   .399**     .314**       .309**     .255**     .194**
Sig. (2-tailed)       .000       .000         .000       .001       .010
N                     179        179          179        178        178

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
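The significance values reported with the correlations can be checked roughly from r and N alone. The sketch below uses Fisher's z transformation with a normal approximation, which is an assumption of this example; statistical packages typically use the t distribution, so the result is close to, but not identical with, the reported values. The function name is hypothetical.

```python
import math

def approx_two_tailed_p(r, n):
    """Approximate two-tailed p-value for a Pearson r (Fisher z, normal approx.)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal
    return math.erfc(abs(z) / math.sqrt(2))

# Values taken from the slide: r = .231 and r = .573, both with N = 114
p_small = approx_two_tailed_p(0.231, 114)
p_large = approx_two_tailed_p(0.573, 114)
print(f"r = .231: p ~ {p_small:.3f}")
print(f"r = .573: p ~ {p_large:.3g}")
```

Note that statistical significance with N = 114 says little about the size of an association; the magnitude of r is what matters for the validity argument.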
Regret vs. Autism? (r = .45)
Regret Scale
1. Whenever I make a choice, I’m curious about what would have happened if I had chosen
differently.
2. Whenever I make a choice, I try to get information about how the other alternatives
would have turned out.
3. If I make a choice and it turns out well, I still feel like somewhat of a failure if I find out
another choice would have turned out better.
4. When I think about how I’m doing in life, I often assess opportunities I have passed up.
5. Once I make a decision, I don’t look back.
Maximization Scale
1. When I watch TV, I channel surf, often scanning through the available options even
while attempting to watch one program.
2. When I am in the car listening to the radio, I often check other stations to see if
something better is playing, even if I’m relatively satisfied with what I’m listening to.
AQ
http://www.wired.com/wired/archive/9.12/aqtest.html

What is to be measured?


What are the relative strengths of the
alternatives that are available to
measure that construct?
Select best measures of specific
characteristics to be assessed
Implication 3


Validity of a test’s interpretation is
based on evidence and theory
Human resources: “…in her experience,
use of NEO-PI-R was useful in
selection”
“Personality Color Test”

Based on color psychology (Max
Lüscher)


Color preferences reveal something about
your personality
Survey of scientific literature finds almost
no empirical evidence of validity of color
preferences as a measure of personality
characteristics
Evidence for “color test”



Less than clear
Site implies validity
Web site:


“Is the test reliable? We leave that to your
opinion. We can only say that there are a number
of corporations and colleges that use the Lüscher
test as part of their hiring/admissions processes. It
can be a useful tool for doctors and psychologists
as well and is used to get a quick overview of
potential issues patients may have in their lives.”
http://colorquiz.com/
“Color Quiz”


Is the test useful as a measure of
personality?
Denied employment based on such a
test?
Empirical evidence &
theoretical underpinnings?


Data from high quality research must
be available.
Theory alone is not adequate.
Contemporary view of validity

Although validity has traditionally been divided
into 3 forms (content, criterion, and construct),
the contemporary perspective highlights
CONSTRUCT VALIDITY
Standards


Standards for Educational and
Psychological Testing - revised (1999)
Co-published by



American Educational Research Association
(AERA)
American Psychological Association (APA)
National Council on Measurement in
Education (NCME)
Remember

Contemporary perspective highlights
CONSTRUCT VALIDITY
Standards outline 5 types of evidence relevant
for establishing validity of test interpretations
(AERA, APA, NCME, 1999)
[Figure: Construct Validity and its five types of evidence: Test Content, Internal Structure, Response Processes, Associations With Other Variables, and Consequences of Use]
Validity Evidence:
Test Content


Match between the actual content of a
test and the content that should be
included in the test.
Psychological nature of the construct
should dictate the appropriate content
of the test.
Face Validity


Face validity – the degree to which a
measure appears to be related to a
specific construct in the judgment of
non-experts such as test takers and
representatives of the legal system.
LOOKS relevant, and this fact may
increase likelihood that the test will be
well received by users and takers
Threats to content validity



Construct-irrelevant content – e.g., test
includes questions on content not covered in
book, lecture, or discussion
Construct under-representation – e.g.,
test content fails to represent the full scope
of the content implied from the construct
Related practical issues – e.g., time,
respondent fatigue, respondent attention,
etc. – Is content a fair representation?
Content Validity vs.
Face Validity

Content validity is the



degree to which the content reflects the full
domain of the construct &
can only be evaluated by experts who have a
deep understanding of the construct
Face validity is the

degree to which non-experts perceive the test to
be relevant to what they believe is being
measured by it
Validity Evidence:
Internal Structure of the Test

For a test to be validly interpreted as a
measure of a particular construct,


the actual structure of the test should
match the theoretically based
structure of the construct
Does the theoretical basis suggest a
unidimensional or a multi-dimensional
structure?
Internal Structure

Often assess via examination of factor
structure (factor analysis)



Items that are more strongly correlated
with each other than other items form
clusters called factors…
Factor analysis should clarify the number
of factors within a set of test questions
Example: Self esteem – is the construct
uni- or multi-dimensional?
Factor analysis
1. Clarifies number of factors
2. Reveals associations among the
factors within a multi-dimensional test
3. Identifies which items are linked to
which factors
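A scree plot displays the eigenvalues of the item correlation matrix in descending order. The sketch below is a minimal illustration, assuming a hypothetical 4-item correlation matrix in which every pair of items correlates .50; the Jacobi rotation routine is a standard-library stand-in chosen for this example, not the method used in the chapter, and real analyses would normally rely on a statistics package.

```python
import math

def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:  # off-diagonal mass is negligible: converged
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Hypothetical correlation matrix: 4 items, all inter-item r = .50
corr = [
    [1.0, 0.5, 0.5, 0.5],
    [0.5, 1.0, 0.5, 0.5],
    [0.5, 0.5, 1.0, 0.5],
    [0.5, 0.5, 0.5, 1.0],
]

eigenvalues = jacobi_eigenvalues(corr)
for number, value in enumerate(eigenvalues, start=1):
    print(number, round(value, 3))
```

One large eigenvalue followed by a flat run of small ones is the pattern usually read as evidence of a unidimensional structure; two or more large eigenvalues before the plot levels off would suggest a multidimensional one.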
Rosenberg Self-Esteem Inventory
(RSEI; Rosenberg 1989)
1. On the whole, I am satisfied with myself
2. At times, I think I am no good at all
3. I feel that I have a number of good qualities
4. I am able to do things as well as most other people
5. I feel I do not have much to be proud of
6. I certainly feel useless at times
7. I feel that I’m a person of worth, at least on an equal plane with
others
8. I wish I could have more respect for myself
9. All in all, I am inclined to feel that I am a failure
10. I take a positive attitude toward myself
RSEI - Scree Plot
Number of factors
evident in the plot?
Question:

This scree plot
provides evidence for
what type of structure?
a. Unidimensional
b. Multidimensional

[Figure: Scree Plot; y-axis: Eigenvalues, x-axis: Number]
Validity Evidence:
Response Processes

Match between the psychological processes
that respondents actually use when
completing a measure and the processes that
they should use.


When I say start, raise your finger when you feel
10 s have elapsed.
Assumption: should use “feel” (feels like time is
up)

but could use another process such as covert counting,
copying others, or looking at a second hand on a watch
Response processes

If the response process actually used is
different from the one assumed to be
used, then the scores may not be
interpretable as the test developer
intended

Attention to the internal feel of time
passing vs. use of some selected process
to intentionally mark passage of time
Validity Evidence:
Association With Other Variables

Match between a measure’s actual
associations with other measures and
the associations that the test should
have with the other measures.
Convergent evidence

The degree to which test scores are
correlated with tests of related
constructs
Discriminant evidence

Degree to which test scores are
uncorrelated with tests of unrelated
constructs
Example

Hypothesis: Schizophrenia and autism
are diametrically opposed constructs
Measure of autism should be uncorrelated
with measures of schizophrenia
Correlations with the Autism Spectrum Quotient

                      Magical    Physical    Perceptual   Social
                      Ideation   Anhedonia   Aberration   Anhedonia
Pearson Correlation   .371**     .231*       .230*        .573**
Sig. (2-tailed)       .000       .013        .014         .000
N                     114        114         114          114

                      SCID-II    SCID-II      SCID-II    SCL-90     SCL-90
                      Paranoid   Schizotypy   Schizoid   Paranoid   Psychoticism
Pearson Correlation   .399**     .314**       .309**     .255**     .194**
Sig. (2-tailed)       .000       .000         .000       .001       .010
N                     179        179          179        178        178

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
Support for C & B’s theory?

NO: Convergent evidence - the autism measure
correlated positively with the schizophrenia (SZ) measures

Finding: autism (AU) & SZ are related constructs?
i.e., Crespi & Badcock are wrong
Or
Not necessarily: the strong correlations could instead
indicate weak validity of the AQ as a
measure of the autism construct
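The discriminant check described here can be written out explicitly. The sketch below takes the four AQ correlations reported on the slide (assuming the values pair with the measure labels in the order shown) and flags each one that is too large to count as "uncorrelated". The cutoff of .10 and the function name are assumptions made for illustration, not values from the chapter.

```python
# AQ correlations as reported on the slide (Pearson r, N = 114);
# the pairing of labels and values assumes the order shown in the table.
aq_correlations = {
    "Magical Ideation": 0.371,
    "Physical Anhedonia": 0.231,
    "Perceptual Aberration": 0.230,
    "Social Anhedonia": 0.573,
}

def violates_discriminant_expectation(r, cutoff=0.10):
    """True if |r| is too large to be treated as 'uncorrelated'.

    The cutoff is a hypothetical threshold chosen for this sketch.
    """
    return abs(r) >= cutoff

violations = [
    name for name, r in aq_correlations.items()
    if violates_discriminant_expectation(r)
]
print(f"{len(violations)} of {len(aq_correlations)} measures violate the expectation:")
for name in violations:
    print(" -", name)
```

The check only shows that the expected pattern is not met. Whether that counts against the theory or against the AQ as a measure of autism is the interpretive question raised on this slide.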
Concurrent validity evidence


The degree to which test scores are
correlated with other relevant variables
that are measured at the same time
as the primary test of interest
SAT is a measure of skills needed for
academic success?

Compare SAT administered during high
school senior year to hs senior year GPA
Predictive validity evidence


The degree to which test scores are
correlated with relevant variables that
are measured at a future point in
time.
SAT is a measure of skills needed for
academic success?

Compare SAT administered during senior
year of high school to college freshman
year GPA
Validity Evidence:
Consequences of Testing


Social consequences of test are a
facet of validity…
Standards for Educational and
Psychological Testing


Validity includes “the intended and
unintended consequences of test use”
E.g., does a construct and its
measurement benefit one group?
Not all agree…



Consequences of a testing program
should be considered a facet of the
scientific evaluation of the meaning of a
test score.
Some feel that this is an intrusion of
politics into science…
Can science be separated from personal
and social values?
Summary

Conceptual basis for validity
[Figure: Construct Validity and its five types of evidence: Test Content, Internal Structure, Response Processes, Associations With Other Variables, and Consequences of Use]
Validity

Standards for Educational and
Psychological Testing (1999)

The degree to which evidence and theory support the
interpretations of test scores entailed by the
proposed uses of a test
Validity

Are decisions based on valid
interpretations of test scores?




Educational placement
Access to services
Hiring
Clinical decisions