Chapter 5 Validity - East China Normal University


Chapter 6 Validity
§1 Basic Concepts of Validity
What Is Validity?
Interpretation
• The validity of a test concerns what the test measures and how well it does so. (Anne Anastasi)
It tells us what can be inferred from test scores. (Anne Anastasi)
Figure 6.1 One Funny Picture
• Validity can be defined as the agreement between a test score or measure and the quality it is believed to measure. (Robert M. Kaplan and Dennis P. Saccuzzo)
Does the test measure what it is supposed to measure?
• Validity is the evidence for inferences made about a test score. (AERA, APA, NCME, Standards for Educational and Psychological Testing)
Validity is affected by random and systematic errors.
Random errors and systematic errors both reduce the accuracy of the test.
Mathematical Definition of Validity
$s^2 = s_{co}^2 + s_{sp}^2$
$Val = \frac{s_{co}^2}{s_t^2}$  (6.1)
The validity coefficient is the ratio of the variance related to the trait being measured ($s_{co}^2$) to the observed-score variance ($s_t^2$).
Comparing Validity with Reliability
If the reliability of a test is low, its validity is usually low too.
If the reliability of a test is high, its validity is not necessarily high.
Figure 6.2 Components of the Variance of Observed Scores (labels: $s^2$, $s_{sp}^2$, $s_{co}^2$, $s_e^2$)
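The relation between the two coefficients can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the variance components are hypothetical numbers, the observed-score variance is assumed to be the sum of the three components named in Figure 6.2, and reliability is assumed to follow the classical definition (all systematic variance divided by observed-score variance), which the slides do not spell out.

```python
# Hypothetical variance components (not from the slides).
s_co2 = 50.0  # variance related to the trait being measured
s_sp2 = 30.0  # systematic variance unrelated to the trait (assumed meaning of s_sp)
s_e2 = 20.0   # random error variance

# Assumption: observed-score variance is the sum of the three components.
s_t2 = s_co2 + s_sp2 + s_e2

validity = s_co2 / s_t2               # formula (6.1)
reliability = (s_co2 + s_sp2) / s_t2  # assumed classical definition

print(f"validity = {validity:.2f}, reliability = {reliability:.2f}")
```

Under these assumptions validity can never exceed reliability, which matches the statement that high reliability does not guarantee high validity.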
Reliability is a necessary premise for validity, and validity represents the ultimate purpose of the test.
Three Types of Validity
Criterion-Related Validity
Content-Related Validity
Construct-Related Validity
Note: The most recent standards emphasize that validity is a unitary concept. The use of categories does not imply that there are distinct forms of validity.

Factors Affecting Validity
The Test Itself
Test Administration and Scoring
Examinees
The Criterion Chosen for Criterion Validity
Effects of the Test Itself
Whether the statements of the items are clear
Whether the items represent the trait measured
Whether the length of the test is adequate
Whether the test difficulty is appropriate
…
Test Administration and Scoring
Whether the sample is representative and heterogeneous.
Whether the testing conditions are appropriate and free of unexpected disturbances.
Whether the tester administers the test according to the manual.
Whether the test instructions for examinees are clear.
Whether the scoring system is objective and standardized.
Examinees
Interest in and motivation for the test
Emotional state and attitude during testing
State of physical health
Experience with tests
The Criterion Chosen for Criterion Validity
$r_{XY} \le \sqrt{r_{XX}\, r_{YY}}$
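As a quick illustration of this bound, the sketch below computes the largest validity coefficient that a test and a criterion could reach given their reliabilities. The function name and the reliability values are hypothetical, chosen only for the example.

```python
import math


def max_validity(r_xx: float, r_yy: float) -> float:
    """Upper bound on r_XY given test reliability r_xx and criterion reliability r_yy."""
    return math.sqrt(r_xx * r_yy)


# Hypothetical reliabilities for a test and its criterion.
bound = max_validity(0.81, 0.64)
print(f"r_XY cannot exceed {bound:.2f}")
```

An unreliable criterion therefore caps the validity coefficient even when the test itself is highly reliable.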
§2 Content Validity and Construct Validity
Content Validity
Interpretation
Content validity involves the careful definition of the domain of behaviors to be measured by the test and the logical design of items to cover all the important areas of the domain.
The purpose of content validation is to assess whether the items adequately represent a performance domain or construct of specific interest.
It is established through a rational analysis of the content of a test.
Steps for Content Validation Using Expert Judgment
1. Defining the performance domain of interest
2. Selecting a panel of qualified experts in the content domain
3. Providing a structured framework for the process of matching items to the performance domain
4. Collecting and summarizing the data from the matching process
Application
Content validity is most often employed with achievement tests, so the performance domain is often defined by a list of instructional objectives.
Content validity is also applicable to certain occupational tests designed for employee selection and classification.
Table 6.1 Table of Instructional Objectives
Objective        Cell entries    Sum
Knowledge        3, 2            5
Comprehension    10, 6, 9        25
Application      8, 6, 2, 12     28
Analysis         2, 2, 4, 6      14
Evaluation       10, 7, 5        22
Synthesis        6               6
Row sums (Chapter 1 to Chapter 4): 10, 22, 40, 28
Total: 100
Distinction from Face Validity
Face validity refers to what a test appears superficially to measure, not to what the test actually measures.
Construct Validity
Interpretation
The construct validity of a test is the extent to which the test may be said to measure a theoretical construct or trait.
What Is a Construct?
Each construct is developed to explain and organize observed response consistencies. It derives from established interrelationships among behavioral measures.
Examples: scholastic aptitude, intelligence, verbal fluency, anxiety, depression, self-esteem, etc.
Construct validation has focused attention on the role of psychological theory in test construction and on the need to formulate hypotheses that can be proved or disproved in the validation process. (Anne Anastasi)
Procedures for Construct Validation
Correlations between a measure of the
construct and designated
Internal Consistency
Differentiation between Groups
Developmental Changes
Factor Analysis
Multitrait-Multimethod Matrix
                          Method 1          Method 2          Method 3
Trait                     A     B     C     A     B     C     A     B     C
1. True-False
A. Sex-Guilt              (.95)
B. Hostility-Guilt        .28   (.86)
C. Morality-Conscience    .58   .39   (.92)
2. Forced Choice
A. Sex-Guilt              .86   .32   .57   (.95)
B. Hostility-Guilt        .30   .90   .40   .39   (.76)
C. Morality-Conscience    .52   .31   .86   .55   .26   (.84)
3. Incomplete Sentences
A. Sex-Guilt              .73   .10   .43   .64   .17   .37   (.48)
B. Hostility-Guilt        .10   .63   .17   .22   .67   .19   .15   (.41)
C. Morality-Conscience    .35   .16   .52   .31   .17   .56   .41   .30   (.58)
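One way to read the matrix above is to separate the correlations between measures of the same trait obtained with different methods (convergent evidence) from the correlations between different traits. The sketch below does this for the values in the table. The variable names and the simple mean comparison are illustrative assumptions, not a procedure given in the slides, and the parenthesized diagonal values are assumed to be reliabilities.

```python
# Lower triangle of the matrix; diagonal entries are the parenthesized values.
methods = ("True-False", "Forced Choice", "Incomplete Sentences")
labels = [(m, t) for m in methods for t in "ABC"]  # (method, trait) per row

lower = [
    [.95],
    [.28, .86],
    [.58, .39, .92],
    [.86, .32, .57, .95],
    [.30, .90, .40, .39, .76],
    [.52, .31, .86, .55, .26, .84],
    [.73, .10, .43, .64, .17, .37, .48],
    [.10, .63, .17, .22, .67, .19, .15, .41],
    [.35, .16, .52, .31, .17, .56, .41, .30, .58],
]

convergent = []    # same trait, different methods
discriminant = []  # different traits (same or different method)
for i in range(len(labels)):
    for j in range(i):  # strictly below the diagonal
        same_method = labels[i][0] == labels[j][0]
        same_trait = labels[i][1] == labels[j][1]
        if same_trait and not same_method:
            convergent.append(lower[i][j])
        elif not same_trait:
            discriminant.append(lower[i][j])

mean_convergent = sum(convergent) / len(convergent)
mean_discriminant = sum(discriminant) / len(discriminant)
print(f"mean convergent r   = {mean_convergent:.2f}")
print(f"mean discriminant r = {mean_discriminant:.2f}")
```

Evidence for construct validity is stronger when the convergent correlations are clearly higher than the correlations between different traits.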
Example: How to Gather Evidence for a Supposed Intelligence Test
State the theoretical hypotheses of the test:
1. Intelligence grows with age
2. IQ is relatively stable
3. Intelligence is substantially related to school achievement
4. Intelligence is affected by inheritance
Administer the test to the population and analyze the data.
Judge: whether test scores increase with age; whether IQ and school achievement are correlated; whether IQs remain stable across a time interval; whether the correlation between monozygotic (MZ) twins is higher than the correlation between dizygotic (DZ) twins.
§3 Criterion-Related Validity
Concepts
1. Interpretation of Criterion-Related Validity
It is the degree to which the test scores can be related to a criterion.
It indicates the effectiveness of a test in predicting an individual's performance in specified activities.
Two Types
Predictive validity refers to the degree to which test scores predict criterion measurements that will be made at some point in the future.
Concurrent validity refers to the relationship between test scores and criterion measurements made at the time the test was given.
2. What Is a Criterion?
The criterion is some behavior that the test scores are used to predict.
For example, grade-point averages can serve as the criterion for a school admissions test.
Problems with the Criterion
The reliability of the criterion
The validity of the criterion
Whether it can be measured
Criterion contamination
Commonly Used Criteria
academic achievement (for intelligence tests)
performance in specialized training (for special aptitude tests)
job performance
contrasted groups (for personality and domain-referenced tests)
psychiatric diagnosis (for personality tests)
ratings by schoolteachers and job supervisors
previously available tests
Procedures of Criterion-Related Validation
Validity Coefficient
Discrimination Between Two Groups
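1. Estimate the Validity Coefficient
• Pearson Product-Moment Correlation Coefficient
$r_{XY} = \frac{\sum xy}{N s_X s_Y}$, where $x$ and $y$ are deviation scores
$r_{XY} = \frac{\sum XY / n - \bar{X}\,\bar{Y}}{s_X s_Y}$
$r_{XY} = \frac{\sum XY - \frac{1}{n}\sum X \sum Y}{\sqrt{\sum X^2 - (\sum X)^2 / n}\;\sqrt{\sum Y^2 - (\sum Y)^2 / n}}$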
Exercise 1
Suppose that 10 male applicants took a job interest test and were then hired as salesmen by a company. The job interest test scores (X) and the sales amounts for the first year (Y, in units of ten thousand dollars) of each applicant are listed in the following table.
Table 6.2 10 Applicants' Test Scores and Sales Amounts
Examinee   1     2     3     4     5     6     7     8     9     10
X          30    34    32    47    20    24    27    25    22    16
Y          2.5   3.8   3     4     0.7   1     2.2   3.5   2.8   1.2
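The raw-score form of the Pearson formula can be applied to the data of Exercise 1 as in the following minimal sketch. The function and variable names are illustrative; only the ten (X, Y) pairs come from the table.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation computed with the raw-score formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt(sum_x2 - sum_x ** 2 / n) * math.sqrt(sum_y2 - sum_y ** 2 / n)
    return numerator / denominator


test_scores = [30, 34, 32, 47, 20, 24, 27, 25, 22, 16]   # X
sales = [2.5, 3.8, 3, 4, 0.7, 1, 2.2, 3.5, 2.8, 1.2]     # Y
r_xy = pearson_r(test_scores, sales)
print(f"r_XY = {r_xy:.2f}")
```

For these data the validity coefficient comes out at roughly 0.75.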
• Biserial Correlation Coefficient
(for the correlation between a continuous variable and a dichotomous variable)
$r_b = \frac{\bar{X}_p - \bar{X}_q}{s_t} \cdot \frac{pq}{Y}$  (6.2)
$p$ is the proportion of examinees who score "1" on the dichotomous variable
$q$ is equal to $1 - p$
$\bar{X}_p$ is the mean score on the continuous variable of the examinees who score "1" on the dichotomous variable
$\bar{X}_q$ is the mean score on the continuous variable of the examinees who score "0" on the dichotomous variable
$s_t$ is the standard deviation of the scores of all examinees on the continuous variable
$Y$ is the ordinate of the standard normal curve at the z-score associated with the value of $p$
Research Case
Use $r_b$ to estimate the validity of the first application of the WISC-R in Shanghai.
Data concerned:
the number of first-level middle school students is 66
the number of second-level middle school students is 286
the mean IQ of the first-level students is 114
the mean IQ of the second-level students is 96
the standard deviation of all students' IQs is 14.53
if p = .1875, then Y is .2685
$p = 66 / 352 = 0.1875$
$q = 286 / 352 = 0.8125$
$r_b = \frac{\bar{X}_p - \bar{X}_q}{s_t} \cdot \frac{pq}{Y} = \frac{114 - 96}{14.53} \times \frac{.1875 \times .8125}{.2685} = .70$
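The research case can be checked with a short sketch of formula (6.2). Instead of reading Y from a printed table, the code derives the normal-curve ordinate with Python's standard `statistics.NormalDist`, so the ordinate differs slightly from the tabled .2685. The function name and argument names are illustrative.

```python
from statistics import NormalDist


def biserial_r(mean_p, mean_q, s_t, p):
    """Biserial correlation from group means, total SD, and proportion p scoring '1'."""
    q = 1.0 - p
    std_normal = NormalDist()
    z = std_normal.inv_cdf(q)    # z-score that cuts off the upper proportion p
    ordinate = std_normal.pdf(z)  # height Y of the standard normal curve at z
    return (mean_p - mean_q) / s_t * (p * q / ordinate)


n_first, n_second = 66, 286
p = n_first / (n_first + n_second)
r_b = biserial_r(mean_p=114, mean_q=96, s_t=14.53, p=p)
print(f"p = {p:.4f}, r_b = {r_b:.2f}")
```

The result agrees with the hand calculation to two decimal places.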
Exercise 2
A group of middle school students took a math test. The mean score of the students who were taught with the higher-level math program is 60.188, and their number is 382. The mean score of the students who followed the normal program is 47.429, and their number is 618. The standard deviation for all students is 11.910. Please estimate the validity coefficient of the math test.
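2. Discrimination Between Two Groups
• Compare the means of the two groups (t test)
$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$
$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
Degrees of freedom: $df = (n_1 - 1) + (n_2 - 1)$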
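The pooled-variance t test can be sketched as follows. The two score lists are hypothetical and only serve to show the arithmetic; `statistics.variance` returns the sample variance with an n - 1 denominator, which is what the pooled formula expects.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2):
    """Independent-samples t statistic with pooled variance, plus degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / se_diff
    df = (n1 - 1) + (n2 - 1)
    return t, df


# Hypothetical test scores for two contrasted groups.
high_group = [10, 12, 14]
low_group = [7, 8, 9]
t_value, df = pooled_t(high_group, low_group)
print(f"t = {t_value:.3f}, df = {df}")
```

A significant t value indicates that the test discriminates between the two groups, which is taken as evidence of criterion-related validity.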
• Compute the amount of overlap between the two groups
Method 1
Count the examinees in one group (usually the contrast group) whose test scores are higher than the mean of the other group, then express that count as a proportion of the examinees in their own group.
Method 2
Compute the percentage of overlap between the score distributions of the two groups.
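Method 1 can be sketched in a few lines. The scores below are hypothetical, and the function name is chosen for illustration; the code computes the proportion of one group that scores above the mean of the other group.

```python
from statistics import mean


def proportion_above_other_mean(group, other_group):
    """Proportion of `group` whose scores exceed the mean of `other_group`."""
    cutoff = mean(other_group)
    return sum(score > cutoff for score in group) / len(group)


# Hypothetical scores: a contrast group and a criterion group.
contrast_group = [52, 58, 61, 67, 70]
criterion_group = [60, 63, 66, 69, 72]
overlap = proportion_above_other_mean(contrast_group, criterion_group)
print(f"overlap proportion = {overlap:.2f}")
```

The smaller this proportion, the better the test separates the two groups.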
§4 Application of the Validity Coefficient
Predict the Criterion Score
1. Establish the Regression Equation
$\hat{Y} = b_{YX} X + a_{YX}$
$\hat{Y}$ is the predicted criterion score for an examinee
$X$ is the test score of an examinee
$b_{YX}$ is the regression coefficient, and $a_{YX}$ is the intercept:
$b_{YX} = r_{YX} \frac{s_Y}{s_X}$
$a_{YX} = \bar{Y} - b_{YX} \bar{X}$
Example
Figure 6.3 100 Examinees' Scores on a Job Aptitude Test and Real Performance Scores
$\bar{X} = 5.35,\ \bar{Y} = 4.28,\ s_X = 1.80,\ s_Y = 1.89,\ r_{XY} = 0.68$
$b_{YX} = r_{YX} \frac{s_Y}{s_X} = 0.68 \times (1.89 / 1.80) = 0.714$
$a_{YX} = \bar{Y} - b_{YX} \bar{X} = 4.28 - 0.714 \times 5.35 = 0.46$
$\hat{Y} = 0.714X + 0.46$
If an applicant gets 6 on the test, we can use the regression equation to predict his future job performance.
$\hat{Y} = 0.714 \times 6 + 0.46 = 4.744$
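The worked example can be reproduced from the summary statistics alone. The following is a minimal sketch; the function name is illustrative, and the inputs are the means, standard deviations, and validity coefficient given in the example.

```python
def regression_from_summary(mean_x, mean_y, s_x, s_y, r_xy):
    """Slope and intercept for predicting Y from X using summary statistics."""
    slope = r_xy * s_y / s_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = regression_from_summary(
    mean_x=5.35, mean_y=4.28, s_x=1.80, s_y=1.89, r_xy=0.68
)
predicted = slope * 6 + intercept  # applicant with a test score of 6
print(f"Y_hat = {slope:.3f} * X + {intercept:.2f}; prediction for X = 6: {predicted:.3f}")
```

The same function can be reused for any test once its validity coefficient and the descriptive statistics of predictor and criterion are known.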
Exercise 3
Suppose a group of high school students took a job interest test. The researcher obtained these statistics:
$\bar{X} = 50,\ s_X = 10$
$\bar{Y} = 2.4,\ s_Y = 0.8$
The validity coefficient is 0.6. If John scored 54 points on the job interest test, what would his predicted criterion score (job performance) be?
2. Estimate the Error
Standard Error of Estimate ($s_{est}$)
The error of estimate shows the margin of error to be expected in the individual's predicted criterion score, as a result of the imperfect validity of the test.
$s_{est} = s_Y \sqrt{1 - r_{XY}^2}$
(Figure: the predicted score $\hat{Y}_1$ for a test score $X_1$ on the X axis)
$r_{XY}^2$ is the coefficient of determination, indicating the proportion of the variance of criterion scores that is related to the variance of the predictor test scores.
3. Establish the approximate interval for an actual criterion score Y
$\hat{Y} \pm z_p s_{est}$
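The standard error of estimate and the interval around a predicted score can be computed as in the sketch below. It reuses the statistics of the job aptitude example ($s_Y = 1.89$, $r_{XY} = 0.68$, predicted score 4.744). The 95% level is an assumption made for illustration, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def standard_error_of_estimate(s_y, r_xy):
    """s_est = s_Y * sqrt(1 - r_XY^2)."""
    return s_y * math.sqrt(1 - r_xy ** 2)


def criterion_interval(y_hat, s_est, confidence=0.95):
    """Approximate interval y_hat +/- z * s_est for the actual criterion score."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return y_hat - z * s_est, y_hat + z * s_est


s_est = standard_error_of_estimate(s_y=1.89, r_xy=0.68)
low, high = criterion_interval(y_hat=4.744, s_est=s_est)
print(f"s_est = {s_est:.3f}; 95% interval: {low:.2f} to {high:.2f}")
```

Even with a validity coefficient of 0.68 the interval is wide, which shows how much uncertainty remains in an individual prediction.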
Validity Coefficient and Classification Decision
Figure 6.4 Scatter Plots of the Predictor and Criterion Scores (axes X and Y, with cut-off scores $X_c$ and $Y_c$)
Basic Concepts
Cut-off Scores
Valid Acceptance
Valid Rejection
False Acceptance
False Rejection
Four Rates
Base Rate: the proportion of successful applicants selected without the use of a test.
Selection Ratio: the proportion of applicants who must be accepted.
Hit Rate: the percentage of predictions that are correct.
Success Ratio: the proportion of selected applicants who succeed.
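The four rates can be computed from the counts in the four decision outcomes. The sketch below uses hypothetical counts. It also assumes that the base rate equals the proportion of all applicants who would succeed on the criterion (valid acceptances plus false rejections), which is one common reading of the definition above.

```python
def decision_rates(valid_accept, false_accept, false_reject, valid_reject):
    """Base rate, selection ratio, hit rate, and success ratio from outcome counts."""
    total = valid_accept + false_accept + false_reject + valid_reject
    return {
        # Applicants who succeed on the criterion, whether selected or not.
        "base_rate": (valid_accept + false_reject) / total,
        # Applicants accepted on the basis of the test.
        "selection_ratio": (valid_accept + false_accept) / total,
        # Correct decisions: valid acceptances and valid rejections.
        "hit_rate": (valid_accept + valid_reject) / total,
        # Accepted applicants who actually succeed.
        "success_ratio": valid_accept / (valid_accept + false_accept),
    }


# Hypothetical counts for 100 applicants.
rates = decision_rates(valid_accept=40, false_accept=10, false_reject=20, valid_reject=30)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

In this example the test raises the proportion of successful employees from a base rate of .60 to a success ratio of .80.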
Table 6.3 Taylor-Russell Table for a Base Rate of .60