Psychometric Issues in the Assessment of English
Language Learners
Presented at the CRESST 2002 Annual Conference
Research Goes to School: Assessment, Accountability, and Improvement
Jamal Abedi
UCLA Graduate School of Education
National Center for Research on Evaluation, Standards, and
Student Testing (CRESST)
September 10-11, 2002
Measurement/Psychometric Theory
Do the same underlying measurement theories used for mainstream assessment apply equally to English language learners?
Yes ☐   No ☐
Do psychometric textbooks have enough coverage of issues concerning the measurement of ELLs?
Yes ☐   No ☐
Are there specific measurement issues that are unique to the assessment of ELLs?
Yes ☐   No ☐
Can the low performance of ELLs in content-based areas be explained mainly by their lack of content knowledge?
Yes ☐   No ☐
Are there any extraneous variables that could specifically impact the performance of ELLs?
Yes ☐   No ☐
Psychometric Methods
Development and application of modern mental measures
by Steven J. Osterlind, University of Missouri
Chapter 3. Classical Measurement Theory
Suppose a sample of examinees comprises individuals from two different cultures: in one culture, dogs are considered close family members; in the other, dogs are considered non-family animals meant for work.
Now suppose that some of the reading test questions incidentally describe the treatment of dogs. Remember, this is a test of one's reading ability, not a test about dogs.
Abedi, J., & Lord, C. (2001). The Language Factor in Mathematics Tests. Applied Measurement in
Education, 14(3), 219-234.
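The dog scenario describes construct-irrelevant variance: group membership, not reading ability, shifts item performance. A minimal sketch of how such an item could be screened with the standard Mantel-Haenszel DIF statistic, on simulated data (the group labels and the size of the cultural penalty below are invented for illustration, not taken from the studies cited here):

```python
# A sketch of screening the "dog item" with the Mantel-Haenszel DIF statistic.
# All data are simulated; the cultural penalty (-0.8 logits) is invented.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)      # 0 = reference culture, 1 = focal culture
ability = rng.normal(0, 1, n)      # reading ability, identical in both groups

# The item is equally hard in ability terms, but culturally loaded content
# penalizes the focal group (a shift in the intercept of the logit).
logit = 1.2 * ability - 0.8 * group
correct = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Stratify on ability (in practice, on the total test score) and pool the
# 2x2 tables across strata.
strata = np.digitize(ability, np.quantile(ability, [0.2, 0.4, 0.6, 0.8]))
num = den = 0.0
for s in np.unique(strata):
    m = strata == s
    a = np.sum((group[m] == 0) & (correct[m] == 1))   # reference, correct
    b = np.sum((group[m] == 0) & (correct[m] == 0))   # reference, incorrect
    c = np.sum((group[m] == 1) & (correct[m] == 1))   # focal, correct
    d = np.sum((group[m] == 1) & (correct[m] == 0))   # focal, incorrect
    t = a + b + c + d
    num += a * d / t
    den += b * c / t

print(f"Mantel-Haenszel common odds ratio: {num / den:.2f}")  # > 1 flags DIF against the focal group
```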
Classical Test Theory: Reliability
$\sigma^2_X = \sigma^2_T + \sigma^2_E$

X: observed score
T: true score
E: error score

$\rho_{XX'} = \sigma^2_T / \sigma^2_X = 1 - \sigma^2_E / \sigma^2_X$
Textbook examples of possible sources that contribute to measurement error:
Rater
Occasion
Item
Test Form
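A minimal simulation may help make the decomposition concrete; the variance figures below are invented for illustration, and the reliability is computed both ways:

```python
# Simulate X = T + E and check the classical decomposition and reliability.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
T = rng.normal(50, 8, n)   # true scores: sigma^2_T = 64
E = rng.normal(0, 4, n)    # errors, uncorrelated with T: sigma^2_E = 16
X = T + E                  # observed scores: sigma^2_X ~ 80

print(X.var(), T.var() + E.var())   # sigma^2_X = sigma^2_T + sigma^2_E
print(T.var() / X.var())            # rho_XX' = sigma^2_T / sigma^2_X ~ 0.80
print(1 - E.var() / X.var())        # same reliability via the error-variance form
```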
Assumptions of Classical True-Score Theory
1. $X = T + E$
2. $\mathcal{E}(X) = T$ (the expected value of the observed score is the true score)
3. $\rho_{ET} = 0$ (error scores are uncorrelated with true scores)
4. $\rho_{E_1 E_2} = 0$ (error scores on two parallel forms are uncorrelated)
5. $\rho_{E_1 T_2} = 0$ (the error score on one form is uncorrelated with the true score on another)
Generalizability Theory:
Partitioning Error Variance into Its Components
$\sigma^2(X_{pro}) = \sigma^2_p + \sigma^2_r + \sigma^2_o + \sigma^2_{pr} + \sigma^2_{po} + \sigma^2_{ro} + \sigma^2_{pro,e}$

p: person
r: rater
o: occasion
There may be other sources such as: test forms, test
instructions, item difficulty, and test-taking skills.
Are there any sources of measurement error that may
specifically influence ELL performance?
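Before turning to that question, a sketch of the partition itself: simulate a fully crossed person × rater × occasion design from invented variance components and confirm that the observed-score variance is their sum.

```python
# Two-facet G-theory sketch with invented variance components.
import numpy as np

rng = np.random.default_rng(2)
n_p, n_r, n_o = 500, 20, 20
comp = {"p": 9.0, "r": 1.0, "o": 0.5, "pr": 1.5, "po": 1.0, "ro": 0.5, "pro,e": 4.0}

p   = rng.normal(0, comp["p"] ** 0.5,     (n_p, 1, 1))      # person (universe) scores
r   = rng.normal(0, comp["r"] ** 0.5,     (1, n_r, 1))      # rater main effect
o   = rng.normal(0, comp["o"] ** 0.5,     (1, 1, n_o))      # occasion main effect
pr  = rng.normal(0, comp["pr"] ** 0.5,    (n_p, n_r, 1))    # person x rater
po  = rng.normal(0, comp["po"] ** 0.5,    (n_p, 1, n_o))    # person x occasion
ro  = rng.normal(0, comp["ro"] ** 0.5,    (1, n_r, n_o))    # rater x occasion
pro = rng.normal(0, comp["pro,e"] ** 0.5, (n_p, n_r, n_o))  # residual + error

X = 50 + p + r + o + pr + po + ro + pro   # one score per person-rater-occasion cell
print(X.var(), sum(comp.values()))        # both close to 17.5
```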
Validity of Academic Achievement Measures
We will focus on construct and content validity approaches:
A test’s construct validity is the degree to which it measures the theoretical
construct or trait that it was designed to measure (Allen & Yen, 1979, p.
108).
A test’s content validity involves the careful definition of the domain of
behaviors to be measured by a test and the logical design of items to
cover all the important areas of this domain (Allen & Yen, 1979, p. 96).
Examples:
A content-based achievement test has construct validity if it
measures the content that it is supposed to measure.
A content-based achievement test has content validity if the test
content is representative of the content being measured.
Study #2
Interview study (Abedi, Lord, & Plummer, 1997).
37 students were asked to express their preference between the original NAEP items and a linguistically modified version of the same items. The math test items had been modified to reduce their level of linguistic complexity.
Finding
Over 80% of the students interviewed preferred the linguistically modified items to the original versions.
Study #3
Impact of linguistic factors on students’ performance (Abedi, Lord, & Plummer,
1997).
Two studies: test performance and speed.
SAMPLE: 1,031 grade 8 ELL and non-ELL students.
41 classes from 21 southern California schools.
Finding
ELL students who received a linguistically modified version of the math test items performed significantly better than those receiving the original test items.
Study #4
The impact of different types of accommodations on students with limited
English proficiency (Abedi, Lord, & Hofstetter, 1997).
SAMPLE: 1,394 grade 8 students. 56 classes from 27 southern California
schools.
Finding
Spanish translation of the NAEP math test:
Spanish speakers taking the Spanish translation performed significantly lower than Spanish speakers taking the English version. We believe this is due to the impact of the language of instruction on assessment.
Linguistic modification:
Contributed to improved performance on 49% of the items.
Extra time:
Helped grade 8 ELL students on NAEP math tests, but also aided non-ELL students, so it has limited potential as an assessment accommodation.
Study #5
Impact of selected background variables on students' NAEP math performance (Abedi, Hofstetter, & Lord, 1998).
SAMPLE: 946 grade 8 ELL and non-ELL students. 38 classes from 19
southern California schools.
Finding
Four different accommodations were used (linguistically modified items, a glossary only, extra time only, and a glossary plus extra time).
The glossary plus extra time was the most effective accommodation.
However, with the glossary plus extra time, non-ELLs showed greater improvement (16%) than the ELLs (13%). This is the opposite of what was expected and casts doubt on the validity of this accommodation.
Study #8
Language accommodation for large-scale assessment in science
(Abedi, Courtney, & Leon, 2001).
SAMPLE: 1,856 grade 4 and 1,512 grade 8 ELL and non-ELL students.
132 classes from 40 school sites in four cities, three states.
Finding
Results suggested that linguistic modification of the test items improved the performance of ELLs in grade 8.
The performance of non-ELLs did not change with the modified test.
The validity of the assessment was therefore not compromised by the provision of this accommodation.
Study #9
Impact of students’ language background on content-based performance: analyses
of extant data (Abedi & Leon, 1999).
Analyses were performed on extant data, such as Stanford 9 and ITBS scores.
SAMPLE: Over 900,000 students from four different sites nationwide.
Study #10
Examining ELL and non-ELL student performance differences and their relationship
to background factors (Abedi, Leon, & Mirocha, 2001).
Data were analyzed for the impact of language on the assessment and accommodation of ELL students.
SAMPLE: Over 700,000 students from four different sites nationwide.
Finding
The higher the level of language demand of the test items, the larger the performance gap between ELL and non-ELL students.
Large performance gap between ELL and non-ELL students on reading, science
and math problem solving (about 15 NCE score points).
This performance gap was reduced to zero in math computation.
Normal Curve Equivalent Means and Standard Deviations for Students in Grades 10 and 11, Site 3 School District

                      Reading        Science        Math
                      M      SD      M      SD      M      SD
Grade 10
  SD only            16.4   12.7    25.5   13.3    22.5   11.7
  LEP only           24.0   16.4    32.9   15.3    36.8   16.0
  LEP & SD           16.3   11.2    24.8    9.3    23.6    9.8
  Non-LEP & non-SD   38.0   16.0    42.6   17.2    39.6   16.9
  All students       36.0   16.9    41.3   17.5    38.5   17.0
Grade 11
  SD only            14.9   13.2    21.5   12.3    24.3   13.2
  LEP only           22.5   16.1    28.4   14.4    45.5   18.2
  LEP & SD           15.5   12.7    26.1   20.1    25.1   13.0
  Non-LEP & non-SD   38.4   18.3    39.6   18.8    45.2   21.1
  All students       36.2   19.0    38.2   18.9    44.0   21.2

Note. LEP = limited English proficient. SD = students with disabilities.
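NCE scores are constructed to have a mean of 50 and a standard deviation of 21.06, so gaps in the table convert directly into standard-deviation units; for example, the grade 10 reading gap between the non-LEP/non-SD and LEP-only groups:

```python
# Convert an NCE gap from the table into standard-deviation units
# (NCE scores are built with mean 50 and SD 21.06).
gap = 38.0 - 24.0       # grade 10 reading: non-LEP/non-SD minus LEP-only
print(gap / 21.06)      # ~0.66 standard deviations
```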
The Disparity Index (DI) is an index of the performance difference between LEP and non-LEP students.

Site 3 Disparity Index (DI)
Non-LEP/Non-SD Students Compared to LEP-Only Students
Grade   Reading   Math Total   Math Calculation   Math Analytical
3         53.4       25.8            32.8               12.9
6         81.6       37.6            46.1               22.2
8        125.2       36.9            44.0               25.2
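The slide does not give the DI formula. Assuming the percentage-difference definition used in related CRESST reports (the non-LEP mean minus the LEP mean, divided by the LEP mean, times 100), a sketch:

```python
# Hedged sketch: the percentage-difference definition of DI is an assumption,
# not stated on the slide.
def disparity_index(mean_non_lep: float, mean_lep: float) -> float:
    """Percentage by which the non-LEP mean exceeds the LEP mean (assumed definition)."""
    return 100 * (mean_non_lep - mean_lep) / mean_lep

print(disparity_index(30.7, 20.0))  # hypothetical means chosen to land near 53.4
```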
Generalizability Theory:
Language as an additional source of measurement error
$\sigma^2(X_{prl}) = \sigma^2_p + \sigma^2_r + \sigma^2_l + \sigma^2_{pr} + \sigma^2_{pl} + \sigma^2_{rl} + \sigma^2_{prl,e}$

p: person
r: rater
l: language
Are there any sources of measurement error that may specifically
influence ELL performance?
Main effect language factors
$\sigma^2_l$ (different levels of English/native language proficiency)

Interactions of language factors with other factors
$\sigma^2_{pl}$ (persons differing in their levels of English/native language proficiency)
$\sigma^2_{rl}$ (differential treatment of ELL students by raters with different backgrounds)
$\sigma^2_{prl,e}$ (a combination of different levels of language proficiency, the interaction of rater with language and persons, and unspecified sources of measurement error)
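To see why these components matter, an illustrative computation with invented variance components: any nonzero language terms add to the error variance for ELL students and lower a reliability-like generalizability coefficient, while leaving non-ELL students unaffected.

```python
# Illustrative only: all variance components below are invented.
def g_coefficient(var_person: float, var_error: float) -> float:
    """Universe-score variance over itself plus error variance."""
    return var_person / (var_person + var_error)

var_person = 9.0
error_non_ell = 3.0              # rater-related error only (language terms ~ 0)
error_ell = 3.0 + 2.5 + 1.0      # adds hypothetical sigma^2_pl and sigma^2_rl terms
print(g_coefficient(var_person, error_non_ell))  # 0.75
print(g_coefficient(var_person, error_ell))      # ~0.58
```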
Issues and problems in classification of
students with limited English proficiency
Findings
The relationship between language proficiency test scores and LEP
classification.
Since LEP classification is based on students' level of language proficiency, and because the LAS (Language Assessment Scales) is a measure of language proficiency, one would expect a very strong correlation between LAS scores and LEP classification (LEP versus non-LEP).
The results of the analyses instead indicated a weak relationship between language proficiency test scores and language classification codes (LEP categories).
Correlation between LAS rating and LEP classification for Site 4

                  G2     G3     G4     G5     G6     G7     G8     G9     G10    G11    G12
Pearson r        .223   .195   .187   .199   .224   .261   .252   .265   .304   .272   .176
Sig (2-tailed)   .000   .000   .000   .000   .000   .000   .000   .000   .000   .000   .000
N                 587    721    621   1002    803    938    796   1102    945    782    836
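A Pearson r between a binary LEP code and a continuous LAS score is a point-biserial correlation, which tops out near .80 even for a deterministic median-split cutoff and drops further when classification is noisy. A small simulation (data and coding invented for illustration) that lands near the weak values in the table:

```python
# Point-biserial correlation between a binary LEP code and LAS scores.
# Simulated, illustrative data only.
import numpy as np

rng = np.random.default_rng(3)
n = 1000
las = rng.normal(50, 10, n)                 # hypothetical LAS scores
# A noisy classification: LEP status only loosely tracks the LAS score,
# mimicking the weak observed correlations rather than a clean cutoff.
lep = (las + rng.normal(0, 40, n) < 50).astype(int)

print(np.corrcoef(las, 1 - lep)[0, 1])      # ~0.2, near the table's values
```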
Correlation coefficients between LEP classification code and ITBS subscales for Site 1

                     Reading   Math Concept   Math Problem   Math
                               & Estimation   Solving        Computation
Grade 3
  Pearson r           -.160       -.045          -.076          .028
  Sig (2-tailed)       .000        .000           .000          .000
  N                  36,006      35,981         35,948        36,000
Grade 6
  Pearson r           -.256       -.154          -.180         -.081
  Sig (2-tailed)       .000        .000           .000          .000
  N                  28,272      28,273         28,250        28,261
Grade 8
  Pearson r           -.257       -.168          -.206         -.099
  Sig (2-tailed)       .000        .000           .000          .000