48x36 Poster Template - Carnegie Mellon University


Comparing Statistical Analyses of a Middle School Fractions Exam
Elizabeth Ayers, April Galyardt, Cassandra Studer, and Tracy Sweet
Advisor: Brian Junker
Carnegie Mellon University Statistics Department
Methods
Introduction
The CMU statistics education group was approached by
Martina Rau, a Human-Computer Interaction student, to
aid in the analysis of her pilot fractions exam. Martina was
interested in understanding how students use different
representations of fractions when solving problems. She
assembled two versions of a 23-item exam, differing only by
the numbers used, that she gave to a group of middle school
students. We were asked to verify her factor analysis (FA) of
this exam and also to perform an item response theory (IRT)
analysis in order to validate the individual items.
Data
Students
FA and IRT are both assessment models that relate student
and item characteristics to item responses.
• NB = 117 students take exam Form B
• The 222 students did all items.
J = 23 Items
• Labeled with a number (1-4) and a letter (A-F)
reflecting Martina’s design criteria
• procedural vs. conceptual knowledge
• reproduction vs. transfer of knowledge
Responses
Martina’s Coding, Xij
•
•
with different items having different
numbers of partial credit scores possible
Our Coding, Yij
• Yij = 1 (correct) if Xij ≥ 0.9
• Yij = 0 (incorrect) otherwise
Items 3 A-E
FA models each response as
Xij = μj + λj θi + εij, where
• θi is student i’s latent ability to answer fraction items
• λj = cov(Xij, θi) is item j’s ability to discriminate between
students, called the factor loading
• μj is item j’s mean score and εij is a residual error term
• Items 1 A (a reproduction item) & D (a transfer item) are
difficult and have low discrimination
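The reading of the factor loading λj as cov(Xij, θi) can be checked with a small simulation. The sketch below assumes a one-factor model of the form X = λθ + ε, with a standard normal θ and noise scaled so that X has unit variance. The function name, sample size, and seed are illustrative choices, not part of the original analysis.

```python
import math
import random

def simulate_loading(lam, n_students=20000, seed=0):
    """Simulate X = lam * theta + eps and return the sample cov(X, theta)."""
    rng = random.Random(seed)
    # Assumption: residual variance 1 - lam^2, so that Var(X) = 1
    resid_sd = math.sqrt(1.0 - lam ** 2)
    thetas, scores = [], []
    for _ in range(n_students):
        theta = rng.gauss(0.0, 1.0)      # latent ability with variance 1
        eps = rng.gauss(0.0, resid_sd)   # item-specific noise
        thetas.append(theta)
        scores.append(lam * theta + eps)
    mean_t = sum(thetas) / n_students
    mean_x = sum(scores) / n_students
    cross = sum((t - mean_t) * (x - mean_x) for t, x in zip(thetas, scores))
    return cross / (n_students - 1)

# The sample covariance should land close to the loading used to simulate
print(round(simulate_loading(0.90), 2))
```

Under these assumptions the recovered covariance tracks the loading, which is why a small loading signals an item that says little about the latent ability.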
We use the 2-parameter logistic (2PL) model, where the
probability that student i correctly answers item j is modeled as
Pij = P(Yij = 1 | θi) = exp[αj(θi − βj)] / (1 + exp[αj(θi − βj)]), where
• θi is student i’s latent ability to answer fraction items
• βj is item j’s difficulty
• αj is item j’s ability to discriminate between students
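As a worked illustration, two properties follow directly from these definitions. This assumes the common 2PL parameterization, in which discrimination multiplies the gap between ability and difficulty (an assumption; equivalent parameterizations exist).

```latex
\begin{align*}
P_{ij} &= \frac{1}{1 + \exp\{-\alpha_j(\theta_i - \beta_j)\}},
&
\log\frac{P_{ij}}{1 - P_{ij}} &= \alpha_j(\theta_i - \beta_j), \\
P_{ij}\Big|_{\theta_i = \beta_j} &= \tfrac{1}{2},
&
\frac{\partial P_{ij}}{\partial \theta_i}\bigg|_{\theta_i = \beta_j}
  &= \alpha_j \, P_{ij}(1 - P_{ij})\Big|_{\theta_i = \beta_j} = \frac{\alpha_j}{4}.
\end{align*}
```

A student whose ability equals an item's difficulty therefore has an even chance of answering correctly, and a larger αj gives a steeper curve at that point.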
It may be unclear whether one or both
pizzas should be considered “whole”
Recommendation: Reword item
It may be difficult to draw “sevenths”. Alternatively, if students split
boxes as a whole instead of individually, the item may be
challenging despite more common fractions.
Recommendation: Eliminate prompt to draw fraction or use
different numbers.
• Versions of the exam can be validated
This was the only item that showed significant differences
between Forms A & B. Form B (left) was harder, possibly
because the answer required students to visually depict 9/20.
Recommendation: Use different numbers
• Estimates of item difficulty give additional information
• Plotting Pij vs. θi (Item Characteristic Curves)
allows a visual comparison of items
• Steep (flat) slopes correspond to highly (weakly)
discriminating items
• Curves that are shifted to the left (right) correspond
to easier (harder) items
• αj is comparable to λj
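The visual comparison described above can be sketched numerically. The example below tabulates item characteristic curves for three items, using discrimination and difficulty estimates from the poster's results table. It assumes the α(θ − β) form of the 2PL model; the function names and the ability grid are illustrative.

```python
import math

def icc(theta, alpha, beta):
    """2PL probability of a correct response (assumed alpha * (theta - beta) form)."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

# (discrimination, difficulty) estimates taken from the poster's results table
items = {"4A": (2.365, 0.375), "3D": (1.241, -0.364), "1A": (0.444, 2.117)}

grid = [-3, -2, -1, 0, 1, 2, 3]  # ability values at which each curve is tabulated
for label, (alpha, beta) in items.items():
    curve = [round(icc(t, alpha, beta), 2) for t in grid]
    print(label, curve)

def slope_at_difficulty(alpha, beta, h=1e-5):
    """Central-difference slope of the curve at theta = beta (analytically alpha / 4)."""
    return (icc(beta + h, alpha, beta) - icc(beta - h, alpha, beta)) / (2 * h)

for label, (alpha, beta) in items.items():
    print(label, round(slope_at_difficulty(alpha, beta), 3))
```

Item 4A rises sharply near its difficulty, while Item 1A stays nearly flat and sits far to the right, matching its description as difficult with low discrimination.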
Results: Comparing Forms A & B
Results: Comparing FA to IRT
• Item 2 A (a reproduction item) is moderately difficult but
has low discrimination
• Item 2 D (a transfer item) is both difficult and has low
discrimination
Item Description:
Conceptual Reproduction Identification
Conceptual Transfer Identification
Conceptual Reproduction Comparison
Conceptual Transfer Comparison
Procedural Reproduction Conversion
Procedural Transfer Conversion
Procedural Reproduction Addition

Label   FA Factor Loading   IRT Discrimination   IRT Difficulty
1A      -                   0.444                 2.117
1B      -                   1.128                -2.124
1C      0.51                1.197                -0.738
1D      -                   0.554                 4.487
1E      0.49                0.956                -0.878
1F      -                   0.610                -0.843
2A      -                   0.493                 1.096
2B      0.60                1.817                -0.157
2C      0.75                1.564                 0.745
2D      -                   0.478                 3.513
2E      -                   1.112                 2.034
2F      0.60                1.519                 1.285
3A      0.67                1.994                -0.589
3B      -                   1.134                -1.509
Item Description:
Procedural Transfer Add & Subtract

Label   FA Factor Loading   IRT Discrimination   IRT Difficulty
4C      -                   0.938                 0.167
4D      -                   0.595                -0.617
4E      0.74                2.269                -0.440
4F      0.89                2.362                -0.185

• 13 items remained in the factor analysis after removing
items with low factor loadings
• Factor loadings and discrimination parameters are
highly correlated (r = 0.916)
• 3 highest factor loadings correspond to 3 highest
discrimination parameters
(Items 4 A, B & F)
• We chose 0.9 as the cutoff for correct responses
because 0.1 was deducted only when students didn’t
reduce, draw a picture, give units, or perform other
small tasks that were not explicitly stated in the item
instructions.
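The recoding from partial-credit scores to correct/incorrect responses can be sketched as follows. This is a minimal illustration that assumes scores lie on a 0 to 1 scale and that the 0.9 cutoff is inclusive. The function name and the sample scores are hypothetical, not real exam data.

```python
CUTOFF = 0.9  # cutoff for a correct response, as stated in the poster

def dichotomize(partial_credit, cutoff=CUTOFF):
    """Return 1 (correct) if the partial-credit score reaches the cutoff, else 0."""
    # Assumption: the cutoff is inclusive, so a score of exactly 0.9 counts as correct
    return 1 if partial_credit >= cutoff else 0

# Hypothetical partial-credit scores for one student (not real exam data)
scores = [1.0, 0.9, 0.5, 0.0]
coded = [dichotomize(x) for x in scores]
print(coded)  # [1, 1, 0, 0]
```

With an inclusive cutoff, a student who loses only the 0.1 deduction for an unstated small task is still coded as correct.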
Students may try to solve the item visually
since these fractions look similar.
Recommendation: Eliminate picture or use
different numbers.
Label   FA Factor Loading   IRT Discrimination   IRT Difficulty
3C      0.48                1.010                 1.196
3D      0.54                1.241                -0.364
3E      0.55                1.378                 1.125
4A      0.90                2.365                 0.375
4B      0.91                2.324                 0.313
• 10 lowest discrimination parameters correspond to the
10 items that were removed
(Items 1 A, B, D, F; 2 A, D, E; 3 B; 4 C, D)
• Item 4 D showed significantly different difficulty estimates
between the two forms. (Refer to Item Curves)
• No other items showed significantly different estimates for
either parameter and so we treated the forms as equivalent
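The reported agreement between the factor analysis and the IRT analysis (r = 0.916 between factor loadings and discrimination parameters) can be recomputed from the results table. The sketch below uses the 13 items that kept a factor loading. The helper name and the dictionary layout are illustrative choices.

```python
import math

# (factor loading, IRT discrimination) for the 13 items kept in the factor
# analysis, as read from the poster's results table
retained = {
    "1C": (0.51, 1.197), "1E": (0.49, 0.956), "2B": (0.60, 1.817),
    "2C": (0.75, 1.564), "2F": (0.60, 1.519), "3A": (0.67, 1.994),
    "3C": (0.48, 1.010), "3D": (0.54, 1.241), "3E": (0.55, 1.378),
    "4A": (0.90, 2.365), "4B": (0.91, 2.324), "4E": (0.74, 2.269),
    "4F": (0.89, 2.362),
}

def pearson_r(pairs):
    """Pearson correlation between the two coordinates of a list of pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(list(retained.values()))
print(round(r, 3))

# Items with the three highest loadings and the three highest discriminations
top_loading = sorted(retained, key=lambda k: retained[k][0], reverse=True)[:3]
top_discrimination = sorted(retained, key=lambda k: retained[k][1], reverse=True)[:3]
print(sorted(top_loading), sorted(top_discrimination))
```

Both rankings pick out Items 4 A, B & F, which is the overlap the poster highlights.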
• Item 4 D (a transfer item) is fairly easy but has low
discrimination
Items 2 A-F
The 2PL IRT model only allows dichotomous
(0/1 = incorrect/correct) responses
• Items 3 A-E are relatively easy (3 C & E are more difficult)
and highly discriminating
Items 4 A-F
IRT: 2 Parameter Logistic (2PL) Model
Advantages of the 2PL Model
Items
Items 1 A-F
Results: Item Characteristic Curves Cont’d
Factor Analysis
• N = 222 middle school students
• NA = 105 students take exam Form A
Results: Item Characteristic Curves
The item text may be clearer if parts A) and B) had their
own lines. Additionally, students may benefit if the words
“grey” and “dark grey” were emphasized.
Recommendation: Reword item
Students may be unfamiliar with drill bits.
Recommendation: Change item scenario
Conclusion
Using IRT, we were able to verify Martina’s factor analysis as
well as give her additional information about the difficulty and
content of individual items.