Transcript MWu
The Appropriate Use of NAPLAN Data
National Symposium, 23 July 2010
Margaret Wu, University of Melbourne
NAPLAN Tests
Conducted once a year.
About 40 test questions per subject area.
Test scores are used to infer:
◦ the achievement levels of students.
How reliably can NAPLAN test scores reflect:
◦ student achievement level?
◦ school performance?
Margin of error in measuring student performance
David, a Grade 5 student in 2008, had a reading score of 25 out of 40.
If similar tests were administered (e.g., the 2009 or 2010 tests), David's reading score could vary between 20 and 30 out of 40.
One test collects only a small sample of performance.
This variation in scores is called measurement error.
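The slide does not say how the 20 to 30 range was obtained. The sketch below is only a rough illustration. It assumes that each of the 40 items is an independent trial, and that David answers each one correctly with probability 25/40. This is a simple binomial model, not the IRT scaling NAPLAN actually uses. Under that assumption, the spread of scores on a parallel test is of the same order as the ±5 quoted above. The function names are hypothetical.

```python
import math

def binomial_sem(n_items: int, p_correct: float) -> float:
    """Standard deviation of the raw score under a binomial model
    (hypothetical helper; assumes independent, equally hard items)."""
    return math.sqrt(n_items * p_correct * (1 - p_correct))

def binomial_score_range(n_items: int, p_correct: float, coverage: float = 0.95):
    """Return (lower, upper) raw scores bracketing the central `coverage`
    share of a Binomial(n_items, p_correct) score distribution."""
    # Exact probability of each possible raw score 0..n_items
    pmf = [math.comb(n_items, k) * p_correct**k * (1 - p_correct) ** (n_items - k)
           for k in range(n_items + 1)]
    tail = (1 - coverage) / 2
    cumulative = 0.0
    lower = upper = None
    for k, prob in enumerate(pmf):
        cumulative += prob
        if lower is None and cumulative >= tail:
            lower = k
        if upper is None and cumulative >= 1 - tail:
            upper = k
            break
    return lower, upper

# David: 25 correct out of 40, taken (by assumption) as his true proportion correct
print(round(binomial_sem(40, 25 / 40), 2))   # 3.06
print(binomial_score_range(40, 25 / 40))
```

A standard deviation of about 3 raw-score points means that roughly ±6 points covers 95% of parallel-test scores. Even this idealised model therefore leaves a wide band around a single score of 25.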
How large an error is acceptable?
The answer is: it depends.
An example: the effectiveness of a weight loss program.
◦ The expected loss is 0.5 kg after one week.
◦ The measurement scale is accurate to 1 kg.
◦ Not good enough for measuring individual change.
◦ OK for measuring group change, if the group size is ‘large’.
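The group-versus-individual point can be made concrete with standard-error arithmetic. This is a minimal sketch that makes two assumptions. First, "accurate to 1 kg" is read as a measurement error with a standard deviation of 1 kg. Second, every weighing has independent error. Each change score uses two weighings, so its error SD is √2 kg. Averaging over n people divides that SD by √n. The names `change_margin` and `min_group_size` are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile

def change_margin(scale_sd_kg: float, group_size: int) -> float:
    """95% margin of error for the mean weight change of a group.
    Each change is the difference of two weighings, so its measurement
    error has SD sqrt(2) * scale_sd_kg; averaging over group_size
    independent people divides that by sqrt(group_size)."""
    return Z_95 * math.sqrt(2) * scale_sd_kg / math.sqrt(group_size)

def min_group_size(target_margin_kg: float, scale_sd_kg: float = 1.0) -> int:
    """Smallest group for which change_margin(...) <= target_margin_kg."""
    return math.ceil((Z_95 * math.sqrt(2) * scale_sd_kg / target_margin_kg) ** 2)

for n in (1, 10, 100):
    print(n, round(change_margin(1.0, n), 2))
print(min_group_size(0.5))  # 31
```

For one person, the margin (about ±2.8 kg) dwarfs the expected 0.5 kg loss. For a group of a few dozen, it falls below 0.5 kg, which is one way to read "large" under these assumptions.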
On the NAPLAN scale…
[Figure: NAPLAN 2008 reading scores. 2.5th percentile, mean and 97.5th percentile for grades 3, 5, 7 and 9; vertical axis: scale score, 200 to 800]
Measuring Growth
[Figure: NAPLAN 2008 reading scores. 2.5th percentile, mean and 97.5th percentile for grades 3, 5, 7 and 9, annotated "Growth measure?"; vertical axis: scale score, 200 to 800]
Expected growth is 50 points.
Margin of error of growth measure: ± 76 points.
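The slide gives the ±76 figure without a derivation. The sketch below only shows the standard arithmetic for a difference of two scores. It then back-calculates the per-test standard error that would produce ±76, assuming three things: ±76 is a 95% margin, the two occasions are equally precise, and their errors are independent. The slide does not state any of these conditions, and the helper names are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile

def growth_margin(se_first: float, se_second: float) -> float:
    """95% margin of error of a gain score (second minus first),
    assuming the two measurement errors are independent."""
    return Z_95 * math.sqrt(se_first**2 + se_second**2)

def implied_test_se(growth_margin_95: float) -> float:
    """Per-occasion standard error implied by a 95% growth margin,
    assuming both occasions have the same standard error."""
    return growth_margin_95 / (Z_95 * math.sqrt(2))

se = implied_test_se(76)
print(round(se, 1))                    # 27.4
print(round(growth_margin(se, se)))    # 76, by construction
print(growth_margin(se, se) > 50)      # True: margin exceeds expected growth
```

The point does not depend on the exact inputs. Subtracting two noisy scores adds their error variances, so the error in a growth score is larger than the error in either score alone. In this case it is larger than the 50-point growth being measured.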
Class mean scores
Average score for a class:
◦ The effect of measurement error is reduced.
◦ There is a new source of error: sampling error.
The cohort of students changes from year to year, so the class mean score varies because of the particular sample of students in a class.
Class mean: ± 20 points (about 1 year's growth).
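The two error sources can be combined in a simple model. This is a minimal sketch that treats the class as a simple random sample of students. Each student's true score varies with SD `true_score_sd`, and each student is measured with independent error `measurement_se`. Real classes are not random samples, so this is an assumption. The inputs below are hypothetical round numbers chosen for easy arithmetic, not NAPLAN values.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile

def class_mean_margin(true_score_sd: float, measurement_se: float,
                      class_size: int) -> float:
    """95% margin of error of a class mean: sampling error (which students
    are in the class) plus measurement error (how each is scored), both
    shrinking with the square root of class size."""
    variance_per_student = true_score_sd**2 + measurement_se**2
    return Z_95 * math.sqrt(variance_per_student / class_size)

def measurement_only_margin(measurement_se: float, class_size: int) -> float:
    """95% margin if only measurement error were present."""
    return Z_95 * measurement_se / math.sqrt(class_size)

# Hypothetical inputs: true-score SD 40, measurement SE 30, 25 students
print(round(class_mean_margin(40, 30, 25), 1))     # 19.6
print(round(measurement_only_margin(30, 25), 1))   # 11.8
print(round(Z_95 * 30, 1))                         # 58.8, one student's margin
```

Averaging cuts the measurement-error margin from about ±59 for one student to about ±12 for the class. Sampling error then adds a component that does not exist for a single student.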
Teacher effect
A high-performing teacher can raise student standards by one more year of growth compared with a low-performing teacher.
[Figure: excellent, average and poor teacher, shown with the 2.5th percentile, mean and 97.5th percentile; a 50-point interval is marked; vertical axis: scale score, 200 to 700]
Margin of error of teacher effect based on two testing occasions: ± 20 points.
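Comparing two teachers compounds the uncertainty. This sketch makes two assumptions. First, the ±20 points is a 95% margin for each teacher's estimated effect. Second, the two estimates are independent. Under these assumptions, the margin for the difference between two teachers is √2 × 20, or about 28 points. The helper names are hypothetical.

```python
import math

def difference_margin(margin_a: float, margin_b: float) -> float:
    """95% margin of error of the difference between two independent
    estimates, each with its own 95% margin (margins add in quadrature)."""
    return math.sqrt(margin_a**2 + margin_b**2)

def clearly_different(effect_a: float, effect_b: float,
                      margin_a: float = 20.0, margin_b: float = 20.0) -> bool:
    """True only if the observed gap exceeds the margin of the difference."""
    return abs(effect_a - effect_b) > difference_margin(margin_a, margin_b)

print(round(difference_margin(20, 20), 1))  # 28.3
print(clearly_different(20, 0))             # False
print(clearly_different(30, 0))             # True
```

Under these assumptions, two teachers whose estimated effects differ by 20 points cannot be told apart.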
MySchool Website
It is a league table:
◦ It compares and ranks schools.
It is the worst kind of league table:
◦ It is claimed that the red bars reflect “underperforming schools”.
◦ Simple league tables do not make this claim.
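A league table ranks schools by observed means, and every observed mean carries sampling and measurement error. The simulation below is a hypothetical illustration and does not model how MySchool builds its comparisons or red bars. It uses 200 schools with identical true performance. Each year's observed mean is the true mean plus independent noise with an assumed standard error of 10 points. Because the true means are equal, the ranking is pure noise whatever standard error is used. The rank correlation between two years should therefore be close to zero.

```python
import random

def ranks(values):
    """Rank positions (0 = lowest) of each value; ties are not expected
    for continuous random draws."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result

def pearson(x, y):
    """Pearson correlation; applied to ranks it gives Spearman's rho."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def rank_correlation_between_years(n_schools=200, true_mean=500.0,
                                   se=10.0, seed=1):
    """Simulate identical schools observed with noise in two years and
    return the Spearman rank correlation of their league-table positions."""
    rng = random.Random(seed)
    year1 = [rng.gauss(true_mean, se) for _ in range(n_schools)]
    year2 = [rng.gauss(true_mean, se) for _ in range(n_schools)]
    return pearson(ranks(year1), ranks(year2))

print(round(rank_correlation_between_years(), 2))  # close to 0
```

In this scenario, a table that marks the bottom-ranked schools would single out different schools each year, even though no school is actually underperforming.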
Summary - 1
NAPLAN results are NOT suitable for measuring:
◦ Student achievement level, beyond rough “lower”, “average” and “higher” groups
◦ Student progress
◦ Teacher effect
◦ School performance
Summary - 2
NAPLAN results are for the systems, e.g.:
◦ Compare girls and boys
◦ Compare rural and urban
◦ Trends, if the equating design is improved
NAPLAN results should NEVER be published.
Parents/caregivers should not be encouraged to use the results to judge schools.
Finally…
Conflicting advice from different experts?
An easy way to check: ask proponents of the MySchool website to publicly name one underperforming school.
References
Wu, M. L. (2010). Measurement, sampling and equating errors in large-scale assessments. Educational Measurement: Issues and Practice, 29(4) (in press).
Nye, B., Konstantopoulos, S., & Hedges, L. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26(3), 237-257.
Leigh, A. (2009). Estimating teacher effectiveness from two-year changes in students' test scores. Economics of Education Review.
Byrne, Coventry, Olson, Wadsworth, Samuelsson, Petrill, Willcutt, & Corley (2009). Teacher effects in early literacy development: Evidence from a study of twins. Journal of Educational Psychology.