Transcript MWu

The Appropriate Use of NAPLAN Data

National Symposium, 23 July 2010
Margaret Wu, University of Melbourne
[email protected]

NAPLAN Tests

Conducted once a year

About 40 test questions per subject area

Test scores are used to infer
◦ the achievement levels of students

How reliably can NAPLAN test scores reflect
◦ Student achievement level?
◦ School performance?

Margin of error in measuring student performance

David, a Grade 5 student in 2008. Reading score was 25 out of 40.

David’s reading test scores could vary between 20 and 30, out of 40,
◦ if similar tests are administered (e.g., 2009, 2010 tests).

One test collects only a small sample of performance.

Variation in scores is called Measurement Error.
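
As a rough illustration of where a range like 20 to 30 can come from, the sketch below treats each of the 40 questions as an independent trial with the same chance of success. That binomial model is a simplifying assumption made here, not the method used for NAPLAN or in this talk, and the function name is hypothetical.

```python
import math

def binomial_score_interval(raw_score, n_items, z=1.96):
    """Approximate range of raw scores on similar tests.

    Assumes (for illustration only) that every item is an independent
    trial with success probability raw_score / n_items.
    """
    p = raw_score / n_items
    se = math.sqrt(n_items * p * (1 - p))  # standard error in score points
    return raw_score - z * se, raw_score + z * se, se

low, high, se = binomial_score_interval(25, 40)
print(f"SE = {se:.1f} points, approx. 95% range = {low:.0f} to {high:.0f} out of 40")
```

Under this assumption the standard error is about 3 score points, so the approximate 95% range is about 19 to 31, the same order as the 20 to 30 quoted for David.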

How big an error size is acceptable?

The answer is
◦ It depends.

An example
◦ Effectiveness of a weight loss program
◦ Expect a loss of 0.5 kg after one week.
◦ Measurement scale is accurate to 1 kg.
◦ Not good enough for measuring individual change
◦ OK for a group change, if group size is ‘large’.
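
The group-size argument can be sketched numerically. The code below assumes (as an illustration, not something stated in the talk) that the scale's 1 kg accuracy acts like an independent random error on each weighing, that a change score uses two weighings, and that errors average out across people. `change_error` is a hypothetical helper.

```python
import math

def change_error(scale_error_kg, group_size):
    """Approximate error of the mean before/after change for a group.

    Two weighings per person: independent errors combine as
    sqrt(2) * scale_error. Averaging over group_size people divides
    that by sqrt(group_size). Independence is an assumption.
    """
    return math.sqrt(2) * scale_error_kg / math.sqrt(group_size)

expected_loss_kg = 0.5
for n in (1, 25, 100):
    err = change_error(1.0, n)
    verdict = "smaller than" if err < expected_loss_kg else "larger than"
    print(f"group size {n:3d}: error ~ {err:.2f} kg, {verdict} the expected loss")
```

For one person the error is larger than the 0.5 kg being looked for; for a large group it shrinks well below it, which is the sense in which the same scale is acceptable for group change.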

On the NAPLAN scale…

[Figure: NAPLAN 2008 reading scores for grade 3, grade 5, grade 7 and grade 9, showing the 2.5th percentile, mean and 97.5th percentile on a scale from 200 to 800.]

Measuring Growth

[Figure: NAPLAN 2008 reading scores for grade 3, grade 5, grade 7 and grade 9 (2.5th percentile, mean and 97.5th percentile, scale from 200 to 800), annotated “Growth measure?”]

Expected growth is 50 points

Margin of error of growth measure ± 76 points
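
One way to see why a growth measure is noisier than a single score: growth is the difference of two scores, and if the two errors are independent they add in quadrature. The sketch below works backwards from the ± 76 points quoted above under that independence assumption. The implied single-score margin is a derived illustration, not a number given in the talk, and `growth_margin` is a hypothetical helper.

```python
import math

def growth_margin(margin_t1, margin_t2):
    """Margin of error of (score_t2 - score_t1), assuming independent errors."""
    return math.hypot(margin_t1, margin_t2)

growth_moe = 76.0        # from the slide
expected_growth = 50.0   # from the slide

# If both occasions have the same margin m, then growth_moe = sqrt(2) * m.
implied_single = growth_moe / math.sqrt(2)

print(f"implied single-score margin: +/- {implied_single:.0f} points")
print(f"recombined growth margin:    +/- {growth_margin(implied_single, implied_single):.0f} points")
print(f"margin / expected growth:    {growth_moe / expected_growth:.2f}")
```

Because the margin of error is about one and a half times the expected growth, an individual student's measured growth cannot be reliably separated from no growth at all.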

Class mean scores

Average score for a class
◦ Effect of measurement error reduces

New source of error
◦ Sampling error
◦ Cohort of students changes from year to year
◦ Variation in class mean score because of the sample of students in a class

Class mean ± 20 points
◦ (1 year’s growth)
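
The sampling error of a class mean can be approximated with the usual standard-error formula. In the sketch below the student standard deviation (50 points) and the class size (25) are hypothetical inputs chosen only for illustration; neither figure is given in the talk.

```python
import math

def class_mean_margin(student_sd, class_size, z=1.96):
    """Approximate 95% margin of error of a class mean score.

    Treats the class as a random sample of students with the given
    standard deviation (an illustrative assumption).
    """
    return z * student_sd / math.sqrt(class_size)

margin = class_mean_margin(student_sd=50.0, class_size=25)  # hypothetical inputs
print(f"class mean +/- {margin:.1f} points")
```

With these illustrative inputs the margin comes out near 20 points. Smaller classes or a wider spread of student scores would give a larger margin.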

Teacher effect

A high performing teacher can raise student standards by one more year of growth as compared to a low performing teacher.

[Figure: NAPLAN scale from 200 to 800 with 2.5th percentile, mean and 97.5th percentile lines; labels: excellent teacher, average teacher, poor teacher; 50 points.]

Margin of error of teacher effect based on two testing occasions: ± 20 points

MySchool Website

It is a league table
◦ It compares and ranks schools

It is the worst kind of league table
◦ Because it is claimed that the red bars reflect “underperforming schools”
◦ Simple league tables do not have this claim.

Summary - 1

NAPLAN results are NOT suitable for measuring
◦ Student achievement level, beyond rough “lower”, “average”, “higher” groups
◦ Student progress
◦ Teacher effect
◦ School performance

Summary - 2

NAPLAN results are for the systems, e.g.
◦ Compare girls and boys
◦ Compare rural and urban
◦ Trends, if equating design is improved

NAPLAN results should NEVER be published.

Parents/caregivers should not be encouraged to use the results to judge schools.

Finally…

Conflicting advice from different experts?

An easy way to check: ask proponents of the MySchool website to publicly name one underperforming school.

References

Wu, M.L. (2010). Measurement, sampling and equating errors in large-scale assessments. Educational Measurement: Issues and Practice (In press: Volume 29 Number 4).

Nye, B., Konstantopoulos, S., & Hedges, L. (2004). How Large Are Teacher Effects? Educational Evaluation and Policy Analysis, Vol. 26, No. 3 (Autumn, 2004), pp. 237-257.

Leigh, A. (2009). Estimating teacher effectiveness from two-year changes in students’ test scores. Economics of Education Review.

Byrne, Coventry, Olson, Wadsworth, Samuelsson, Petrill, Willcutt and Corley. (2009). Teacher Effects in Early Literacy Development: Evidence From a Study of Twins. Journal of Educational Psychology.