Why teaching will never be a research-based profession and why that’s a Good Thing

Dylan Wiliam (@dylanwiliam)

www.dylanwiliam.net

Outline

• What does it mean for a practice to be “research based”?
• Why educational research falls short
• What educational research should do, and how it should do it
• The role of teachers in educational research

What does it mean to be research-based?

In a ‘research-based’ profession:
• Professionals would, for the majority of decisions they need to take, be able to find and access credible research studies providing evidence that particular courses of action would, if implemented as directed, be substantially more likely to lead to better outcomes than others.

Important caveats about research findings

• Educational research can only tell us what was, not what might be.
• Moreover, in education, “What works?” is rarely the right question, because
    • everything works somewhere, and
    • nothing works everywhere,
  which is why in education, the right question is, “Under what conditions does this work?”

Causality: a tricky issue

• Traditionally, causality has been defined in terms of a counterfactual argument
    • “We may define a cause to be an object followed by another, and where all the objects, similar to the first, are followed by objects similar to the second. Or, in other words, where, if the first object had not been, the second never had existed.” (Hume, 1748, Section VII)
    • “If c and e are two actual events such that e would not have occurred without c, then c is a cause of e.” (Lewis, 1973, p. 563)

Research methods 101: causality

Does c cause e?
• Given c, e happened (factual)
    • Problem: post hoc ergo propter hoc
• If c had not happened, e would not have happened (counterfactual)
    • Problem: c did happen
• So we need to create a parallel world where c did not happen
    • Same group, different time (baseline measurement)
        • Need to assume stability over time
    • Different group, same time (control group)
        • Need to assume groups are equivalent
    • Randomized controlled trial
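As a rough illustration of why random assignment stands in for the "parallel world", the following Python sketch simulates a hypothetical intervention whose true effect is known in advance. Every number here (a true effect of 0.3, 20,000 students, the way higher-attaining students opt in) is an assumption made up for the example, not data from any study.

```python
import random
import statistics


def simulate(n=20000, true_effect=0.3, seed=1):
    """Compare a self-selected comparison with a randomized one (hypothetical data)."""
    rng = random.Random(seed)
    # Each student has a prior attainment level that also drives the outcome.
    prior = [rng.gauss(0, 1) for _ in range(n)]

    def outcome(p, treated):
        return p + (true_effect if treated else 0.0) + rng.gauss(0, 1)

    # Self-selected groups: higher-attaining students are more likely to opt in,
    # so the two groups are not equivalent.
    chosen = [p + rng.gauss(0, 1) > 0 for p in prior]
    # Randomized groups: a coin flip decides, independent of prior attainment.
    randomized = [rng.random() < 0.5 for _ in prior]

    def diff_in_means(assignment):
        treated = [outcome(p, True) for p, a in zip(prior, assignment) if a]
        control = [outcome(p, False) for p, a in zip(prior, assignment) if not a]
        return statistics.mean(treated) - statistics.mean(control)

    return diff_in_means(chosen), diff_in_means(randomized)


naive_estimate, rct_estimate = simulate()
print("self-selected groups:", round(naive_estimate, 2))
print("randomized groups:   ", round(rct_estimate, 2))
```

In this made-up setting the self-selected comparison badly overstates the effect because the groups differed before the intervention, while the randomized comparison lands close to the value that was built in.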

Problems with RCTs in education

• Clustering
• Power
• Implementation
• Context
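One way to see why clustering is a problem: students in the same class or school tend to resemble one another, so they carry less information than the same number of independently sampled students. A standard way to quantify this is the design effect, 1 + (m - 1) × ρ, where m is the cluster size and ρ is the intraclass correlation. The sketch below assumes equal cluster sizes and uses hypothetical values for both quantities; it is not taken from the slides.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from sampling whole clusters (equal cluster sizes assumed)."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_students, cluster_size, icc):
    """Number of independent students that would carry the same information."""
    return n_students / design_effect(cluster_size, icc)


# Hypothetical trial: 40 classes of 25 students, intraclass correlation 0.2
n_eff = effective_sample_size(n_students=1000, cluster_size=25, icc=0.2)
print(round(design_effect(25, 0.2), 1), round(n_eff))
```

Under these assumed values, a trial with 1000 students has roughly the statistical weight of a trial with fewer than 200 independent students, which feeds directly into the power problem listed next.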

Meta-analysis in education: “I think you’ll find it’s a bit more complicated than that” (Goldacre, 2008)

Education Endowment Foundation toolkit

Intervention                         Cost     Extra months of learning
Feedback                             ££       +8
Metacognition and self-regulation    ££       +8
Peer tutoring                        ££       +6
Early years intervention             £££££    +6
One to one tuition                   ££££     +5
Homework (secondary)                 £        +5
Collaborative learning               £        +5
Phonics                              £        +4
Small group tuition                  £££      +4
Behaviour interventions              £££      +4
Digital technology                   ££££     +4
Social and emotional learning        £        +4

[Quality of evidence: column shown graphically on the slide]

Education Endowment Foundation toolkit (continued)

Intervention                         Cost     Extra months of learning
Parental involvement                 £££      +3
Reducing class size                  £££££    +3
Summer schools                       £££      +3
Sports participation                 £££      +2
Arts participation                   ££       +2
Extended school time                 £££      +2
Individualized instruction           £        +2
After school programmes              ££££     +2
Learning styles                      £        +2
Mentoring                            £££      +1
Homework (primary)                   £        +1

[Quality of evidence: column shown graphically on the slide]

Education Endowment Foundation toolkit (continued)

Intervention                         Cost     Extra months of learning
Teaching assistants                  ££££     0
Performance pay                      ££       0
Aspiration interventions             £££      0
Block scheduling                     £        0
School uniform                       £        0
Physical environment                 ££       0
Ability grouping                     £        -1

[Quality of evidence: column shown graphically on the slide]

An illustrative example: feedback

• Kluger and DeNisi (1996) review of 3000 research reports
• Excluding those:
    • without adequate controls
    • with poor design
    • with fewer than 10 participants
    • where performance was not measured
    • without details of effect sizes
• left 131 reports, 607 effect sizes, involving 12652 individuals
• On average, feedback increases achievement
    • Effect sizes highly variable
    • 38% (50 out of 131) of effect sizes were negative

Understanding meta-analysis

• A technique for aggregating results from different studies by converting empirical results to a common measure (usually effect size)
• Standardized effect size is defined as: [formula on slide]
• Problems with meta-analysis
    • The “file drawer” problem
    • Variation in population variability
    • Selection of studies
    • Sensitivity of outcome measures
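The formula for the standardized effect size did not survive in this transcript. A common definition, assumed here rather than taken from the slide, is the standardized mean difference (Cohen's d): the gap between the treatment and control group means, divided by a pooled standard deviation. The subscripts t and c (treatment, control) are notation introduced for this sketch.

```latex
d = \frac{\bar{x}_{t} - \bar{x}_{c}}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_{t} - 1)\,s_{t}^{2} + (n_{c} - 1)\,s_{c}^{2}}{n_{t} + n_{c} - 2}}
```

Because the standard deviation sits in the denominator, the same raw gain produces a larger d in a less variable population, which is why variation in population variability appears in the list of problems above.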

The “file drawer” problem

The importance of statistical power

• The statistical power of an experiment is the probability that the experiment will yield an effect that is large enough to be statistically significant.
• In single-level designs, power depends on
    • significance level set
    • magnitude of effect
    • size of experiment
• The power of most social science experiments is low
    • Psychology: 0.4 (Sedlmeier & Gigerenzer, 1989)
    • Neuroscience: 0.2 (Button et al., 2013)
    • Education: 0.4
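To make the three ingredients of power concrete, here is a minimal Python sketch for the simplest single-level case: a two-sided comparison of two equal-sized groups, using a normal approximation. The function name and the example effect size of 0.3 are illustrative assumptions, not figures from the slides.

```python
from statistics import NormalDist


def power_two_group(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation, equal group sizes, effect size in SD units.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true effect is effect_size
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing beyond either critical value
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


for n in (20, 50, 100, 200):
    print(n, round(power_two_group(0.3, n), 2))
```

With an assumed effect size of 0.3, this approximation stays well below the conventional 0.8 target until each group has a couple of hundred participants, which is one way to read the low power figures quoted above.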

Only lucky experiments get published…
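A small simulation can show how low power and selective publication combine. The sketch below runs many hypothetical small studies of an intervention with a true effect of 0.2, then "publishes" only those with a positive, statistically significant result. The true effect, sample sizes, and the simple z > 1.96 publication rule are all assumptions chosen for illustration.

```python
import random
import statistics


def run_study(rng, true_effect, n_per_group):
    """One hypothetical two-group study; outcomes have SD 1, so the mean
    difference is approximately an effect size."""
    treat = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
    ctrl = [rng.gauss(0, 1) for _ in range(n_per_group)]
    diff = statistics.mean(treat) - statistics.mean(ctrl)
    se = (statistics.variance(treat) / n_per_group
          + statistics.variance(ctrl) / n_per_group) ** 0.5
    return diff, diff / se


def simulate_file_drawer(true_effect=0.2, n_per_group=30, n_studies=2000, seed=0):
    rng = random.Random(seed)
    results = [run_study(rng, true_effect, n_per_group) for _ in range(n_studies)]
    all_effects = [d for d, _ in results]
    # "Published" = positive and significant under a rough z > 1.96 rule;
    # everything else stays in the file drawer.
    published = [d for d, z in results if z > 1.96]
    return (statistics.mean(all_effects),
            statistics.mean(published),
            len(published) / n_studies)


mean_all, mean_published, share_published = simulate_file_drawer()
print(round(mean_all, 2), round(mean_published, 2), round(share_published, 2))
```

Averaged over all the simulated studies, the estimate sits near the true effect, but the published subset can only contain studies that happened to overshoot, so its average is much larger. A meta-analysis that sees only the published studies inherits that inflation.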

Variation in variability

Annual growth in achievement, by age

[Figure: annual growth in achievement (vertical axis, 0.0 to 1.6) by age (horizontal axis, 5 to 16). Source: Bloom, Hill, Black, and Lipsey (2008)]

• A 50% increase in the rate of learning for six year-olds is equivalent to an effect size of 0.76
• A 50% increase in the rate of learning for 15 year-olds is equivalent to an effect size of 0.1

Variation in variability

• Studies with younger children will produce larger effect size estimates
• Studies with restricted populations (e.g., children with special needs, gifted students) will produce larger effect size estimates
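The two effect sizes quoted for six-year-olds and 15-year-olds follow from simple arithmetic: if students normally gain g standard deviations in a year, learning 50% faster adds 0.5 × g standard deviations over that year. The annual growth values used below (1.52 and 0.20) are back-calculated from the effect sizes on the slide, so treat them as approximate.

```python
def effect_size_of_speedup(annual_growth_sd, speedup=0.5):
    """Effect size after one year when learning is `speedup` faster than usual."""
    return annual_growth_sd * speedup


# Approximate annual growth in SD units, inferred from the slide's own figures
annual_growth = {"age 6": 1.52, "age 15": 0.20}
for age, growth in annual_growth.items():
    print(age, round(effect_size_of_speedup(growth), 2))
```

The same proportional improvement in teaching therefore looks several times larger when measured with younger children, purely because of how fast they are already growing relative to the spread of scores.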

Selection of studies

Feedback in STEM subjects

• Review of 9000 papers on feedback in mathematics, science and technology
• Only 238 papers retained
    • Background papers: 24
    • Descriptive papers: 79
    • Qualitative papers: 24
    • Quantitative papers: 111
        • Mathematics: 60
        • Science: 35
        • Technology: 16

Ruiz-Primo and Li (2013)

Classification of feedback studies

Who provided the feedback (teacher, peer, self, or technology-based)?

How was the feedback delivered (individual, small group, or whole class)?

What was the role of the student in the feedback (provider or receiver)?

What was the focus of the feedback (e.g., product, process, self regulation for cognitive feedback; or goal orientation, self-efficacy for affective feedback)?

On what was the feedback based (student product or process)?

What type of feedback was provided (evaluative, descriptive, or holistic)?

How was feedback provided or presented (written, oral, or video)?

What was the referent of feedback (self, others, or mastery criteria)?

How, and how often, was feedback given in the study (one time or multiple times; with or without pedagogical use)?

Main findings

Characteristic of studies included                     Maths   Science
Feedback treatment is a single event lasting minutes   85%     72%
Reliability of outcome measures                        39%     63%
Validity of outcome measures                           24%     3%
Dealing only or mainly with declarative knowledge      12%     36%
Schematic knowledge (e.g., knowing why)                9%      0%
Multiple feedback events in a week                     14%     17%

Sensitivity to instruction

Sensitivity of outcome measures

Distance of assessment from the curriculum
• Immediate: e.g., science journals, notebooks, and classroom tests
• Close: e.g., where an immediate assessment asked about number of pendulum swings in 15 seconds, a close assessment asks about the time taken for 10 swings
• Proximal: e.g., if an immediate assessment asked students to construct boats out of paper cups, the proximal assessment would ask for an explanation of what makes bottles float
• Distal: e.g., where the assessment task is sampled from a different domain and where the problem, procedures, materials and measurement methods differed from those used in the original activities
• Remote: standardized national achievement tests

Ruiz-Primo, Shavelson, Hamilton, and Klein (2002)

Impact of sensitivity to instruction

[Figure: effect size by distance of assessment (close, proximal)]

Why research hasn’t changed teaching

• Aristotle’s main intellectual virtues
    • Episteme: knowledge of universal truths
    • Techne: ability to make things
    • Phronesis: practical wisdom
• Flyvbjerg (2001)
    • “By definition, phronetic researchers focus on values; for example by taking their point of departure in the classic value-rational questions: Where are we going? Is it desirable? What should be done?” (p. 130)

Maxims and rules

“Maxims are rules, the correct application of which is part of the art which they govern. The true maxims of golfing or of poetry increase our insight into golfing or poetry and may even give valuable guidance to golfers and poets; but these maxims would instantly condemn themselves to absurdity if they tried to replace the golfer's skill or the poet's art. Maxims cannot be understood, still less applied by anyone not already possessing a good practical knowledge of the art. They derive their interest from our appreciation of the art and cannot themselves either replace or establish that appreciation.” Polanyi (1958, pp. 31-32)

The knowledge-creating spiral

                          To tacit knowledge           To explicit knowledge
From tacit knowledge      Socialization                Externalization
                          (sympathised knowledge)      (conceptual knowledge)
From explicit knowledge   Internalization              Combination
                          (operational knowledge)      (systemic knowledge)

Transitions labelled on the spiral: sharing experience, dialogue, networking, learning by doing

Nonaka and Takeuchi (1995)

Inquiry systems

System        Evidence
Leibnizian    Rationality
Lockean       Observation
Kantian       Representation
Hegelian      Dialectic
Singerian     Values, ethics, practical consequences

Churchman (1971)

Inquiry systems

The Lockean inquirer displays the ‘fundamental’ data that all experts agree are accurate and relevant, and then builds a consistent story out of these. The Kantian inquirer displays the same story from different points of view, emphasising thereby that what is put into the story by the internal mode of representation is not given from the outside. But the Hegelian inquirer, using the same data, tells two stories, one supporting the most prominent policy on one side, the other supporting the most promising story on the other side (Churchman, 1971, p. 177).

Singerian inquiry systems

The ‘is taken to be’ is a self-imposed imperative of the community. Taken in the context of the whole Singerian theory of inquiry and progress, the imperative has the status of an ethical judgment. That is, the community judges that to accept its instruction is to bring about a suitable tactic or strategy [...]. The acceptance may lead to social actions outside of inquiry, or to new kinds of inquiry, or whatever. Part of the community’s judgement is concerned with the appropriateness of these actions from an ethical point of view. Hence the linguistic puzzle which bothered some empiricists — how the inquiring system can pass linguistically from “is” statements to “ought” statements — is no puzzle at all in the Singerian inquirer: the inquiring system speaks exclusively in the “ought,” the “is” being only a convenient façon de parler when one wants to block out the uncertainty in the discourse. (Churchman, 1971, p. 202)

Educational research…

…can be characterised as a never-ending process of assembling evidence that:
• particular inferences are warranted on the basis of the available evidence;
• such inferences are more warranted than plausible rival inferences;
• the consequences of such inferences are ethically defensible.

The basis for warrants, the other plausible interpretations, and the ethical bases for defending the consequences, are themselves constantly open to scrutiny and question.

A way forward: in Pasteur’s quadrant

Quest for fundamental      Considerations of use?
understanding?             No                                 Yes
Yes                        Pure basic research (Bohr)         Use-inspired basic research (Pasteur)
No                         Applied research unmotivated       Pure applied research (Edison)
                           by applications (Brahe)

Stokes (1997)

The roles of teachers and researchers

• The role of teachers
    • All teachers should be seeking to improve their practice through a process of ‘disciplined inquiry’
        • Some may wish to share their work with others
        • Some may wish to write their work up for publication
        • Some may wish to pursue research degrees
        • Some may even wish to undertake research
• The role of education researchers
    • Abandoning “physics envy”
    • Working with teachers to make their findings applicable in contexts other than the context of data collection

References

Bloom, H. S., Hill, C. J., Black, A. R., & Lipsey, M. W. (2008). Performance trajectories and performance gaps as achievement effect-size benchmarks for educational interventions. Journal of Research on Educational Effectiveness, 1(4), 289–328.

Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, advance online publication. doi: 10.1038/nrn3475

Churchman, C. W. (1971). The design of inquiring systems: basic concepts of systems and organization. New York, NY: Basic Books.

Flyvbjerg, B. (2001). Making social science matter: why social inquiry fails and how it can succeed again. Cambridge, UK: Cambridge University Press.

Goldacre, B. (2008). Bad science. London, UK: Fourth Estate.

Hume, D. (1748). An enquiry concerning human understanding. London, UK: Andrew Millar.

Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: a historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254-284.

Lewis, D. (1973). Causation. Journal of Philosophy, 70(17), 556-567.

Nonaka, I., & Takeuchi, H. (1995). The knowledge-creating company: how Japanese companies create the dynamics of innovation. New York, NY: Oxford University Press.

Polanyi, M. (1958). Personal knowledge. London, UK: Routledge & Kegan Paul.

Ruiz-Primo, M. A., & Li, M. (2013). Examining formative feedback in the classroom context: New research perspectives. In J. H. McMillan (Ed.), Sage handbook of research on classroom assessment (2nd ed., pp. 215-232). Thousand Oaks, CA: Sage.

Ruiz-Primo, M. A., Shavelson, R. J., Hamilton, L., & Klein, S. (2002). On the evaluation of systemic science education reform: searching for instructional sensitivity. Journal of Research in Science Teaching, 39(5), 369-393.

Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105(2), 309-316. doi: 10.1037/0033-2909.105.2.309

Stokes, D. E. (1997). Pasteur's quadrant: basic science and technological innovation. Washington, DC: Brookings Institution Press.
