Randomization workshop

Randomization workshop
eCOTS, May 22, 2014
Presenters: Nathan Tintle and Beth Chance
Introductions
Presenters
  Nathan Tintle, Dordt College
  Beth Chance, Cal Poly
Participants
  Telling you a little about you!
Participant profile
[Bar chart: proportion of workshop participants by institution]
Participant profile
Over 85% are here because of AP-equivalent introductory statistics
Remainder
  Calculus-based introductory statistics
  Other statistics course
Participant profile
[Bar chart: proportion of workshop participants by previous experience with randomization in the classroom. Categories: Never; Once or twice; Handful of times; Pilot tested existing curricula]
Participant profile
Goals
  What is ‘randomization-based’ inference?
  Why would I want it in my course?
  How does it look in my course?
  Challenged with new ideas
  Implementation tips and advice
  Comparisons with other randomization texts
  And a lot more!
Our primary goals are for you to
  Understand what a ‘randomization/simulation’-based approach to statistical inference is
  Understand why it is an increasingly popular approach to teaching introductory statistics
  Experience two concrete examples of how it works in the classroom
  Get a sense of one of the major curriculum options available for teaching with this approach
Overview
Note: We will post slides after the workshop, and a recording will be available via eCOTS.
Hour #1
  First 15-20 minutes: Welcome/introductions/overview/goals
  Next 15-20 minutes: What is a randomization-based curriculum? Why?*
  Next 15-20 minutes: Activity: Meet Doris and Buzz*
Hour #2 (after short 5 minute break)
  First 10-15 minutes: The ISI curriculum: What, how, and why*
  Next 15-20 minutes: Activity: Is yawning contagious?*
  Next 10-15 minutes: Cautions, implementation, assessment*
  Final 10-15 minutes: Next steps, class testing, ongoing discussion*
*Ask questions both during and immediately following each presentation.
To ask a question, type the question into the “Questions” pane of the GoToWebinar control panel. Do not use the ‘raise hands’ feature.
During all sessions we STRONGLY encourage you to ask questions!
What do we mean by a ‘randomization-based’ curriculum and why consider it?
Overview
Why look at the content of Stat 101?
George Cobb’s challenge about how the content might change
Stat 101 = general algebra-based intro stats course (equivalent to AP Statistics)
Randomization/simulation as an overarching approach to statistical inference
Some general trends, themes in randomization curricula to date
Brief history of stat ed
Consensus curriculum by late 1990s, but nexus in early 1980s
  Descriptive Statistics
  Probability/Design/Sampling Distributions
  Inference (testing and intervals)
GAISE College Report (2005)
  Six pedagogical suggestions for Stat 101: Conceptual understanding, Active learning, Real data, Statistical literacy and thinking, Use technology, and Use assessments for learning
Brief history of stat ed
No real pressure to change content
Major changes
  Computer changed from an institutionally owned behemoth to the individually owned desktop or laptop.
  As computers became ubiquitous, so did data collection, and with it the need for data analysis.
  Statistical practice changed as well: more computer-intensive methods, large datasets, multivariable methods, etc.
  Recognition of the utility of simulation to enhance student understanding of random processes
Brief history of stat ed
Other changes directly impacting Stat 101
  Stats increasing in K-12 curriculum (NCTM, Common Core, Advanced Placement) (Franklin eCOTS plenary talk)
  Enrollments have skyrocketed (high school, two- and four-year colleges)
  Stat ed research has given us more knowledge of how and what students learn in Stat 101
Potential shortcomings
Overlap with K-12 treatment of descriptive statistics is inefficient
  Eventually, incoming students will also have exposure to informal inference
Although computer-intensive methods have become a central part of statistical practice, they are largely or wholly absent from the typical first course.
Assessment methods developed over the last decades show that student understanding of the logic of inference is typically limited at best (Cobb 2007, TISE http://escholarship.org/uc/item/6hb3k0nz)
The traditional first course does not devote sufficient time or space to the connections among the method of data production, the method used to analyze the data, and the scope of inference justified by the analysis. For randomization-based methods, these connections are simple and direct.
Intro Stat as a Cathedral of Tweaks
(A George Cobb analogy)
Boswell famously described Samuel Johnson as a “cathedral of tics.” Thesis: The usual normal distribution-worshipping intro course is a cathedral of tweaks.
The orthodox doctrine is simple
  Use the CLT to justify the normal
  Use the normal to compute tail areas
  Reject if |observed – expected₀| > 2 SEs
  Interval = Est +/- M95 with M95 = 2 SEs
The Cathedral of Tweaks
(a) z vs. t: If we know the population SD we use z; when we estimate the SD we use t.
(b) z vs. t: (a) holds except for proportions; then we use z, not t, even when we estimate the SD.
(c) Estimating the SD: For proportions, we estimate the SD for intervals, but use the null value for tests.
(d) n vs. (n-1) vs. (n-2): The SE is SD/(root n): We divide by n because we have n observations. But for estimating the SD we divide by (n-1), even though there are n deviations … except that when we get to regression we use (n-2).
Still More Tweaks
If your data set is not normal you may need to transform
If you work with small samples there are guidelines for when you can use methods based on the normal, e.g., n > 30, or np > 5 and n(1-p) > 5
The consequence
Few students ever leave our course seeing this
The consequence
The better students may get a fuzzy impression
The consequence
All too many noses stay too close to the canvas, and see disconnected details
A potential solution?
‘Randomization’ = simulation, bootstrapping and/or permutation tests
Use of computationally intensive methods to:
  Estimate/approximate the null distribution for significance tests
  Estimate/approximate the margin of error for confidence intervals
A potential solution: Simulation
Flip coins or spin spinners to simulate the binomial distribution instead of starting with
  Binomial distribution theory
  Normal approximation to the binomial
Simulation example
What are the chances a basketball player is shooting free throws better in the playoffs (16/20 in game 1) than they typically do, if they “typically” make 50% of their free throws?
  Flip 20 coins to simulate performance of player if no change in free throw percentage
  Repeat to assess the likelihood of such a player making 16 FTs
How often would we get such a statistic as in the study by chance alone? That is, if still 50%?
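The coin-flipping procedure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workshop's own tool: the function name `simulate_makes`, the 10,000 repetitions, and the seed are hypothetical choices made for the example.

```python
import random

def simulate_makes(n_shots=20, p_make=0.5, rng=random):
    """Count made free throws in one simulated game of n_shots attempts."""
    return sum(rng.random() < p_make for _ in range(n_shots))

rng = random.Random(2014)   # fixed seed so the run is repeatable (assumption)
n_reps = 10_000             # number of simulated games (assumption)
observed = 16               # made free throws in game 1, from the slide

# Each simulated game plays the role of "flip 20 coins"
results = [simulate_makes(rng=rng) for _ in range(n_reps)]

# Proportion of simulated games at least as extreme as the observed 16/20
p_value = sum(r >= observed for r in results) / n_reps
print(f"Proportion of simulated games with 16+ makes: {p_value:.4f}")
```

For a true 50% shooter the exact binomial probability of 16 or more makes in 20 attempts is about 0.006, so the simulated proportion should be small and close to that value, varying a little from run to run.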
A potential solution: Bootstrap
Bootstrap
  Use 1000s of resamples (with replacement) of the observed data to generate an approximate sampling distribution which can be used to estimate the margin of error
Bootstrap example
Example: Gather hours slept last night for 20 students
  5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 6.5, 6.5, 7, 7.5, 7.5, 7.5, 7, 4
  Mean = 6.7 hours, SD = 1.1 hours
Keep going!
Find: 95% CI for population average sleep hours
Bootstrap: Model the data gathering process: 1000 random samples of 20 with replacement, compute sample mean each time.
Bootstrap example
[Plot: distribution of (re)sample means]
Bootstrap example
Find middle 95% of resample means = 95% CI for true population mean
  6.2 to 7.1 hours
How much might these sample means vary from sample to sample by chance alone?
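The bootstrap steps in this example (resample 20 values with replacement, record the mean, take the middle 95%) can be sketched as follows. The data are the 20 sleep values from the slide; the seed and the simple percentile cut-offs are assumptions made for illustration, and other interval constructions exist.

```python
import random
import statistics

# Hours slept last night for 20 students (data from the slide)
hours = [5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8,
         6.5, 6.5, 7, 7.5, 7.5, 7.5, 7, 4]

rng = random.Random(2014)   # fixed seed for repeatability (assumption)
n_resamples = 1000          # number of resamples, as on the slide

# Resample 20 values with replacement and record each resample mean
resample_means = sorted(
    statistics.mean(rng.choices(hours, k=len(hours)))
    for _ in range(n_resamples)
)

# Middle 95% of the resample means: drop 2.5% of them from each tail
cut = n_resamples * 25 // 1000
lower = resample_means[cut]
upper = resample_means[-cut - 1]

print(f"Sample mean: {statistics.mean(hours):.2f} hours")
print(f"Approximate 95% interval: {lower:.1f} to {upper:.1f} hours")
```

The endpoints should land near the 6.2 to 7.1 hours reported on the slide, shifting slightly with the seed and the number of resamples.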
A potential solution: Permutation tests
Permutation testing
  Compare 2 or more groups
  Null: No treatment effect; distribution of response variable is the same in all groups
Ex. Is new treatment better than placebo?
A potential solution: Permutation tests
Write the values of the response variable (categorical or quantitative) on slips of paper. Shuffle the slips and re-randomize them to the two or more groups.
Recompute the value of the statistic to build an empirical null distribution, and compare it to the actual statistic.
How often would we get such a statistic as in the study by chance alone if the null is true?
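The slips-of-paper procedure can be sketched in Python for a two-group comparison. The slides give no data for this example, so the response values below are HYPOTHETICAL numbers invented purely for illustration; the difference in means as the statistic, the one-sided comparison, the seed, and the 10,000 shuffles are likewise assumptions.

```python
import random
import statistics

# HYPOTHETICAL response values, invented purely for illustration
treatment = [12, 15, 14, 16, 13, 17]
placebo = [10, 11, 9, 12, 10, 11]

def diff_in_means(group_a, group_b):
    """Statistic: mean of group_a minus mean of group_b."""
    return statistics.mean(group_a) - statistics.mean(group_b)

observed = diff_in_means(treatment, placebo)

rng = random.Random(2014)     # fixed seed for repeatability (assumption)
n_shuffles = 10_000           # number of re-randomizations (assumption)
slips = treatment + placebo   # all responses "written on slips of paper"

count_extreme = 0
for _ in range(n_shuffles):
    rng.shuffle(slips)                    # shuffle the slips
    new_a = slips[:len(treatment)]        # re-deal them to the two groups
    new_b = slips[len(treatment):]
    if diff_in_means(new_a, new_b) >= observed:
        count_extreme += 1

# One-sided p-value: how often chance alone does at least as well
p_value = count_extreme / n_shuffles
print(f"Observed difference in means: {observed}")
print(f"Approximate p-value: {p_value:.4f}")
```

Shuffling all responses together is what encodes the null hypothesis: if the treatment has no effect, group labels are arbitrary, so any re-dealing of the slips is as plausible as the one actually observed.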
A potential solution
These methods may offer a quicker, less abstract bridge to the logic of inference while also emphasizing the scope of inference (random sampling, random assignment)
May scaffold the transition to ‘traditional’ (asymptotic, theory-based) methods better than traditional theory/probability theory, etc.
General trends
Momentum behind randomization approach to inference in last 8-10 years
  Cobb 2005 talk (USCOTS)
  Cobb 2007 paper (TISE)
  2011 USCOTS: The Next Big Thing
New and coming soon curricula
  Lock5 (theory and randomization, more traditional sequence of topics)
  Tintle et al. (theory and randomization, four pillars of inference and then chapters based on type of data)
  CATALST (emphasis on modelling)
  Others
General trends
Many sessions at conferences talking about approach, benefits, questions/concerns
Assessment: Two papers (Tintle et al. 2011, Tintle et al. 2012); better on some things, do no harm on others; more coming
Q+A
Doris and Buzz
Simulation for a single proportion
Introduction
First ‘main’ example (after brief Preliminaries)
Story
Questions for students
  Can we prove dolphins can communicate abstract concepts?
  What other explanations are there?
  How to explain/justify to someone else?
  Chance model, simulation
Three S Strategy
Statistic: Compute the statistic from the observed sample data.
Simulate: Identify a model that represents a “by chance” explanation. Repeatedly simulate values of the statistic that could have happened when the chance model is true.
Strength of evidence: Consider whether the value of the observed statistic is unlikely to occur when the chance model is true.
Dolphin Communication
Statistic
In one set of trials, Buzz chose the correct button 15 out of 16 times.
Based on these results, do you think Buzz knew which button to push or was he just guessing?
Dolphin Communication
Simulate
coin flip = guess by Buzz
heads = correct guess
tails = wrong guess
chance of heads = probability of correct button when Buzz is just guessing
one set of 16 coin flips = one set of 16 attempts by Buzz
Dolphin Communication
Simulate
What might be on the front board in a class of 25 students
Larger class fine!
Dolphin Communication
Simulate
Still not convinced 15 is unlikely? Go to the applet to get a ‘very large class’ flipping coins with you.
http://math.hope.edu/isi Click on Applets, then click one proportion
Applets are JavaScript and so work on all platforms including iPhones, iPads, etc.
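For readers without the applet at hand, the ‘very large class’ idea can be imitated with a short Python sketch. This is not the applet's code: the function name `one_set`, the class size of 10,000, and the seed are hypothetical choices, and the printed tally is only a text stand-in for the dotplot on the front board.

```python
import random
from collections import Counter

def one_set(n_attempts=16, p_correct=0.5, rng=random):
    """Number of heads (correct guesses) in one set of 16 coin flips."""
    return sum(rng.random() < p_correct for _ in range(n_attempts))

rng = random.Random(2014)   # fixed seed for repeatability (assumption)
class_size = 10_000         # a hypothetical 'very large class'
observed = 15               # Buzz's correct choices out of 16

# Every "student" flips 16 coins and reports the number of heads
tally = Counter(one_set(rng=rng) for _ in range(class_size))

# Text version of the front-board tally: one row per possible outcome
for heads in range(17):
    print(f"{heads:2d} correct: {tally[heads]:5d}")

prop_extreme = sum(tally[h] for h in range(observed, 17)) / class_size
print(f"Proportion of sets with {observed}+ correct: {prop_extreme:.4f}")
```

Under pure guessing, the exact chance of 15 or 16 correct out of 16 is 17/65536, roughly 0.0003, so even a class of 10,000 coin-flippers should produce only a handful of such sets, if any.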
Moving past Doris and Buzz
Null/Alt hypotheses, non-50/50 null (1.2)
Parameter (1.1)
Strength of evidence
  P-value (1.2)
  Standardized statistic, Z (1.3)
Two-sided tests, what impacts strength of evidence (1.4)
Theory-based approaches (overlay normal) (1.5)
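As a rough illustration of the standardized statistic for Buzz's result, the sketch below uses the usual theory-based standard deviation of a sample proportion under the null. That formula is an assumption on my part; the slides do not say how the curriculum computes the SD, and it could equally be taken from the simulated null distribution.

```python
import math

n = 16            # attempts by Buzz
p_hat = 15 / n    # observed proportion of correct choices
pi_0 = 0.5        # null value: Buzz is just guessing

# Theory-based SD of the sample proportion when the null is true (assumption)
sd_null = math.sqrt(pi_0 * (1 - pi_0) / n)

# Standardized statistic: how many SDs the observed proportion is from the null
z = (p_hat - pi_0) / sd_null
print(f"Standardized statistic z = {z:.2f}")
```

Here z works out to 3.5, meaning the observed proportion sits 3.5 standard deviations above what guessing would produce on average.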
Q+A
End of hour #1
Short break: back in 1 minute!!
We’ll start at 38 minutes after the hour