CAUSE Webinar: Introducing Math Majors to Statistics Allan Rossman and Beth Chance Cal Poly – San Luis Obispo April 8, 2008


CAUSE Webinar:
Introducing Math Majors to Statistics
Allan Rossman and Beth Chance
Cal Poly – San Luis Obispo
April 8, 2008
Outline





Goals
Guiding principles
Content of an example course
Assessment
Examples (four)
Goals

Redesign introductory statistics course for mathematically inclined students in order to:
  Provide balanced introduction to the practice of statistics at appropriate mathematical level
  Better alternative than “Stat 101” or “Math Stat” sequence for math majors’ first statistics course
Guiding principles (Overview)
1. Put students in role of active investigator
2. Motivate with real studies, genuine data
3. Repeatedly experience entire statistical process from data collection to conclusion
4. Emphasize connections among study design, inference technique, scope of conclusions
5. Use variety of computational tools
6. Investigate mathematical underpinnings
7. Introduce probability “just in time”
Principle 1: Active investigator

Curricular materials consist of investigations that lead students to discover statistical concepts and methods
  Students learn through constructing own knowledge, developing own understanding
  Need direction, guidance to do that
Students spend class time engaged with these materials, working collaboratively, with technology close at hand
Principle 2: Real studies, genuine data

Almost all investigations focus on a recent scientific study, existing data set, or student-collected data
  Statistics as a science
  Frequent discussions of data collection issues and cautions
  Wide variety of contexts, research questions
Real studies, genuine data









Popcorn and lung cancer
Historical smoking studies
Night lights and myopia
Effect of observer with vested interest
Kissing the right way
Do pets resemble their owners
Who uses shared armrest
Halloween treats
Heart transplant mortality
Lasting effects of sleep deprivation
Sleep deprivation and car crashes
Fan cost index
Drive for show, putt for dough
Spock legal trial
Hiring discrimination
Comparison shopping
Computational linguistics
Principle 3: Entire statistical process

First two weeks:
Data collection
  Observation vs. experiment (confounding, random assignment vs. random sampling, bias)
Descriptive analysis
  Segmented bar graph
  Conditional proportions, relative risk, odds ratio
Inference
  Simulating randomization test for p-value, significance
  Hypergeometric distribution, Fisher’s exact test
Repeat, repeat, repeat, …
  Random assignment → dotplots/boxplots/means/medians → randomization test
  Sampling → bar graph → binomial → normal approximation
Principle 4: Emphasize connections

Emphasize connections among study design, inference technique, scope of conclusions
Appropriate inference technique determined by randomness in data collection process
  Simulation of randomization test (e.g., hypergeometric)
  Repeated sampling from population (e.g., binomial)
Appropriate scope of conclusion also determined by randomness in data collection process
  Causation
  Generalizability
Principle 5: Variety of computational tools


For analyzing data, exploring statistical concepts
Assume that students have frequent access to computing
  Not necessarily every class meeting in computer lab
Choose right tool for task at hand
  Analyzing data: statistics package (e.g., Minitab)
  Exploring concepts: Applets (interactivity, visualization)
  Immediate updating of calculations: spreadsheet (Excel)
Principle 6: Mathematical underpinnings

Primary distinction from “Stat 101” course
  Some use of calculus but not much
  Assume some mathematical sophistication
    E.g., function, summation, logarithm, optimization, proof
  Often occurs as follow-up homework exercises
Examples
  Counting rules for probability
    Hypergeometric, binomial distributions
  Principle of least squares, derivatives to find minimum
    Univariate as well as bivariate setting
  Margin-of-error as function of sample size, population parameters, confidence level
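
The margin-of-error example can be explored numerically. The sketch below assumes the usual normal-approximation margin of error for a single proportion, z* times sqrt(p(1 − p)/n); the function name `margin_of_error` and the chosen sample sizes are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion.

    n: sample size; p: assumed population proportion;
    confidence: confidence level, e.g. 0.95.
    """
    # Critical value z* that leaves (1 - confidence) / 2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sqrt(p * (1 - p) / n)


# Quadrupling the sample size halves the margin of error
for n in (100, 400, 1600):
    print(n, round(margin_of_error(n), 4))
```

Varying one argument at a time (n, p, or the confidence level) shows how each one drives the margin of error.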
Principle 7: Probability “just in time”

Whither probability?
  Not the primary goal
  Studied as needed to address statistical issues
Often introduced through simulation
  Tactile and then computer-based
  Addressing “how often would this happen by chance?”
Examples
  Hypergeometric distribution: Fisher’s exact test for 2×2 table
  Binomial distribution: Sampling from random process
  Continuous probability models as approximations
Content of Example Course (ISCAM)
Chapter 1
  Data Collection: Observation vs. experiment, confounding, randomization
  Descriptive Statistics: Conditional proportions, segmented bar graphs, odds ratio
  Probability: Counting, random variable, expected value
  Sampling/Randomization Distribution: Randomization distribution for p̂1 − p̂2
  Model: Hypergeometric
  Statistical Inference: p-value, significance, Fisher’s Exact Test

Chapter 2
  Descriptive Statistics: Quantitative summaries, transformations, z-scores, resistance
  Probability: Empirical rule
  Sampling/Randomization Distribution: Randomization distribution for x̄1 − x̄2
  Statistical Inference: p-value, significance, effect of variability

Chapter 3
  Data Collection: Random sampling, bias, precision, nonsampling errors
  Descriptive Statistics: Bar graph
  Probability: Bernoulli processes, rules for variances, expected value
  Sampling/Randomization Distribution: Sampling distribution for X, p̂
  Model: Binomial
  Statistical Inference: Binomial tests and intervals, two-sided p-values, type I/II errors

Chapter 4
  Data Collection: Paired data
  Descriptive Statistics: Models, probability plots, trimmed mean
  Probability: Normal, Central Limit Theorem
  Sampling/Randomization Distribution: Large sample sampling distributions for x̄, p̂
  Model: Normal, t
  Statistical Inference: z-procedures for proportions, t-procedures, robustness, bootstrapping

Chapter 5
  Data Collection: Independent random samples
  Sampling/Randomization Distribution: Sampling distributions of p̂1 − p̂2, OR, x̄1 − x̄2
  Model: Normal, t, lognormal
  Statistical Inference: Two-sample z- and t-procedures, bootstrap, CI for OR

Chapter 6
  Descriptive Statistics: Bivariate: scatterplots, correlation, simple linear regression
  Sampling/Randomization Distribution: Chi-square statistic, F statistic, regression coefficients
  Model: Chi-square, F, t
  Statistical Inference: Chi-square for homogeneity, independence, ANOVA, regression
Assessments



Investigations with summaries of conclusions
Worked out examples
Practice problems
  Quick practice, opportunity for immediate feedback, adjustment to class discussion
Homework exercises
Technology explorations (labs)
  e.g., comparison of sampling variability with stratified sampling vs. simple random sampling
Student projects
  Student-generated research questions, data collection plans, implementation, data analyses, report
Example 1: Friendly Observers

Psychology experiment

Butler and Baumeister (1998) studied the effect of observer with vested interest on skilled performance

                        A: vested interest   B: no vested interest   Total
Beat threshold                   3                     8               11
Do not beat threshold            9                     4               13
Total                           12                    12               24

p̂_A = .250
p̂_B = .667
How often would such an extreme experimental difference occur by chance, if there was no vested interest effect?
Example 1: Friendly Observers

Students investigate this question through



Hands-on simulation (playing cards)
Computer simulation (Java applet)
Mathematical model

counting techniques
1113 1113 1113 1113
              
3  9   2 10  1 11  0 12

p  value  P( X  3) 
 .0498
 24
 
 12 
April 8, 2008
CAUSE Webinar
16
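
The card-shuffling simulation and the counting argument for the friendly-observers table can both be sketched in a few lines. This is an illustrative sketch, not the applet or course materials: the function names, the seed, and the number of repetitions are assumptions. The counts (24 subjects, 11 successes, 12 in group A, 3 observed successes in group A) come from the table.

```python
import random
from math import comb

# Counts from the 2x2 table: 24 subjects, 11 beat the threshold,
# 12 were assigned to group A, and 3 of those 12 beat the threshold.
N_TOTAL, N_SUCCESS, N_GROUP_A, OBSERVED_A = 24, 11, 12, 3


def exact_p_value():
    """Hypergeometric P(X <= 3): successes landing in group A by chance."""
    favorable = sum(
        comb(N_SUCCESS, k) * comb(N_TOTAL - N_SUCCESS, N_GROUP_A - k)
        for k in range(OBSERVED_A + 1)
    )
    return favorable / comb(N_TOTAL, N_GROUP_A)


def simulated_p_value(reps=10_000, seed=1):
    """Empirical p-value: shuffle the 24 'cards' and deal 12 to group A."""
    rng = random.Random(seed)
    cards = [1] * N_SUCCESS + [0] * (N_TOTAL - N_SUCCESS)
    hits = 0
    for _ in range(reps):
        rng.shuffle(cards)
        # Count successes among the 12 cards dealt to group A
        if sum(cards[:N_GROUP_A]) <= OBSERVED_A:
            hits += 1
    return hits / reps


print("exact p-value:    ", round(exact_p_value(), 4))
print("simulated p-value:", simulated_p_value())
```

The simulated value varies with the seed and the number of repetitions, which is the point of comparing it with the exact hypergeometric calculation.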
Example 1: Friendly Observers

Focus on statistical process
Data collection, descriptive statistics, inferential analysis
Connection between the randomization in the design and the inference procedure used
Scope of conclusions depends on study design
Arising from genuine research study
Cause/effect inference is valid
Use of simulation motivates the derivation of the mathematical probability model
Investigate/answer real research questions in first two weeks
Example 2: Sleep Deprivation

Physiology Experiment

Stickgold, James, and Hobson (2000) studied the long-term effects of sleep deprivation on a visual discrimination task (3 days later!)

sleep condition    n     Mean    StDev   Median   IQR
deprived           11    3.90    12.17   4.50     20.7
unrestricted       10    19.82   14.73   16.55    19.53

How often would such an extreme experimental difference occur by chance, if there was no sleep deprivation effect?
Example 2: Sleep Deprivation

Students investigate this question through



Hands-on simulation (index cards)
Computer simulation (Minitab)
Mathematical model
p-value .002
April 8, 2008
p-value=.0072
CAUSE Webinar
15.92
19
Example 2: Sleep Deprivation

Experience the entire statistical process again
Tools change, but reasoning remains same
Develop deeper understanding of key ideas (randomization, significance, p-value)
Tools based on research study, question – not for their own sake
Simulation as a problem solving tool
Empirical vs. exact p-values
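
The index-card simulation for a difference in group means can be sketched as follows. The transcript does not include the study's raw scores, so the two lists below are made-up placeholder numbers, not the Stickgold, James, and Hobson data. The function name, seed, and repetition count are also assumptions.

```python
import random
from statistics import mean


def randomization_p_value(group_a, group_b, reps=10_000, seed=1):
    """One-sided empirical p-value for mean(group_b) - mean(group_a).

    Re-randomizes the pooled responses to the two groups many times and
    reports how often the re-randomized difference is at least as large
    as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # re-deal the "index cards"
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / reps


# Placeholder scores for illustration only (NOT the study's data)
deprived = [-5.0, 2.0, 4.5, 9.0, 12.0]
unrestricted = [8.0, 15.0, 16.5, 21.0, 30.0]
print(randomization_p_value(deprived, unrestricted))
```

The reasoning matches the 2×2 case: only the statistic being re-randomized changes, from a count to a difference in means.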
Example 3: Infants’ Social Evaluation

Sociology study




Hamlin, Wynn, Bloom (2007) investigated whether infants would prefer a toy showing “helpful” behavior to a toy showing “hindering” behavior
Infants were shown a video with these two kinds of toys, then asked to select one
14 of 16 10-month-olds selected helper
Is this result surprising enough (under null model of no preference) to indicate a genuine preference for the helper toy?
Example 3: Infants’ Social Evaluation


Simulate with coin flipping
Then simulate with applet
Example 3: Infants’ Social Evaluation
Then learn binomial distribution, calculate exact p-value
\[
\text{p-value} = P(X \ge 14) = \binom{16}{14}(.5)^{14}(1-.5)^{2} + \binom{16}{15}(.5)^{15}(1-.5)^{1} + \binom{16}{16}(.5)^{16}(1-.5)^{0} \approx .0021
\]
[Figure: Distribution plot, Binomial, n=16, p=0.5; horizontal axis: X = number who choose helper toy; vertical axis: Probability; upper-tail probability 0.00209 marked at X = 14]
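
The coin-flipping simulation and the exact binomial calculation for 14 of 16 infants can be compared directly. This is a minimal sketch under the null model of no preference (p = 0.5); the function names, seed, and repetition count are assumptions, not part of the slides or the applet.

```python
import random
from math import comb


def binomial_upper_tail(n, k, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def simulated_upper_tail(n, k, reps=20_000, seed=1):
    """Empirical P(X >= k): flip n fair coins, repeated reps times."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        heads = sum(rng.random() < 0.5 for _ in range(n))
        if heads >= k:
            hits += 1
    return hits / reps


print("exact p-value:    ", round(binomial_upper_tail(16, 14), 4))
print("simulated p-value:", simulated_upper_tail(16, 14))
```

Changing `k` or `n` covers the follow-up questions about a different number of successes or a different sample size.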
Example 3: Infants’ Social Evaluation


Learn probability distribution to answer inference question from research study
Again the analysis is completed with
  Tactile simulation
  Technology simulation
  Mathematical model
Modeling process of statistical investigation
  Examination of methodology, further questions in study
Follow-ups
  Different number of successes
  Different sample size
Example 4: Sleepless Drivers

Sociology case-control study

Connor et al (2002) investigated whether those in recent car accidents had been more sleep deprived than a control group of drivers

                                No full night’s sleep   At least one full night’s   Sample sizes
                                in past week            sleep in past week
“case” drivers (crash)                  61                      510                     571
“control” drivers (no crash)            44                      544                     588
Example 4: Sleepless Drivers
Sample proportion that were in a car crash
Sleep deprived: .581
Not sleep deprived: .484
Odds ratio: 1.48
[Figure: segmented bar graph (0% to 100%) of crash vs. no crash for two groups: No full night’s sleep in past week; At least one full night’s sleep in past week]
How often would such an extreme observed odds ratio occur by chance, if there was no sleep deprivation effect?
Example 4: Sleepless Drivers

Students investigate this question through

Computer simulation (Minitab)



Empirical sampling distribution of odds-ratio
Empirical p-value
Approximate mathematical model
Example 4: Sleepless Drivers
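\[
\text{SE(log-odds)} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}
\]
where a, b, c, d are the four cell counts of the 2×2 table

Confidence interval for population log odds:
  sample log-odds ± z* SE(log-odds)
Back-transformation
90% CI for odds ratio: 1.05 – 2.08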
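
The interval above can be reproduced from the case-control table. The sketch below uses the four cell counts from the sleepless-drivers table and the standard-error formula on this slide; the function name `odds_ratio_ci` and its argument order are illustrative assumptions.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, confidence=0.90):
    """Odds ratio (a*d)/(b*c) with a CI built on the log scale.

    a, b: cases (crash) with / without sleep deprivation
    c, d: controls (no crash) with / without sleep deprivation
    """
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Interval on the log scale, then back-transform with exp
    lower = exp(log(odds_ratio) - z_star * se_log)
    upper = exp(log(odds_ratio) + z_star * se_log)
    return odds_ratio, lower, upper


# Cell counts from the table: 61 and 510 case drivers, 44 and 544 control drivers
or_hat, lower, upper = odds_ratio_ci(61, 510, 44, 544)
print(round(or_hat, 2), round(lower, 2), round(upper, 2))
```

Working on the log scale is what makes the normal approximation reasonable; the back-transformed interval is therefore not symmetric about the sample odds ratio.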
Example 4: Sleepless Drivers


Students understand process through which they can investigate statistical ideas
Students piece together powerful statistical tools learned throughout the course to derive new (to them) procedures
  Concepts, applications, methods, theory
For more information


Investigating Statistical Concepts, Applications, and Methods (ISCAM), Cengage Learning, www.cengage.com
Instructor resources: www.rossmanchance.com/iscam/
  Solutions to investigations, practice problems, homework exercises
  Instructor’s guide
  Sample syllabi
  Sample exams