Why Statistics? Two Purposes 1. Descriptive 2. Inferential

Why Statistics?
Two Purposes
1. Descriptive
- Finding ways to summarize the important characteristics of a dataset
2. Inferential
- How (and when) to generalize from a sample dataset to the larger population
Descriptive
Statistics
[Figures: frequency histograms of Firsthand Impression and Secondhand Impression ratings; a frequency histogram of the Difference between them; scatterplots relating Secondhand Impression and Change to Firsthand Impression.]
Firsthand: 3.38, .63, 3.13, 4.25, .50, 3.75, 1.50, 1.88, .88, 2.25, 1.13, 3.38, 1.00, -.25, 1.63, 1.50, 2.00, 2.13
Secondhand: 3.88, 1.88, 2.00, 3.88, 2.50, 3.25, 3.13, 1.50, 3.75, 2.00, 2.38, 3.25, 2.88, .88, 3.50, 4.13, .38, 4.63
Characterizing a Distribution of Data
[Figure: frequency histogram of Voice scores, x-axis from 1.00 to 7.00.]
Comparing Distributions of Data
[Figures: frequency histograms of Voice scores (1.00 to 7.00), one panel for Men and one for Women.]
How could you summarize the differences?
Looking for Linear Relationships
[Figure: scatterplot of Anger during Conflict against Conflict Significance.]
Looking for Linear Relationships
[Figure: scatterplot of Current relationship satisfaction against Conflict Significance.]
Comparing Linear Relationships
[Figures: two scatterplots side by side: Current relationship satisfaction against Conflict Significance, and Anger during Conflict against Conflict Significance.]
How could you summarize the differences?
Complex Linear Relationships
[Figure: scatterplot of Current relationship satisfaction against Conflict Significance, with points marked by theory (entity or incremental).]
Descriptive Statistics
Provides graphical and numerical ways to
organize, summarize, and characterize a dataset.
Types of Studies
Experimental:
The predictor variable is manipulated by the
researcher.
Observational:
The predictor variables are merely observed and
recorded by the researcher.
Types of Variables
Predictor variable:
The antecedent conditions that are used to predict the outcome of
interest. In an experimental study, this is called the “independent variable”.
Outcome variable:
The variable you want to be able to predict. In an experimental
study, this is called the “dependent variable”.
Types of Variables
Continuous variable:
There are an infinite number of possible values that fall
between any two observed values.
Discrete variable:
Consists of separate, indivisible categories
Categorical
A set of categories that have
different names
Ordinal
A set of categories that are
organized in an ordered sequence
Summarizing Discrete Data
Name       Eye Color
Janice     brown
Tom        blue
Danielle   green
Ian        brown
Eduardo    brown
Emily      brown
Anja       blue
Cara       brown
Adrian     brown
Eric       blue
Sarah      brown
David      brown
Frequency Tables
Eye Color   Frequency
Brown       33
Blue        14
Green       3
Frequency Tables
Eye Color   Frequency   Relative Frequency
Brown       33          66%
Blue        14          28%
Green       3           6%
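As a minimal sketch, the frequency and relative-frequency columns above can be computed from raw category labels. The list `eye_colors` and the helper `frequency_table` are names introduced here for illustration; the list is built only to reproduce the counts shown in the table (33 brown, 14 blue, 3 green).

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (count, relative_frequency)} for a list of labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, n / total) for cat, n in counts.items()}

# Hypothetical raw data that reproduces the counts shown in the table.
eye_colors = ["Brown"] * 33 + ["Blue"] * 14 + ["Green"] * 3

table = frequency_table(eye_colors)
for color, (count, rel) in table.items():
    print(f"{color:<6} {count:>3} {rel:.0%}")
```

The relative frequency is just the count divided by the total number of observations, so the column always sums to 100%.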
Frequency Bar Graph
[Figure: bar graph of Frequency (0 to 35) by Eye Color: Brown, Blue, Green.]
Relative Frequency Bar Graph
[Figure: bar graph of Relative Frequency (0 to 100) by Eye Color: Brown, Blue, Green.]
Summarizing Continuous Data
Name       Hours of Sleep / Night
Janice     6
Tom        7.5
Danielle   10.5
Ian        9
Eduardo    7
Emily      6
Anja       8
Cara       5
Adrian     8.5
Eric       6.5
Sarah      7.5
David      4
Frequency Tables
Hours of Sleep   Frequency
3 - 4 hrs        1
4 - 5 hrs        3
5 - 6 hrs        6
6 - 7 hrs        14
7 - 8 hrs        16
8 - 9 hrs        5
9 - 10 hrs       3
10 - 11 hrs      2
Histogram (Frequency)
[Figure: histogram of Frequency (0 to 16) by Nightly Hours of Sleep, x-axis from 3.5 to 11.5.]
Frequency Tables
Hours of Sleep   Frequency   Relative Frequency
3 - 4 hrs        1           2%
4 - 5 hrs        3           6%
5 - 6 hrs        6           12%
6 - 7 hrs        14          28%
7 - 8 hrs        16          32%
8 - 9 hrs        5           10%
9 - 10 hrs       3           6%
10 - 11 hrs      2           4%
Histogram (Relative Frequency)
[Figure: histogram of Relative Frequency (0 to 100) by Nightly Hours of Sleep, x-axis from 3.5 to 11.5.]
Frequency Tables
Hours of Sleep   Frequency   Relative Frequency   Cumulative Frequency
3 - 4 hrs        1           2%                   2%
4 - 5 hrs        3           6%                   8%
5 - 6 hrs        6           12%                  20%
6 - 7 hrs        14          28%                  48%
7 - 8 hrs        16          32%                  80%
8 - 9 hrs        5           10%                  90%
9 - 10 hrs       3           6%                   96%
10 - 11 hrs      2           4%                   100%
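The relative and cumulative columns follow mechanically from the counts. The sketch below recomputes them from the frequencies in the table; the variable names are illustrative, and the cumulative column is expressed as a running share of the total, as in the table.

```python
from itertools import accumulate

# Bin labels and counts copied from the frequency table above.
bins = ["3 - 4", "4 - 5", "5 - 6", "6 - 7", "7 - 8", "8 - 9", "9 - 10", "10 - 11"]
counts = [1, 3, 6, 14, 16, 5, 3, 2]

total = sum(counts)                                    # 50 people
relative = [c / total for c in counts]                 # share in each bin
cumulative = [c / total for c in accumulate(counts)]   # share in this bin or below

for label, c, r, cum in zip(bins, counts, relative, cumulative):
    print(f"{label:>7} hrs {c:>3} {r:>4.0%} {cum:>5.0%}")
```

Reading down the cumulative column answers questions such as "what share of people sleep less than 7 hours?" directly (48% here).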
Stem and Leaf Plots
Name
Janice     54
Tom        59
Danielle   35
Ian        41
Eduardo    46
Emily      25
Anja       47
Cara       60
Adrian     41
Eric       34
Sarah      22
David      45

Stem   Leaves
2      25
3      45
4      11567
5      49
6      0
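A stem and leaf plot can be built by splitting each value into its tens digit (the stem) and its ones digit (the leaf). The following is a small sketch using the twelve values from the table; `stem_and_leaf` is a helper name made up for this example, and it assumes non-negative two-digit integers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each tens digit (stem) to a sorted string of ones digits (leaves)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(plot.items())}

# The twelve values from the table above.
scores = [54, 59, 35, 41, 46, 25, 47, 60, 41, 34, 22, 45]

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem} | {leaves}")
```

Unlike a histogram, the plot keeps every original value recoverable: stem 4 with leaves 11567 stands for 41, 41, 45, 46, 47. Note that this sketch omits stems that have no values.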
Back-to-Back Stem and Leaf Plots
Name
Janice     54
Tom        59
Danielle   35
Ian        41
Eduardo    46
Emily      25
Anja       47
Cara       60
Adrian     41
Eric       34
Sarah      22
David      45

 men | stem | women
     |  2   | 25
   4 |  3   | 5
1156 |  4   | 7
   9 |  5   | 4
     |  6   | 0
Visual Depictions of Distributions
Summary
Discrete Data
Frequency Tables
Bar Graphs
Continuous Data
Frequency Tables
Histograms
Stem and Leaf Plots
Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Charts
[Chart: Feelings of Caring, with labels “Unfriendly” and “Female”. Column headings: With peers, With profs, With women, With men, With familiar, With unfamiliar, Average. Values in extraction order: -2.61, 1.60, -2.37, -1.60, 1.83, -3.38, -1.62.]
Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Bar Graphs
[Figure: bar graph of Voicing Response (y-axis 0 to 5), grouped by Entity and Incremental, with bars for Mild Conflict and Extreme Conflict; bar labels 4.9, 3.8, 3.5, 2.5.]
Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Bar Graphs
[Figure: bar graph of Voicing Response (y-axis 0 to 5), grouped by Mild Conflict and Extreme Conflict, with bars for Entity and Incremental; bar labels 4.9, 3.8, 3.5, 2.5.]
Look at the same graph differently!
Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Bar Graphs
[Figure: bar graph of Voicing Response with the y-axis running from 2 to 5, grouped by Mild Conflict and Extreme Conflict, with bars for Entity and Incremental; bar labels 4.9, 3.8, 3.5, 2.5.]
Look again!
Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Line Graphs
[Figure: line graph of Intention-Reading Performance, Percentile (0 to 100) by quartile (1st to 4th), with separate lines for Estimated and Actual.]
Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Box-plots
[Figure: box plots of Voice by Implicit Theory.]
Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Error-bar plots
[Figure: error-bar plot of Person attributions by Implicit Theory.]
Visual Depictions of Relationships
IV -- continuous; DV -- continuous
• Scatterplots
[Figure: scatterplot of Current relationship satisfaction against Conflict Significance.]
Visual Depictions of Relationships
IV -- continuous; DV -- continuous
• Scatterplots
[Figure: scatterplot of Current relationship satisfaction against Conflict Significance, with points marked by theory (entity or incremental).]
Visual Depictions of Relationships
IV -- continuous; DV -- continuous
• Scatterplots with regression lines
[Figure: scatterplot of Current relationship satisfaction against Conflict Significance, with points and regression lines marked by theory (entity or incremental).]
Visual Depictions of Relationships
IV -- categorical; DV -- categorical
• Contingency table
[Table: Narcissists * Male Crosstabulation (Count). Cell counts 19, 53, 15, 51; totals 72 and 66 on one margin, 34 and 104 on the other; N = 138.]
Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Charts, bar graphs, line graphs, box plots, error
bar plots
IV -- continuous; DV -- continuous
• Scatterplot (regression line)
IV -- categorical; DV -- categorical
• Contingency table
Inferential
Statistics
Inferential Statistics
Population:
The set of all individuals of interest (e.g. all
women, all college students)
Sample:
A subset of individuals selected from the
population from whom data is collected
[Diagram label: Inferential statistics]
[Figures: histograms of No. of People (0 to 8) by Nightly Hours of Sleep (3.5 to 11.5), one panel for Women and one for Men.]
Are these sample differences simply due to chance?
Some important terms
Parameter:
A characteristic of the population. Denoted
with Greek letters such as μ or σ.
Statistic:
A characteristic of a sample. Denoted with
English letters such as X or S.
Sampling Error:
Describes the amount of error that exists
between a sample statistic and the
corresponding population parameter.
We want to know whether Joe is an above average
free-throw shooter. We collect some data.
Would you bet $10.00 that he makes the next shot?
B M B B                     % baskets = .75
B M B B B M M B             % baskets = .63
B M B B B M M B M B B M     % baskets = .58
Chance is “Lumpy”
H T H H                     % heads = .75
H T H H H T T H             % heads = .63
H T H H H T T H T H H T     % heads = .58
So how do we decide?
H T H H
Sample proportion = .75
Inferential Statistics helps us answer the
question:
Given a fair coin tossed four times, how often
would we get the result 75% heads by chance
alone?
Answer: If we took a fair coin and repeated
this procedure many times, we’d get this result
one out of every four times. Pretty often!
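The "one out of every four times" figure can be checked with the binomial formula. The sketch below is one way to do it; `prob_heads` is a helper name introduced here, and it assumes independent tosses of a coin with a fixed probability of heads.

```python
from math import comb

def prob_heads(k, n, p=0.5):
    """Binomial probability of exactly k heads in n tosses with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

exactly_three = prob_heads(3, 4)                   # 4 of the 16 equally likely sequences
three_or_more = exactly_three + prob_heads(4, 4)   # adds the single all-heads sequence
print(exactly_three, three_or_more)
```

Exactly three heads in four tosses has probability 4/16 = .25, and a result at least that extreme (three or four heads) has probability 5/16, so a 75% sample proportion from four tosses is weak evidence against a fair coin.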
So differences we see between
samples might not be reliable
(especially when the differences are small
or the samples are small)
Inferential statistics can tell us
whether or not our results are likely to
be due to chance alone
Important Point of Clarification
Statistics asks: Was this observed “effect” caused
by (lumpy) chance alone?
Inferential statistics separates:
Random causes:
- Fluctuations of chance
Non-random causes:
- True differences in the population
- Bias in the design of the study
A statistically significant result doesn’t mean the results have to be
“true”, only that they are unlikely to be due to chance alone.
Inferential Statistics
[Diagram: Descriptive Statistics and Probability Theory]
Types of Analyses
IV -- categorical (groups); DV -- continuous
One Sample T-test. Inferences about the mean of one group
Two Sample T-test. Differences between the means of two groups.
ANOVA. Differences between the means of three or more groups.
[Figure: bar graph of Score (0 to 50) for First Grade, Third Grade, and Fifth Grade.]
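To make the two-sample comparison concrete, here is a rough sketch that computes a t statistic for the twelve "Hours of Sleep / Night" values listed earlier. Two assumptions are worth flagging: the split into men and women is inferred from the back-to-back stem and leaf plot, and the Welch (unequal-variance) form of the statistic is used here as one common choice, not necessarily the version the lecture has in mind.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t statistic that does not assume equal variances (Welch)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hours of sleep from the 12-person table; the split by sex is an assumption
# inferred from the back-to-back stem and leaf plot.
men = [7.5, 9, 7, 8.5, 6.5, 4]      # Tom, Ian, Eduardo, Adrian, Eric, David
women = [6, 10.5, 6, 8, 5, 7.5]     # Janice, Danielle, Emily, Anja, Cara, Sarah

t = welch_t(men, women)
print(f"mean men = {mean(men):.2f}, mean women = {mean(women):.2f}, t = {t:.2f}")
```

The statistic is the difference between the sample means divided by its estimated standard error. Turning it into a probability requires the t distribution, which this sketch leaves out.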
Types of Analyses
IV -- continuous; DV -- continuous
Correlation. The linear association between two continuous
variables
Regression. The best fit line of prediction.
[Figure: scatterplot of Sleep (0 to 10) against Age (0 to 80).]
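A minimal sketch of both analyses: Pearson's r for the linear association and the least-squares line for prediction. The age and sleep values below are hypothetical numbers invented for illustration (the slide's data points are not available), and `pearson_r_and_line` is a name made up for this example.

```python
from math import sqrt

def pearson_r_and_line(x, y):
    """Return (r, slope, intercept) for the least-squares line y = slope*x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    return r, slope, my - slope * mx

# Hypothetical ages and nightly hours of sleep, invented for illustration only.
age = [20, 30, 40, 50, 60, 70]
sleep = [8.6, 8.1, 7.4, 7.2, 6.3, 6.2]

r, slope, intercept = pearson_r_and_line(age, sleep)
print(f"r = {r:.2f}, sleep = {slope:.3f} * age + {intercept:.2f}")
```

Correlation summarizes how tightly the points follow a line (from -1 to 1), while the regression slope says how much the outcome changes per unit of the predictor.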
Types of Analyses
IV -- categorical (groups); DV -- categorical
Z-test for proportions. The difference between two sample
proportions.
Chi-square test. The distribution of counts in each category,
compared across groups.
[Table: Narcissists * Male Crosstabulation (Count). Cell counts 19, 53, 15, 51; totals 72 and 66 on one margin, 34 and 104 on the other; N = 138.]
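As a sketch of the chi-square test on the crosstabulation counts: the statistic compares each observed cell count with the count expected if the two variables were unrelated. Which variable forms the rows is an assumption here (the extracted table is ambiguous), but the statistic is identical for the transposed table. The p-value line uses the fact that, for one degree of freedom, the upper-tail chi-square probability equals `erfc(sqrt(chi2 / 2))`.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Cell counts from the crosstabulation; which variable is rows and which is
# columns is an assumption, but the statistic is the same for the transpose.
counts = [[19, 15],
          [53, 51]]

chi2 = chi_square_2x2(counts)
p_value = erfc(sqrt(chi2 / 2))   # upper-tail probability for 1 degree of freedom
print(f"chi-square = {chi2:.3f}, p = {p_value:.2f}")
```

With these counts the statistic is small and the p-value is large, so the table gives little evidence of an association between the two variables.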
Fallibility of
Everyday
Reasoning
Everyday Statistical Reasoning
1. Something out of nothing: the misperception of
random data.
2. Too much from too little: the misinterpretation of
incomplete data
3. Seeing what you expect: biased evaluation of
ambiguous data
Misperceiving Random Data
“The human understanding supposes a greater degree of
order and equality in things than it really finds; and
although many things in nature be most irregular, will yet
invent parallels and conjugates and relatives where no such
thing is.”
-Francis Bacon
• The clustering illusion
People do not intuitively expect chance to be lumpy.
They reject the possibility that clustering can be
random.
“Hot hand” in basketball. “Winning streak” or “hot
table” in gambling.
Gilovich et al., 1985
• Interviewed 100 basketball fans
• 91% thought a player has a better chance of making a shot after
having just made his last 2-3 shots than he does after having just
missed his last 2-3 shots.
• They estimated that a player’s shooting percentage would be
61% after having just made a shot and 42% after having just
missed a shot.
• 84% of the respondents thought that it is important to pass the
ball to someone who has just made several shots in a row.
The data
Gilovich et al., 1985
• On average, players made 51% of shots after making their
previous shot, 54% of shots after missing their previous shot.
• They made 50% of shots after making their previous two shots,
53% after missing their previous two shots.
• They made 46% of shots after making their previous three
shots, 56% of shots after missing three in a row.
• There were no more streaks of 4, 5, or 6 hits in a row than
chance would have predicted.
The players, however, believed that they tended to shoot in
streaks.
The data
Gilovich et al., 1985
• A group of college basketball players was asked to take 100 shots.
Before each shot they chose either a risky or conservative bet on
their ability to make the shot.
• They tended to make risky bets after hitting their previous shot
and conservative bets after missing their previous shot.
• However, there was no correlation between the outcome of
consecutive shots. No correlation between bets and outcomes.
The response
Gilovich et al., 1985
“Who is this guy? So he makes a study. I couldn’t care less.”
-Red Auerbach, Celtics
“There are so many variables involved in shooting the basketball
that a paper like this doesn’t mean anything.”
-Bobby Knight
• Selective Attention
• Post-hoc causal explanations
Dangers of Post-Hoc
theorizing!
LAW of LARGE NUMBERS
The correct proportion of heads and tails or
hits and misses will be present globally in a
long sequence.
It will NOT, however, always be present
locally, in each of its parts.
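A quick simulation illustrates the contrast between global and local proportions. This is only a sketch: the seed, the number of flips, and the block size of 4 are arbitrary choices made for the example.

```python
import random

rng = random.Random(0)                                 # fixed seed so the run is repeatable
flips = [rng.random() < 0.5 for _ in range(10_000)]    # True = heads, fair coin

overall = sum(flips) / len(flips)                      # global proportion of heads

# Proportion of heads inside each non-overlapping block of 4 flips.
blocks = [sum(flips[i:i + 4]) / 4 for i in range(0, len(flips), 4)]

# Longest run of identical outcomes anywhere in the sequence.
longest = current = 1
for prev, nxt in zip(flips, flips[1:]):
    current = current + 1 if nxt == prev else 1
    longest = max(longest, current)

print(f"overall = {overall:.3f}")
print(f"block proportions range from {min(blocks)} to {max(blocks)}")
print(f"longest streak = {longest}")
```

Over the whole sequence the proportion of heads sits close to one half, yet individual blocks of four range from all tails to all heads, and long streaks appear even though every flip is independent.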
Misinterpreting Incomplete Data
“They still cling stubbornly to the idea that the only good
answer is a yes answer. If they say, ‘Is the number
between 5,000 and 10,000?’ and I say yes, they cheer; if I
say no, they groan, even though they get exactly the same
amount of information in either case.”
-John Holt
“Are professors particularly likely to be
absent-minded?”
                 Absent-Minded   Not Absent-Minded
Professors       600             400
Not Professors   300             200
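One way to read the table is to compare the rate of absent-mindedness within each row, which needs all four cells and not just the large "absent-minded professors" count. A small sketch:

```python
# Counts from the 2x2 table above: (absent-minded, not absent-minded).
table = {
    "Professors": (600, 400),
    "Not Professors": (300, 200),
}

rates = {group: yes / (yes + no) for group, (yes, no) in table.items()}
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} absent-minded")
```

Both groups come out at the same rate (600 of 1,000 and 300 of 500), so in these numbers being a professor tells you nothing about being absent-minded.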
“Does carrying an umbrella make it less likely
to rain?”
              Rain   No rain
Umbrella
No umbrella
“Does the Cosmo horoscope predict the
future?”
                              Event happens   Event doesn’t happen
Cosmo predicts event
Cosmo doesn’t predict event
Can alternative medical technique X help
cancer patients who have been diagnosed as
“incurable”?
                                       Patient recovers   Patient fails to recover
Patient gets alternative med           500                4000
Patient does not get alternative med   700                3800
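The 500 recoveries in the top-left cell only mean something next to the comparison row. A short sketch of the calculation, with `recovery_rate` as an illustrative helper name:

```python
def recovery_rate(recovered, not_recovered):
    """Proportion of patients in a row of the table who recovered."""
    return recovered / (recovered + not_recovered)

with_treatment = recovery_rate(500, 4000)      # patients who got alternative med
without_treatment = recovery_rate(700, 3800)   # patients who did not

print(f"with: {with_treatment:.1%}, without: {without_treatment:.1%}")
```

In these numbers about 11% of treated patients recover against about 16% of untreated patients, so the full table points the opposite way from the impression left by the 500 success stories alone.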
• Selective attention
• Available information
• Positive test strategy
[Cards shown: A, B, 2, 3]
“All cards with a vowel on one side have an even number
on the other.”
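The card problem can be worked through mechanically by asking, for each visible face, whether anything on the hidden side could break the rule. This sketch assumes the usual setup of the task, which the slide does not spell out: every card has a letter on one side and a number on the other. `must_turn` is a name introduced for the example.

```python
VOWELS = set("AEIOU")

def must_turn(face):
    """True if the hidden side of this card could falsify 'vowel -> even number'."""
    if face.isalpha():
        # A vowel is falsified by an odd number on the back; a consonant never is.
        return face.upper() in VOWELS
    # An odd number is falsified by a vowel on the back; an even number never is.
    return int(face) % 2 == 1

cards = ["A", "B", "2", "3"]
to_turn = [card for card in cards if must_turn(card)]
print(to_turn)
```

Only the vowel and the odd number can disconfirm the rule. Turning over the even number is the tempting "positive test": whatever is on its back, the rule survives, so it carries no information.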
• Selective attention
• Available information
• Positive test strategy
• Under-appreciation of base rates
Watch out for incomplete data!
                        Event occurs   Event does not occur
Event hypothesized
No event hypothesized
III. Projecting onto Ambiguous Data
“I’ll see it when I believe it.”
-Thane Pittman
• Illusory correlations
When people “see” an association that is not present in
the data.
“Arthritis pain is influenced by the weather.”
“Most women get bad moods before their menstrual
periods.”
Chapman et al., 1967
• Why do clinical psychologists continue to use projective
tests even though dozens of studies have shown these
tests are not valid indicators of personality?
• Showed clinicians a series of Rorschach cards, along with
the patient’s response to each card and some information
describing the patient’s characteristics (sometimes
including sexual orientation).
• Examined the correlations that clinicians “saw” between
particular responses and homosexuality.
Chapman et al., 1967
• In truth, there are some counter-intuitive relationships.
Homosexuals are more likely to see a monstrous figure on
one card and an ambiguous animal-human figure on
another card.
• Many of the intuitive relationships do not hold.
Homosexuals are not more likely to see anal content,
feminine clothing, or humans of uncertain gender.
Chapman et al., 1967
• In Study 1, researchers designed the materials so that
there was no correlation between any of the responses
and homosexuality.
• Clinicians did, however, believe the highly intuitive -- but
invalid -- correlations.
Chapman et al., 1967
• In follow-up studies, researchers designed the materials
so that there was a negative correlation between the
intuitive responses and homosexuality.
• The size of the illusory correlation was not reduced.
• Clinicians may “see” non-existent correlations between
test responses and diagnoses
• Managers may “see” non-existent correlations between
employees’ race or gender and performance
• Parents may “see” non-existent correlations between
children’s sugar consumption and unruly behavior
• Students may “see” non-existent correlations between
their peers’ college majors and personalities.
Much of what we “learn” from experience may
reflect our prior theories about reality rather
than the actual nature of reality.
Everyday Statistical Reasoning
1. Something out of nothing: the misperception of
random data.
- Drawing strong conclusions from small “lumpy”
samples
2. Too much from too little: the misinterpretation of
incomplete data
- Inadequate comparison groups
3. Seeing what you expect: biased evaluation of
ambiguous data
- Illusory correlation based on confirmation bias
But there’s hope …
Following training in probability and
statistics, people are less likely to make
these errors.
Fallibility of
Statistical
Reports
Everyday Reasoning
1. Something out of nothing: the misperception of random data.
2. Too much from too little: the misinterpretation of incomplete data (~control groups)
3. Seeing what you expect: biased evaluation of ambiguous data
Statistical Reports
1. One thing out of something else: overgeneralization from biased samples and measures
2. Too much from too little: the misinterpretation of incomplete data (~control groups)
3. Getting what you expect: biased presentation of ambiguous data
Overgeneralizing from Biased Samples
1936 Election Poll
- In 1936, the Literary Digest predicted that Alf Landon would beat
Franklin D. Roosevelt in the presidential election, based on approx 2
million survey responses
- How could a study with such a large sample be so wrong? Selection bias?
But participants were selected randomly from phone books…
- Other polling agencies with smaller samples but more representative
methods accurately predicted Roosevelt’s win
Overgeneralizing from Biased Samples
Sperm Study
- In early 1996, media raised the alarm about declining sperm counts, as a
result of a book published by Colborn, an environmentalist
- The book relied heavily on a 1992 Danish meta-analysis reviewing 61
papers published between 1938 and 1991, in which a total of 14,947 men
had their sperm tested.
- Found a “significant” decline in sperm count: from 113 million sperm per ml in
1940 to 66 million sperm per ml in 1990
Overgeneralizing from Biased Samples
Sperm Study
- Sample:
Period       Men sampled
Pre-1950     596
1951         1000      (one study!)
1952-1970    184
1970-1991    13,167
- The entire “decline” was carried by the single 1951 sample
- From 1970-1991, sperm counts actually increased
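A short check of how unevenly the sample is spread across periods, using the counts above. The dictionary layout is just an illustrative way to hold the table.

```python
# Men sampled per period, copied from the table above.
samples = {
    "Pre-1950": 596,
    "1951": 1000,
    "1952-1970": 184,
    "1970-1991": 13_167,
}

total = sum(samples.values())
shares = {period: n / total for period, n in samples.items()}
for period, share in shares.items():
    print(f"{period:<10} {samples[period]:>6} {share:.1%}")
print("total", total)
```

The counts add up to the 14,947 men reported for the meta-analysis, and roughly 88% of them come from 1970-1991, so the early decades that anchor the "decline" rest on a small and lopsided part of the sample.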
Misinterpretation of Incomplete Data
Crime Study
- Murders significantly fell in NYC in the last decade: from 2,245 in
1990 to 596 in 2003
presumed cause: Giuliani
- Murders significantly fell all across the country from 1990 to 2003
- Crime started dropping in NYC in 1990, four years before Giuliani
became mayor.
Misinterpretation of Incomplete Data
Unwed Mothers Study
- In October 1996, NCHS issued data showing that the rate of births
to unwed mothers had declined from 46.9 per thousand in 1994 to
44.9 per thousand in 1995, the first decline in 20 years. Front-page
coverage in the New York Times and Los Angeles Times.
- Clinton trumpets the results as a success for his new welfare
policies (instituted in 1996)
- Not mentioned: from 1993-1994 there was the largest one-year
increase in out-of-wedlock births since national figures have been
kept
Biased Presentation of Ambiguous Results
• Selective presentation of results
Day Care Study
-In 1996, media publicized the results of a study presented at an NICHD
conference claiming that the bond between mothers and babies is not
weakened when the child is placed in day care
-Study measured the presence or absence of “secure attachment” in
infants
-Overall no difference in day care versus home care babies
Biased Presentation of Ambiguous Results
• Selective presentation of results
Day Care Study
-What the media did not highlight: a more confusing picture emerges
when the averages are broken out
-Baby boys were most likely to be insecurely bonded when they were in
day care for more than thirty hours a week
-Baby girls were most likely to be insecurely bonded when they were in
day care for less than 10 hours a week
Biased Presentation of Ambiguous Results
• Selective presentation of results
Psychology Research
- Researchers sometimes present significant results and fail to present
null (or opposing) results.
-Sometimes you can catch them – look at their methods section and see
how many tests they must have run and how many they reported.
Biased Presentation of Ambiguous Results
• “Spin” or specialized emphasis of particular results
Mortgage Study
-In 1995 a study by the Federal Reserve Bank of Chicago showed that
among people with bad credit ratings, 10% of white applicants are
denied mortgages while 20% of black and Hispanic applicants are
denied mortgages.
-In the same study, however, it was found that compared to past years,
approved mortgages rose by 55% for black applicants, by 45% for
Hispanic applicants, and by 16% for white applicants.
Biased Presentation of Ambiguous Results
• “Spin” or specialized emphasis of particular results
Mortgage Study
-In 1995 a study by the Federal Reserve Bank of Chicago showed that
among people with bad credit ratings, 90% of white applicants are
granted mortgages while 80% of black and Hispanic applicants are
granted mortgages.
-In the same study, however, it was found that compared to past years,
approved mortgages rose by 55% for black applicants, by 45% for
Hispanic applicants, and by only 16% for white applicants.
Biased Presentation of Ambiguous Results
• “Spin” or specialized emphasis of particular results
Mortgage Study
-The New York Times did not report the second finding until the fourth
paragraph of the article
-They also reported denial rates rather than approval rates.
-In approval terms, the comparison is 90% versus 80%. In denial terms,
the comparison is 10% versus 20%.
-“Twice as likely to be denied” (makes you think people of color were
half as likely to be accepted, but actually they were about 89% as likely to be
accepted)
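The two framings are the same data divided two different ways. A small sketch of the arithmetic, using the approval rates from the slides (variable names are illustrative):

```python
# Rates among applicants with bad credit ratings, from the slides above.
approved = {"white": 0.90, "black and Hispanic": 0.80}
denied = {group: 1 - rate for group, rate in approved.items()}

denial_ratio = denied["black and Hispanic"] / denied["white"]        # 0.20 / 0.10
approval_ratio = approved["black and Hispanic"] / approved["white"]  # 0.80 / 0.90

print(f"denial ratio = {denial_ratio:.1f}x, approval ratio = {approval_ratio:.0%}")
```

A ratio of small percentages (10% versus 20%) looks dramatic, while the ratio of their complements (90% versus 80%, or 8/9) looks modest. A report can choose either number without misstating the data.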
Biased Presentation of Ambiguous Results
• “Spin” or specialized emphasis of particular results
Psychology Research
- Sometimes a p-value of 0.10 is treated as “not significant” (especially
if the researcher did not predict the effect)
- Other times the same p-value is emphasized as “marginally
significant” (especially if the researcher predicted the effect)
Can’t always trust intuition
- Learn more about possible pitfalls in
intuitive decision making
Can’t always trust statistical reports
- Learn more about how to evaluate
statistical reports and research findings
Practice
Washington Post April 12, 2000
“Government-funded medical surveys since 1960 have shown
higher rates of at least one type of cancer – varying from
thyroid tumors to leukemia – at most of the major facilities
that produced nuclear weapons.”
What’s problematic about this statement?