RIMI Workshop: Power Analysis

Ronald D. Yockey
[email protected]
Goals of the Power Analysis Workshop
1. Understand what power is and why power analyses are
important in conducting research.
2. Recognize the limits of Null Hypothesis Significance
Testing (NHST) and how effect sizes complement
NHST.
3. Understand the relationship between power, effect size, and
sample size.
4. Use GPower to estimate the sample size (N) required to
obtain a desired level of power (e.g., 80%) for a number
of statistical procedures.
5. Provide an estimate of power for your grant proposals!
Null Hypothesis Significance
Testing (NHST)
What is Power?
Power – the probability of rejecting the null
hypothesis (i.e., obtaining significance) when it is
false
- Ranges from 0 to 1
- When multiplied by 100%, power is expressed
as a percentage
Examples of Power
Example #1
Power = .50
- 50% of the time the null hypothesis will be
rejected (i.e., statistical significance will be
obtained)
- 50% of the time the null hypothesis will
not be rejected (i.e., statistical significance
will not be obtained) – A Type II Error
Examples of Power (continued)
Example #2
Power = .80
- 80% of the time the null hypothesis will be
rejected (i.e., statistical significance will be
obtained)
- 20% of the time the null hypothesis will
not be rejected (i.e., statistical significance
will not be obtained) – A Type II Error
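One way to make these percentages concrete is a small simulation. The sketch below is illustrative only: it assumes a one-sample z test with known σ = 1, a two-tailed α = .05, and a hypothetical true mean of 0.4 with n = 49, values picked so that the theoretical power is about .80. The function name and every parameter are assumptions, not part of the workshop materials.

```python
import math
import random
from statistics import NormalDist

def simulate_rejection_rate(true_mean, n, reps=2000, alpha=0.05, seed=1):
    """Fraction of simulated studies that reject H0: mean = 0 (sigma assumed known, = 1)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)      # z statistic when sigma = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

rate = simulate_rejection_rate(true_mean=0.4, n=49)
print(f"Estimated power: {rate:.2f}")                 # should land near .80
print(f"Estimated Type II error rate: {1 - rate:.2f}")
```

Because the null hypothesis is false in every simulated study, each non-rejection is a Type II error, so the two printed proportions sum to 1.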
Rationale for Power Analysis
- Work involved in a study: conceiving the idea,
literature review, grant proposal submission,
running participants, analyzing data, writing the
results
- High power = high chance of obtaining significance
(supporting the research hypothesis)
- Low power = low chance of obtaining significance
- Neglecting “a priori” power analysis frequently
results in low power studies
- Power analysis is crucial for increasing the
probability of getting significant results!
Rationale for Power Analysis
(continued)
Low power studies are very common
e.g., Power = .30
- 30% chance of achieving significance (rejecting H0)
Is spending the time and effort to conduct the study (not
to mention taxpayers’ money) worth it when there is
only a 3 in 10 chance of getting significance?
Recommended power level – 70% to 80%
(Diminishing returns in the 90%+ range)
Factors That Influence Power
1. Alpha level (α = .05 or .01)
- Larger α = greater power
2. One-tailed vs. two-tailed tests
- One-tailed tests have greater power (for a constant α)
- Two-tailed tests are much more common (a one-tailed test may require justification)
3. The size of the standard deviation (σ)
- Smaller standard deviation = greater power
(σ can be very difficult to manipulate)
Factors That Influence Power
(continued)
4. Effect size – the size of the “treatment effect”
in your study
- Larger effect size = greater power
5. Sample size (N)
- Larger N = greater power
(The most commonly manipulated factor for
increasing power)
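The five factors above can be explored with a rough calculation. This is a minimal sketch using a normal approximation for a comparison of two independent group means; it is not the exact t-based calculation a program such as GPower performs. The function name, the baseline numbers (a 5-point mean difference, σ = 10, 30 per group), and the assumption that a one-tailed test is aimed in the direction of the true effect are all illustrative.

```python
import math
from statistics import NormalDist

def approx_power(mean_diff, sigma, n_per_group, alpha=0.05, tails=2):
    """Normal-approximation power for comparing two independent group means."""
    norm = NormalDist()
    d = mean_diff / sigma                    # standardized effect size
    delta = d * math.sqrt(n_per_group / 2)   # expected value of the test statistic
    z_crit = norm.inv_cdf(1 - alpha / tails)
    power = norm.cdf(delta - z_crit)
    if tails == 2:
        power += norm.cdf(-delta - z_crit)   # rejections in the opposite tail
    return power

baseline = approx_power(mean_diff=5, sigma=10, n_per_group=30)
comparisons = {
    "alpha = .01 instead of .05": approx_power(5, 10, 30, alpha=0.01),
    "one-tailed test": approx_power(5, 10, 30, tails=1),
    "smaller sigma (8)": approx_power(5, 8, 30),
    "larger mean difference (8)": approx_power(8, 10, 30),
    "larger n per group (60)": approx_power(5, 10, 60),
}
print(f"baseline: {baseline:.2f}")
for label, value in comparisons.items():
    print(f"{label}: {value:.2f}")
```

Only the stricter α lowers power relative to the baseline; each of the other changes raises it, in line with the list above.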
Examples of Low Power Studies
Very “realistic” low power study examples (for the
independent samples t test):
Example #1 (2-tailed, α=.05)
n1 = 30
n2 = 30
Small effect (i.e., a relatively small difference between the
groups; characteristic of many studies in the social
and behavioral sciences)
Power = 12%!
Examples of Low Power Studies
(continued)
Example #2 (2-tailed, α=.05)
n1=50, n2=50; Small effect
Power = 17%!
Example #3 (2-tailed, α=.05)
n1=30, n2=30; Medium effect
Power = 48%
All three studies suffer from insufficient power.
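The three power values above can be checked approximately. The sketch below uses a normal approximation rather than the exact t distribution, so it lands within a point or two of the slide values rather than reproducing them exactly. The function name is hypothetical, and d = .20 and d = .50 are Cohen's conventional values for small and medium effects.

```python
import math
from statistics import NormalDist

def power_independent_t_approx(d, n_per_group, alpha=0.05):
    """Approximate two-tailed power for an independent samples t test (equal n)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    delta = d * math.sqrt(n_per_group / 2)   # approximate noncentrality
    return norm.cdf(delta - z_crit) + norm.cdf(-delta - z_crit)

examples = [
    ("Example #1: n = 30 per group, small effect", 0.20, 30),
    ("Example #2: n = 50 per group, small effect", 0.20, 50),
    ("Example #3: n = 30 per group, medium effect", 0.50, 30),
]
for label, d, n in examples:
    print(f"{label}: power is roughly {power_independent_t_approx(d, n):.0%}")
```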
Rationale for Power Analysis
(continued)
The prevalence of low power studies is one reason
why funding agencies such as NIH and NIMH
(among others) often require estimates of power
with the submission of a grant proposal.
And that’s why we’re here today!
Null Hypothesis Significance
Testing (NHST)
NHST (Continued)
If statistical significance is obtained (e.g., p < .05), then we
can declare that the groups are different.
While a “statistically significant” result with NHST tells us
the groups are different, it says nothing about how
different they are. Statistical significance means
“beyond normal sampling error” or “reliable
difference,” but it does not necessarily mean “big
difference” or “important.”
NHST (Continued)
- While NHST can be a very useful tool, it has frequently
been misused, as far too many researchers have made
the mistake of assuming statistical significance means
“practical importance.”
- Due to this common misunderstanding, the American
Psychological Association (APA) now strongly
encourages that effect sizes be presented (alongside the
results of significance tests), and many journals require
the reporting of effect sizes for manuscript
consideration.
What is an Effect Size?
Effect size – Indicates the size or degree of the
effect of some treatment or phenomenon
Definitions of effect size provided by Cohen
(1988; p. 8-9)
- “The degree to which the phenomenon is present
in the population.”
- “The degree to which the null hypothesis is false.”
NHST vs. Effect Size
Cohen’s second definition of effect size (repeated):
- “The degree to which the null hypothesis is
false.”
1. NHST – If reject null – what do you conclude?
The null is false – i.e., Experimental ≠ Control
(NHST doesn’t indicate how different the
groups are, just that they’re not equal)
2. Effect size – indicates how different the groups
are
NHST vs. Effect Size (continued)
Basic Question of Significance Testing (NHST) –
Is there an effect?
- Yes or No
Basic Question of Effect sizes – How big is the
effect?
- A question of degree
Effect Sizes in Power Analysis
Effect sizes play a fundamental role in power
analysis
– To conduct a power analysis, the effect size must
be estimated.
(We’ll examine several effect size measures
shortly.)
Effect Sizes in Power Analysis
(continued)
Different effect sizes are often used for different
statistical procedures (t tests, ANOVA,
Correlation, etc.)
Effect Sizes – Mean Differences
Effect size of the difference between two means
Example #1 – IQ scores:
group 1 = 115, group 2 = 105
Effect size = mean group 1 – mean group 2
= 115-105 = 10 IQ points
Effect size of 10 IQ points (notice the effect
size indicates how different the groups are)
Effect Sizes – Mean Differences
(continued)
Example #2:
Stress – breathing exercises vs. control
breathing exercises = 60, control = 67
(higher scores = greater stress)
Effect size = 60 – 67 = –7; effect size of 7 points
(Often the absolute value for an effect size is
reported.)
Effect Sizes – Mean Differences
(continued)
Problems with mean difference approach:
1. When different scales are used (with different
M and SD) to measure the same construct, the
results of different studies cannot be
meaningfully compared (comparing apples and
oranges).
2. Power analysis requires a standardized or
“scale free” measure of effect size.
Standardized Measures of Effect Size
t tests – Cohen’s d
ANOVA – η² (eta-squared) or R²
Correlation – Pearson’s r
Multiple Regression – R²
Chi-Square Test of Independence – Cramer’s Phi
Cohen’s d
Used for all t tests (one sample t, independent
samples t, dependent samples t)
A standardized or “scale free” measure of mean
differences
Cohen’s d (continued)
Example:
Examining the effect of a drug on pain levels
- Pain questionnaire on a 10-50 scale
administered to people suffering from back pain
(higher score = greater pain).
- old drug – 25, new drug – 20
- standard deviation of 10.
Cohen’s d (continued)
d = (25 - 20) / 10 = .5
(Interpret in terms of standard deviation differences like z-scores)
Those who took the new drug had pain levels that
were .5 standard deviations lower than those who took
the old drug.
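The pain example can be worked through in a few lines. This is a minimal sketch that takes Cohen's d as the difference between the two means divided by the standard deviation; the function name is hypothetical, and the rescaled questionnaire in the second call is an invented illustration of why d is called "scale free."

```python
def cohens_d(mean_1, mean_2, sd):
    """Standardized mean difference: (mean_1 - mean_2) / sd."""
    return (mean_1 - mean_2) / sd

# Values from the pain example: old drug = 25, new drug = 20, SD = 10.
d_pain = cohens_d(mean_1=25, mean_2=20, sd=10)
print(d_pain)  # 0.5

# Hypothetical: the same data on a questionnaire whose scores are doubled.
# The raw difference doubles (10 points), but d is unchanged.
d_rescaled = cohens_d(mean_1=50, mean_2=40, sd=20)
print(d_rescaled)  # 0.5
```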
Cohen’s conventions for d

Magnitude    d      Interpretation
Small        .20    1/5 of a std. dev. difference
Medium       .50    1/2 of a std. dev. difference
Large        .80    8/10 of a std. dev. difference

Cohen’s standards for small, medium, and large effect sizes
for the independent samples t test, one sample t test, and
the dependent samples t test.
Power Table – Independent t (abridged)

         Effect Size (d)
Power    .20 – Small    .50 – Medium    .80 – Large
.50      194 (388)      32 (64)         14 (28)
.60      246 (492)      41 (82)         17 (34)
.70      310 (620)      51 (102)        21 (42)
.80      394 (788)      64 (128)        26 (52)

Sample size required per group (with total N listed in parentheses)
for a given level of power and effect size for the Independent
Samples t test (α = .05, 2-tailed).
Note: Assumes equal n per group.
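A rough version of the per-group sample sizes can be obtained from a standard normal-approximation formula, n ≈ 2((z for α/2 + z for power) / d)². This is a sketch under that approximation, not the exact t-based method behind the table, so it comes out a participant or so below the tabled values. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def approx_n_per_group(d, power=0.80, alpha=0.05):
    """Normal-approximation sample size per group for a two-tailed independent t test."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)   # critical value for the two-tailed test
    z_power = norm.inv_cdf(power)           # z score corresponding to the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

for d in (0.20, 0.50, 0.80):
    n = approx_n_per_group(d)
    print(f"d = {d:.2f}: about {n} per group ({2 * n} total) for 80% power")
```

For planning purposes the exact values (from the table or from GPower) should be preferred; the approximation is mainly useful for seeing how quickly n grows as d shrinks.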
Cohen’s conventions for Pearson’s r

Magnitude    r
Small        .10
Medium       .30
Large        .50

Cohen’s standards for small, medium, and large effect sizes
for the Pearson r correlation coefficient.
Power Table – Pearson’s r (abridged)

         Effect Size (r)
Power    .10 – Small    .30 – Medium    .50 – Large
.50      384            43              15
.60      489            54              19
.70      616            67              23
.80      782            84              29

Sample size (N) required for a given level of power and effect
size for the Pearson r correlation coefficient (α = .05, 2-tailed).
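Sample sizes for a correlation can be approximated with the Fisher z transformation, N ≈ ((z for α/2 + z for power) / atanh(r))² + 3. The sketch below applies that approximation; it is not necessarily the method used to build the table, and its results can differ from the tabled values by a participant or so. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def approx_n_for_r(r, power=0.80, alpha=0.05):
    """Fisher z approximation to the N needed to detect a correlation r (two-tailed)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    z_power = norm.inv_cdf(power)
    fisher_z = math.atanh(r)                # Fisher transformation of r
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)

for r in (0.10, 0.30, 0.50):
    print(f"r = {r:.2f}: about N = {approx_n_for_r(r)} for 80% power")
```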
Cohen’s Conventions for Cramer’s Phi/w (Chi-Square)

Magnitude    Phi, w
Small        .10
Medium       .30
Large        .50

Cohen’s standards for small, medium, and large effect sizes
for the chi-square test of independence.
Note: Applies only to 2 x k tables, where k ≥ 2.
Power Table – Chi-Square Test of Independence (abridged)

         Effect Size (Phi, w)
Power    .10 – Small    .30 – Medium    .50 – Large
.50      385            43              16
.60      490            55              20
.70      618            69              25
.80      785            88              32

Sample size required for a given level of power and effect size
for the chi-square test of independence (α = .05, df = 1, i.e., 2
x 2 table).
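For df = 1 the calculation behind this table can be sketched with the normal distribution alone, because a chi-square variable with 1 df is a squared normal variable and the noncentrality parameter is N × w². The code below relies on that fact and searches for the smallest N that reaches the target power. It applies only to the df = 1 case, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def chi2_power_df1(w, n, alpha=0.05):
    """Power of a 1-df chi-square test for effect size w and total sample size n."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)   # z_crit ** 2 is the chi-square critical value (df = 1)
    shift = w * math.sqrt(n)               # square root of the noncentrality parameter n * w ** 2
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

def required_n(w, power=0.80, alpha=0.05):
    """Smallest total N whose power reaches the target (simple upward search)."""
    n = 1
    while chi2_power_df1(w, n, alpha) < power:
        n += 1
    return n

for w in (0.10, 0.30, 0.50):
    print(f"w = {w:.2f}: N = {required_n(w)} for 80% power")
# Prints 785, 88, and 32, matching the .80 row of the table.
```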
Effect Size - ANOVA
k = the number of groups, m_i = the mean of the
i-th group, m = the grand (overall) mean, and σ =
the average (or pooled) standard deviation.
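The symbols defined above match those in Cohen's f, the standard deviation of the group means divided by the pooled standard deviation: f = sqrt(Σ(m_i - m)² / k) / σ. Assuming that is the measure intended here, the sketch below computes f for equal group sizes and converts it to η² with η² = f² / (1 + f²). The group means (47, 50, 53) and σ = 10 are made-up numbers, and the function names are hypothetical.

```python
import math

def cohens_f(group_means, sigma):
    """Cohen's f for equal-sized groups: SD of the group means divided by sigma."""
    k = len(group_means)
    grand_mean = sum(group_means) / k   # grand mean (equal group sizes assumed)
    sigma_m = math.sqrt(sum((m_i - grand_mean) ** 2 for m_i in group_means) / k)
    return sigma_m / sigma

def eta_squared_from_f(f):
    """Convert Cohen's f to eta-squared."""
    return f ** 2 / (1 + f ** 2)

f_example = cohens_f([47, 50, 53], sigma=10)   # hypothetical three-group study
print(f"f = {f_example:.3f}, eta-squared = {eta_squared_from_f(f_example):.3f}")

# Cohen's conventional f values and the eta-squared values they imply
for f in (0.10, 0.25, 0.40):
    print(f"f = {f:.2f} -> eta-squared = {eta_squared_from_f(f):.2f}")
```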
Cohen’s Conventions for ANOVA (f and η²)

Magnitude    f      η²
Small        .10    .01
Medium       .25    .06
Large        .40    .14

Cohen’s standards for small, medium, and large effect sizes for
the one-way between subjects analysis of variance (ANOVA).
Power Table – ANOVA (abridged)

         Effect Size (f, η²)
Power    f = .10; η² = .01    f = .25; η² = .06    f = .40; η² = .14
         Small                Medium               Large
.50      167 (501)            28 (84)              12 (36)
.60      209 (627)            35 (105)             14 (42)
.70      258 (774)            43 (129)             18 (54)
.80      323 (969)            53 (159)             22 (66)

Sample size (N) required per group (and total N) for a given level of power
and effect size for the one-way between subjects ANOVA (α = .05). The
power values provided are based on 3 groups; a larger total N is required to
achieve the same level of power as the number of groups increases.
Effect Size – Multiple Regression
Cohen’s conventions for Multiple Regression (f² and R²)

Magnitude    f²     R²
Small        .02    .02
Medium       .15    .13
Large        .35    .26

Cohen’s standards for small, medium, and large effect sizes
for multiple regression.
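The paired values in these conventions are consistent with Cohen's relation f² = R² / (1 - R²), or equivalently R² = f² / (1 + f²). The sketch below, with hypothetical function names, converts in both directions and recovers the conventional R² values after rounding to two decimals.

```python
def f2_from_r2(r2):
    """Cohen's f-squared from R-squared."""
    return r2 / (1 - r2)

def r2_from_f2(f2):
    """R-squared from Cohen's f-squared."""
    return f2 / (1 + f2)

for label, f2 in (("Small", 0.02), ("Medium", 0.15), ("Large", 0.35)):
    print(f"{label}: f-squared = {f2:.2f} -> R-squared = {r2_from_f2(f2):.2f}")
```

This is handy when a published study reports R² but the power software asks for f².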
Power Table – Multiple Regression (abridged)

         Effect Size (f², R²)
Power    f² = .02; R² = .02    f² = .15; R² = .13    f² = .35; R² = .26
         Small                 Medium                Large
.50      292                   43                    21
.60      362                   52                    25
.70      444                   63                    30
.80      550                   77                    36

Sample size (N) required for a given level of power and effect size for
multiple regression (α = .05). The power values provided are based on 3
predictors (IVs); a larger N is required to achieve the same level of power as
the number of predictors increases.
Estimating Power using GPower
GPower illustration…