Transcript Document

RESEARCH METHODOLOGY &
STATISTICS
TESTING DIFFERENCES IN PROPORTIONS
Addictions Department
MSc(Addictions)
Testing differences between 2 Proportions
• Another common problem in Addiction (& other) research
• Consider dichotomous (Yes/ No) measures
• ‘Exposure’
• Gender
• Treated/ Not treated
• ‘Outcome’
• Meeting diagnosis for e.g., alcohol dependence
• Successful treatment outcome
• Note that a construct can be measured using different metrics
(e.g., dichotomous (dependence: yes/ no) vs. continuous
(number of dependence symptoms))
Relapse in treated vs. controls
• A clinical researcher hypothesizes that the addition of a new
treatment component will significantly improve treatment
outcome among individuals receiving inpatient detox for
alcohol dependence.
• She recruits 50 consecutive patients admitted to the detox
unit. Half of them receive additional training in
meditation and half do not.
• Patients are followed up (100% re-contact rate) at three
months and assessed for alcohol consumption/
dependence.
• Null hypothesis (H0): There will be no difference between
the two groups in rates of relapse at 3 months
Proportions/ Percentages/ Rates
Percentage = (number of people with ‘x’/ total number) × 100
% Treated = (25/ 50) × 100
= 50%
Proportion = percentage/100 (i.e., on a scale from 0 to 1; the proportions in all categories add to 1)
Proportion treated = .5
‘Risk’ - Especially when discussing outcomes (e.g.,
relapse) we often talk about ‘risk’
• In this example the ‘risk’ of relapse might be expressed as
either a percentage or a proportion.
Relative Risk
• At follow-up, 12 of the treatment group had remained
abstinent, compared with 3 of the control group
• Relative risk is the ratio of the two risks (proportions)
• So, in the above example, 12/ 25 = .48; 3/25 = .12
• Relative risk = .48/ .12
• =4.0
             ‘Cured’   Relapse   Total   ‘Risk’
Treated         12        13       25     .48
Controls         3        22       25     .12
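A minimal Python sketch of the same arithmetic, using the counts in the table above (remaining abstinent vs. relapsed). The function name `relative_risk` is our own illustration, not part of any statistics package.

```python
def relative_risk(a, b, c, d):
    """Relative risk for a 2 x 2 table laid out as:
                 outcome   no outcome
        group 1     a          b
        group 2     c          d
    """
    risk_1 = a / (a + b)  # proportion with the outcome in group 1
    risk_2 = c / (c + d)  # proportion with the outcome in group 2
    return risk_1 / risk_2

# Treated: 12 'cured', 13 relapsed; Controls: 3 'cured', 22 relapsed
rr = relative_risk(12, 13, 3, 22)
print(f"Risk (treated) = {12 / 25:.2f}, risk (controls) = {3 / 25:.2f}, RR = {rr:.1f}")
# prints: Risk (treated) = 0.48, risk (controls) = 0.12, RR = 4.0
```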
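NOT in the exam (!)
            +     -     Total    ‘Risk’
Case        a     b     (a+b)    a/(a+b)
Controls    c     d     (c+d)    c/(c+d)
Formula for Relative Risk (again):
$$RR = \frac{a/(a+b)}{c/(c+d)}$$
Standard Error of the Relative Risk (calculated for its natural logarithm, ln RR):
$$SE(\ln RR) = \sqrt{\frac{1}{a} - \frac{1}{a+b} + \frac{1}{c} - \frac{1}{c+d}}$$
95% confidence interval around the Relative Risk:
$$95\%\ CI = \exp\left(\ln RR \pm 1.96 \times SE(\ln RR)\right)$$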
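A sketch of the log-scale confidence interval, assuming the common large-sample formula SE(ln RR) = sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)) and z = 1.96 for 95% coverage. A particular package may use a different method and give slightly different limits; `rr_with_ci` is a hypothetical helper name.

```python
import math

def rr_with_ci(a, b, c, d, z=1.96):
    """Relative risk and its CI, computed on the natural-log scale."""
    rr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(RR), not of RR itself
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, se, lower, upper

rr, se, lower, upper = rr_with_ci(12, 13, 3, 22)
print(f"RR = {rr:.2f}, SE(ln RR) = {se:.3f}, 95% CI {lower:.2f} to {upper:.2f}")
# prints: RR = 4.00, SE(ln RR) = 0.580, 95% CI 1.28 to 12.47
```

Because the interval is built around ln(RR) and then exponentiated, it is not symmetric around 4.0 on the original scale.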
Take- home message
Most (if not all) statistical packages, and numerous computer programs,
automatically provide confidence intervals for estimates of
relative risks (and other statistics)
There is no need to memorize the formulae used, BUT:
It is important to recognize that there is a strong mathematical
foundation to these calculations
It IS possible to calculate them with a calculator if you are
provided with individual-level data
The standard error (& therefore the CI) is partly determined by
sample size
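To see the last point concretely, the sketch below keeps the proportions of the relapse example fixed (.48 vs .12) and multiplies every cell by 10. The tenfold-larger sample is hypothetical, made up only to isolate the effect of sample size; the CI uses the same log-scale formula as above.

```python
import math

def rr_and_ci(a, b, c, d, z=1.96):
    """Return (RR, lower, upper), with the CI computed on the log scale."""
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)

# Same proportions (.48 vs .12), original sample and a hypothetical 10x sample
for scale in (1, 10):
    rr, lo, hi = rr_and_ci(12 * scale, 13 * scale, 3 * scale, 22 * scale)
    print(f"n = {50 * scale}: RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

The point estimate is identical in both rows; only the interval narrows (from roughly 1.28 to 12.47 down to roughly 2.79 to 5.73), because every term under the square root shrinks tenfold.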
Some Examples
Use this website:
http://www.medcalc.org/calc/relative_risk.php
To:
Calculate/ confirm RR & CI for the above example
Calculate RR for this example:
                   3 Month Outcome
Treatment     ‘Sober’   Drinking   Total
Control           54       537       591
Meditation       345       293       638
Total            399       830      1229
Odds Ratio
• Anyone gamble?
• The odds of an event are the chance of an event happening
divided by the chance of that event not happening
• So: if rolling a die, the odds of rolling a ‘1’ are 1/5 (a 1-in-6 chance of a ‘1’ vs. a 5-in-6 chance of not: (1/6)/(5/6) = 1/5).
• In the treatment example above the odds of not drinking
at 3 months in the treated/ meditation group are 12/ 13
• In the control group: 3/22
• The odds ratio is simply the ratio of these odds
• = (12/13)/ (3/22)
• = .9231/ .1364
• = 6.77
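The odds and odds ratio above can be reproduced in Python. This sketch uses the standard-library `fractions.Fraction` to keep the arithmetic exact, which shows that the odds ratio is 88/13 (6.769...) before any rounding; dividing the rounded values .9231/.1364 shifts the later decimal places slightly. The helper name `odds` is our own.

```python
from fractions import Fraction

def odds(events, non_events):
    """Odds = chance of the event happening / chance of it not happening."""
    return Fraction(events, non_events)

# Die example: 1 way to roll a '1', 5 ways not to
die_odds = odds(1, 5)

# Treatment example: odds of not drinking at 3 months
odds_treated = odds(12, 13)  # meditation group
odds_control = odds(3, 22)   # control group
odds_ratio = odds_treated / odds_control

print(f"Odds of rolling a '1': {die_odds}")
print(f"Odds ratio = {float(odds_ratio):.2f} (exactly {odds_ratio})")
# prints: Odds of rolling a '1': 1/5
#         Odds ratio = 6.77 (exactly 88/13)
```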
More Formally
Odds ratio = (a*d)/(b*c)
Example
                   3 Month Outcome
Treatment     ‘Sober’   Drinking   Total
Control            3        22        25
Meditation        12        13        25
Total             15        35        50
Odds ratio = (a*d)/(b*c), with Control in the first row (a = 3, b = 22, c = 12, d = 13)
= (3 * 13)/(22 * 12)
=39/264
= .1477
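Odds Ratios are ‘symmetric’
Swapping the rows (Meditation first instead of Control) gives the reciprocal odds ratio, 264/39 = 6.7692:
(39/264) × (264/39) = 1, i.e., .1477 × 6.7692 ≈ 1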
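A minimal sketch of the cross-product formula (a*d)/(b*c), applied first with Control and then with Meditation in the first row, to show that the two odds ratios are reciprocals. The helper name `odds_ratio` is hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio (a*d)/(b*c) for the table
              'Sober'  Drinking
       row 1     a        b
       row 2     c        d
    """
    return (a * d) / (b * c)

# Control in the first row (as in the worked example)
or_control_vs_med = odds_ratio(3, 22, 12, 13)
# Rows swapped: Meditation in the first row
or_med_vs_control = odds_ratio(12, 13, 3, 22)

product = or_control_vs_med * or_med_vs_control
print(f"{or_control_vs_med:.4f} x {or_med_vs_control:.4f} = {product:.4f}")
# prints: 0.1477 x 6.7692 = 1.0000
```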
Some Features of Odds Ratios
Can range from 0 to (theoretically) infinity
An odds ratio of 1 = no difference between groups
Odds ratios less than one represent reduced likelihood of outcome
Odds ratios greater than one represent increased likelihood of
outcome
Are ‘symmetric’
Some Features of Odds Ratios
Can calculate 95% (& other) Confidence intervals around the
estimate
Note that confidence intervals are calculated on a logarithmic
scale, so they are not symmetrical around the estimate
If the 95% confidence interval includes ‘1’, the estimate is not significantly
different from one: the difference between treatments is not
significant & we cannot reject the null hypothesis.
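A sketch of the usual log-scale (Woolf) interval for an odds ratio, assuming SE(ln OR) = sqrt(1/a + 1/b + 1/c + 1/d) and z = 1.96. Calculators may use this or an exact method, so their limits can differ somewhat; `odds_ratio_ci` is a hypothetical name.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a CI computed on the log scale (Woolf's method)."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Meditation (12 sober, 13 drinking) vs. control (3 sober, 22 drinking)
or_, lower, upper = odds_ratio_ci(12, 13, 3, 22)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print(f"Below the estimate: {or_ - lower:.2f}; above it: {upper - or_:.2f}")
print("CI includes 1: not significant" if lower <= 1 <= upper
      else "CI excludes 1: significant at the 5% level")
```

For these counts the interval runs from about 1.6 to 28.5: roughly 5 units below the estimate of 6.77 but more than 20 above it, which is the asymmetry described above, and it excludes 1.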
Some Examples
Use this website:
http://www.medcalc.org/calc/odds_ratio.php
To: Calculate/ confirm OR & CI for the above example
Calculate OR for these examples:
                   3 Month Outcome
Treatment     ‘Sober’   Drinking   Total
Control           54       537       591
Meditation       345       293       638
Total            399       830      1229

                      Gender
Alcohol Abuse    Male    Female    Total
Yes               675       416     1091
No               2570      3460     6030
Total            3245      3876     7121
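A quick way to check your answers with the cross-product formula. Note that the gender table has gender in the columns, so each gender's column is read downwards to fill one row of the a/b/c/d layout. Which group goes first (meditation, males) is our choice; putting the other group first would give the reciprocal. The helper name is hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio (a*d)/(b*c):
    a, b = group 1 with / without the outcome; c, d = group 2 with / without."""
    return (a * d) / (b * c)

# Meditation vs. control, outcome = 'Sober' at 3 months
or_meditation = odds_ratio(345, 293, 54, 537)

# Males vs. females, outcome = alcohol abuse 'Yes'
# (male column: 675 yes, 2570 no; female column: 416 yes, 3460 no)
or_male = odds_ratio(675, 2570, 416, 3460)

print(f"Meditation vs. control: OR = {or_meditation:.2f}")
print(f"Males vs. females: OR = {or_male:.2f}")
```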
Chi Square
• The Chi square statistic (χ²) tests whether the distribution
of a categorical variable differs between groups/ categories
• Examples of categorical variables
• Primary drug (tobacco vs alcohol vs cannabis vs cocaine vs
heroin)
• Occupational status (student vs unemployed vs retired etc)
• Status at follow-up of e.g., 20 year longitudinal study
(interviewed vs located but refused vs. not located vs. dead)
• χ² compares counts of categorical responses between 2+
independent groups
Contingency Table
In the simplest case, we can continue to consider the 2 × 2
contingency table
Calculating χ²
Each cell of the contingency table can be assigned an ‘expected’
value, assuming that there is no association
The expected (E) value for ‘a’ would be (row total × column total/ grand total):
((a+b)*(a+c))/(a+b+c+d)
The value of the test statistic, summed over all cells of the table, is
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where
O = an observed frequency;
E = an expected (theoretical) frequency, asserted by the null
hypothesis;
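A minimal sketch of this calculation for the 2 × 2 meditation example (12 sober / 13 drinking vs. 3 sober / 22 drinking). It computes the plain Pearson statistic with no continuity (Yates) correction; some calculators also report a corrected value, which will be smaller. `chi_square_2x2` is a hypothetical name.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2 x 2 table (no continuity correction)."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # E = row total x column total / grand total, e.g. E(a) = (a+b)(a+c)/n
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Meditation: 12 sober, 13 drinking; Control: 3 sober, 22 drinking
print(f"chi-square = {chi_square_2x2(12, 13, 3, 22):.3f}")
# prints: chi-square = 7.714
```

Here the expected counts are 7.5 and 17.5 in each row, since 15 of the 50 participants were sober overall and each group has 25 people.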
Some Features of χ²
• This calculation produces a Chi square statistic (χ²)
which, like the t-test statistic (& others), has a known distribution
• There is also a ‘degrees of freedom’ (d.f.) associated with this
statistic, calculated:
• d.f. = (number of columns - 1) × (number of rows - 1)
• So, in our example, it would be (2-1) × (2-1) = 1
• Using the estimated χ² and its degrees of freedom, we can look
this up in tables to find the significance level associated with a
specific test value & d.f.
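Instead of a printed table, the p-value can be computed directly. This sketch uses only the standard library: the chi-square upper tail equals 1 - P(d.f./2, χ²/2), where P is the regularized lower incomplete gamma function, evaluated here by its power series. It is an illustration under that assumption, not a replacement for a statistics package (for example, precision degrades for very small p-values); the function names are hypothetical.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """d.f. = (number of columns - 1) x (number of rows - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def chi2_p_value(x, df):
    """Upper-tail probability P(chi-square >= x) with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma
    function P(df/2, x/2); the p-value is 1 - P.
    """
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    n = 1
    while term > 1e-15 * total and n < 10_000:
        term *= z / (s + n)
        total += term
        n += 1
    lower = total * math.exp(s * math.log(z) - z - math.lgamma(s))
    return max(0.0, 1.0 - lower)

df = degrees_of_freedom(2, 2)  # the 2 x 2 table
p = chi2_p_value(7.714, df)    # chi-square for the meditation example
print(f"d.f. = {df}, p = {p:.4f}")
```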
Why not stick with Odds Ratios?
• The example I’ve used could just as easily be tested using an OR
& its 95% CI
• BUT: Odds ratios are (generally) only useful for dichotomous
variables
• Chi square can be used for categorical variables with 3 or more
categories
An Example
• A researcher has conducted a 10-year follow-up of individuals
entering treatment for alcohol/ drug addiction
• In 2003 everyone entering treatment for addiction in her clinic
was enrolled into a research study and baseline information was
collected (e.g., patterns of drug use)
• In this sample, there were three categories of drug users: those
who primarily used alcohol, those who primarily used heroin
and those who primarily used cocaine
• At follow-up, after 10 years, she was able to interview only 60%
of those who had participated in 2003
• Reasons for loss to follow-up included death, inability to locate
the participant, and refusal
Data
               Heroin   Alcohol   Cocaine
Interviewed        50        70        60
Refused             5        12         8
Not Located        25        16        24
Died               20        12         8
An Example
• Examine whether there were differences between groups
(primary drug) and follow-up status
• Use this calculator to calculate χ²:
• http://www.quantpsy.org/chisq/chisq.htm
• Firstly: How many degrees of freedom?
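To check the calculator's answer, here is a sketch of the same χ² calculation for the 4 × 3 follow-up table. Instead of computing an exact p-value it compares the statistic with the tabulated critical value for p = .05 at 6 d.f. (12.592), taken from standard chi-square tables.

```python
# Follow-up status (rows) by primary drug (columns: Heroin, Alcohol, Cocaine)
observed = [
    [50, 70, 60],  # Interviewed
    [5, 12, 8],    # Refused
    [25, 16, 24],  # Not located
    [20, 12, 8],   # Died
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n  # row total x column total / n
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)  # (rows - 1) x (columns - 1)
CRITICAL_05_DF6 = 12.592  # tabulated chi-square value for p = .05 with 6 d.f.

print(f"n = {n}, d.f. = {df}, chi-square = {chi2:.2f}")
print("p < .05" if chi2 > CRITICAL_05_DF6 else "p >= .05")
```

For these counts the statistic comes out at about 13.5 on 6 d.f., just above the .05 critical value. Of the 310 participants, 180 (58%) were interviewed, consistent with the "only 60%" quoted above.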
Correlation
• Correlations (r) are used to test whether there is a
significant association between continuous variables
• Examples of continuous variables
• Height (cm), weight (kg)
• Number of drinks?
• Scores on a personality dimension
• One web site:
• http://www.mathsisfun.com/data/correlation.html
Correlation
• Can range between -1 and 1: 0 indicates no (linear) association
and 1 (+ or -) indicates perfect association.
• Significance is sample size dependent
• It is also important to graph the data
• Correlation is NOT causation – we’ll be discussing this
more this afternoon.
• Calculator:
• http://www.socscistatistics.com/tests/pearson/Default2.aspx
• (you will need r and n to click through to the p-value calculator)
• Exercise – try adding some outliers
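A sketch of Pearson's r and the outlier exercise. It uses the nine novelty-seeking / max drinks per month pairs from the "Best way to test an association" exercise at the end of this section, assuming those values are read row by row as (novelty seeking, drinks) pairs. The added point (45, 300) is purely hypothetical, invented to show how a single outlier can change r. The t statistic t = r·sqrt((n - 2)/(1 - r²)), with n - 2 d.f., is what p-value calculators for r are typically based on.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t with n - 2 degrees of freedom, used to get a p-value for r."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

novelty = [30, 28, 33, 21, 28, 32, 30, 31, 29]
drinks = [5, 99, 0, 25, 67, 12, 30, 16, 44]

r = pearson_r(novelty, drinks)
print(f"r = {r:.2f}, t = {t_statistic(r, len(novelty)):.2f} (d.f. = {len(novelty) - 2})")

# One hypothetical outlier (high novelty seeking, very heavy drinking)
r_out = pearson_r(novelty + [45], drinks + [300])
print(f"with one added outlier: r = {r_out:.2f}")
```

A single extreme point moves r from about -0.33 to about +0.73, which is why graphing the data matters.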
Summary
• Comparing means on a continuous variable across
subpopulations (e.g., gender)
• T-test
• Comparing rates of a dichotomous outcome across two groups
• Relative risk
• Odds ratio
• Chi square
• Comparing frequency of a categorical outcome across 2 or more
groups
• Chi square
• Examining associations between two continuous measures
• Correlation
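The summary can be written as a small lookup, which may help with the exercises that follow. This is a hypothetical helper that only encodes the pairings listed above; a real analysis also depends on distributions, sample size and study design.

```python
def suggest_test(variable_1, variable_2):
    """Map a pair of variable types to the tests listed in the summary.

    Types: 'continuous', 'dichotomous', 'categorical' (3+ categories).
    """
    pair = {variable_1, variable_2}
    if pair == {"continuous"}:
        return ["Correlation"]
    if pair == {"dichotomous", "continuous"}:
        return ["T-test"]
    if pair == {"dichotomous"}:
        return ["Relative risk", "Odds ratio", "Chi square"]
    if "continuous" not in pair:
        return ["Chi square"]  # categorical outcome across 2+ groups
    return ["Not covered in this summary"]

print(suggest_test("dichotomous", "continuous"))
```

For example, `suggest_test("dichotomous", "continuous")` returns `["T-test"]`, the case of comparing a continuous measure between two groups such as genders.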
Best way to test an association
Gender   Ever used cannabis        Gender   Number of cigarettes/day
1        0                         1        3
2        1                         2        15
1        0                         2        25
2        1                         2        25
1        1                         1        19
2        0                         1        40
1        1                         1        10
1        0                         1        20
2        1                         2        20
Best way to test an association
Region (1=NZ; 2=UK; 3=AUS)   Preferred drug (1=Alc; 2=Cig; 3=THC)
1                            1
2                            3
3                            2
2                            1
1                            1
3                            2
1                            1
3                            3
2                            1
Novelty seeking   Max drinks/month
30                5
28                99
33                0
21                25
28                67
32                12
30                30
31                16
29                44