Transcript Document

Be humble in our attribute, be loving and varying in our
attitude, that is the way to live in heaven.
Statistical Package Usage
Topic: One Way ANOVA
By Dr. Kelly Fan, Cal State Univ, East Bay
Statistical Tools vs. Variable Types
Response (output) numerical:
• Predictor (input) numerical: Simple and Multiple Regression
• Predictor (input) categorical: Analysis of Variance (ANOVA)
• Predictor (input) mixed: Analysis of Covariance (ANCOVA)
Response (output) categorical:
• Categorical data analysis
Example: Broker Study
A financial firm would like to determine if brokers they use to
execute trades differ with respect to their ability to provide a stock
purchase for the firm at a low buying price per share. To measure
cost, an index, Y, is used.
Y=1000(A-P)/A
where
P=per share price paid for the stock;
A=average of high price and low price per share, for the
day.
“The higher Y is, the better the trade.”
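As a quick check on how the index behaves, here is a minimal Python sketch. The function name `trade_index` and the prices used are hypothetical, chosen only for illustration; they are not from the study.

```python
def trade_index(high: float, low: float, paid: float) -> float:
    """Y = 1000 * (A - P) / A, where A is the average of the day's high and low."""
    a = (high + low) / 2.0  # A: average of the day's high and low price per share
    return 1000.0 * (a - paid) / a  # P = paid: per share price paid for the stock


# Hypothetical prices, for illustration only (not from the study).
print(trade_index(high=52.0, low=48.0, paid=49.5))  # A = 50, so Y = 1000 * 0.5 / 50
```

Buying below the day's average price A gives a positive Y; paying more than A gives a negative Y.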
Col: broker (R = 6 trades per broker)

Broker     1     2     3     4     5
          12     7     8    21    24
           3    17     1    10    13
           5    13     7    15    14
          -1    11     4    12    18
          12     7     3    20    14
           5    17     7     6    19
Mean       6    12     5    14    17
Five brokers were in the study and six trades
were randomly assigned to each broker.
Statistical Model
(Broker is, of course, represented as
“categorical”)
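Data layout: rows i = 1, ..., n (trades), columns j = 1, ..., C (“LEVEL” OF BROKER); Yij is the i-th observation at level j.

          j = 1    j = 2    •••    j = C
i = 1     Y11      Y12      •••    Y1C
i = 2     Y21      Y22      •••    Y2C
  •        •        •       Yij     •
i = n     Yn1      Yn2      •••    YnC

$Y_{ij} = \mu_j + \varepsilon_{ij}$,  i = 1, ..., n;  j = 1, ..., C

where $\mu_j$ is the mean response at level j and $\varepsilon_{ij}$ is a random error (independent, normally distributed, with a common variance $\sigma^2$).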
One-Way ANOVA F-Test:
H0: The level of X has no impact on Y
H1: The level of X does have an impact on Y
H0: $\mu_1 = \mu_2 = \cdots = \mu_C$
H1: not all $\mu_j$ are equal
ONE WAY ANOVA
The GLM Procedure
Dependent Variable: TRADE

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4       640.800000    160.200000      7.56   0.0004
Error             25       530.000000     21.200000
Corrected Total   29      1170.800000

R-Square   Coeff Var   Root MSE   TRADE Mean
0.547318    42.63283   4.604346     10.80000

Root MSE is the estimate of the common standard deviation σ.
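The table above can be reproduced by hand. This Python sketch, using only the standard library and the broker data as tabulated earlier (the function name `one_way_anova` is our own), recomputes the sums of squares, F value, R-Square, Root MSE and Coeff Var:

```python
from math import sqrt

# Broker data: six trades (index Y) for each of brokers 1-5.
trades = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}


def one_way_anova(groups):
    """Return the main quantities of a one-way ANOVA table."""
    all_y = [y for ys in groups.values() for y in ys]
    n_total, c = len(all_y), len(groups)
    grand_mean = sum(all_y) / n_total
    means = {k: sum(ys) / len(ys) for k, ys in groups.items()}
    # Between-group ("Model") and within-group ("Error") sums of squares
    ss_model = sum(len(ys) * (means[k] - grand_mean) ** 2 for k, ys in groups.items())
    ss_error = sum((y - means[k]) ** 2 for k, ys in groups.items() for y in ys)
    df_model, df_error = c - 1, n_total - c
    ms_model, ms_error = ss_model / df_model, ss_error / df_error
    return {
        "means": means,
        "ss_model": ss_model,
        "ss_error": ss_error,
        "F": ms_model / ms_error,
        "r_square": ss_model / (ss_model + ss_error),
        "root_mse": sqrt(ms_error),
        "coeff_var": 100 * sqrt(ms_error) / grand_mean,  # Root MSE as % of the mean
    }


table = one_way_anova(trades)
for key in ("ss_model", "ss_error", "F", "r_square", "root_mse", "coeff_var"):
    print(f"{key:10s} {table[key]:.6f}")
```

The p-value (Pr > F = 0.0004) comes from the F distribution with 4 and 25 degrees of freedom; the standard library has no F distribution, so a package such as SciPy (`scipy.stats.f.sf`) would be needed for that step.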
Diagnosis: Normality
Normality plot: normal scores vs. residuals
• Check normality on the residuals (all groups pooled), not separately within each group.
• The points on the normality plot should fall more or less along a straight line before we claim the errors are “normally distributed”.
• Formal statistical tests (e.g., the Shapiro-Wilk test) can also be used.
• The ANOVA method we learn here is fairly robust to the normality assumption. That is, a mild departure from the normal distribution will not change our conclusions much.
From the Broker data:
[Normal probability plot: normal quantiles (horizontal axis) vs. residuals (vertical axis)]
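The coordinates of such a plot can be sketched in Python with the standard library. This assumes Blom's plotting positions (i - 3/8)/(n + 1/4), one common convention; SAS or other software may use a slightly different one. The function name is our own.

```python
from statistics import NormalDist

trades = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}


def residuals_and_normal_scores(groups):
    """Pair sorted residuals (Y_ij - group mean) with normal scores."""
    resid = []
    for ys in groups.values():
        m = sum(ys) / len(ys)
        resid.extend(y - m for y in ys)
    resid.sort()
    n = len(resid)
    z = NormalDist()  # standard normal
    # Blom's plotting positions (an assumed convention; others exist).
    scores = [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(scores, resid))


pairs = residuals_and_normal_scores(trades)
for score, r in pairs[:3]:  # the three smallest residuals
    print(f"{score:6.2f} {r:6.1f}")
```

Plotting `pairs` (normal score on one axis, residual on the other) and checking for a roughly straight line is the visual test described above.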
Diagnosis: Equal Variances
Residual plot: predicted values vs. residuals
• The points on the residual plot should lie more or less within a horizontal band before we claim “constant variances”.
• Formal statistical tests (e.g., Levene's test) can also be used.
• The ANOVA method we learn here is fairly robust to the constant-variance assumption. That is, slightly different variances within groups will not change our conclusions much.
From the Broker data:
[Residual plot: predicted values (horizontal axis) vs. residuals (vertical axis)]
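In a one-way ANOVA each predicted value is a group mean, so the vertical spread at each predicted value is the spread within one broker. A minimal Python sketch of that informal comparison (the max/min ratio is only a rough summary, not a formal test):

```python
from statistics import variance

trades = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}

# Sample variance of each broker's trades (the same as the variance of its residuals).
group_var = {k: variance(ys) for k, ys in trades.items()}
for k, v in group_var.items():
    print(f"broker {k}: predicted {sum(trades[k]) / len(trades[k]):5.1f}, variance {v:5.1f}")

# Rough informal summary: largest group variance over smallest.
ratio = max(group_var.values()) / min(group_var.values())
print(f"max/min variance ratio: {ratio:.2f}")
```

With equal group sizes, the average of the five group variances equals the MSE (21.2) in the ANOVA table.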
Multiple Comparison Procedures
Once we reject H0: $\mu_1 = \mu_2 = \cdots = \mu_C$ in favor of H1: not all $\mu_j$'s are equal, we don't yet know in which way they differ, only that they are not all the same. If there are 4 columns, are all 4 $\mu$'s different? Are 3 the same and one different? If so, which one? And so on.
Pairwise Comparison
Goal: grouping levels
Method: compare each pair of levels
The Student-Newman-Keuls (SNK) procedure is a popular choice and is introduced here.
SAS Output for SNK Procedure

Number of Means        2           3           4           5
Critical Range     5.4749249   6.6214244   7.3120942   7.8071501

Means with the same letter are not significantly different.

SNK Grouping     Mean    N   BROKER
     A         17.000    6   5
     A         14.000    6   4
     A         12.000    6   2
     B          6.000    6   1
     B          5.000    6   3

Conclusion (brokers ordered from largest to smallest mean; brokers in the same bracket are not significantly different):
(5 4 2)   (1 3)
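• Brokers 1 and 3 are not significantly different from each other, but they are significantly different from the other three brokers.
• Brokers 2, 4 and 5 are not significantly different from one another.
• Overlapping groups are read pair by pair. For example, if one group contained brokers 5 and 4 and another contained brokers 4 and 2, then brokers 2 and 4 would not be significantly different, brokers 4 and 5 would not be significantly different, but broker 2 would be significantly different from (smaller than) broker 5.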
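The step-down logic of SNK can be sketched in Python. This sketch does not compute studentized-range quantiles; it assumes the critical ranges are taken from the SAS output above, and the function name `snk_groups` is our own. Ranges of means are tested from widest to narrowest, and a range lying inside one already declared not significant is not tested.

```python
# Broker means and the SNK critical ranges copied from the SAS output above.
means = {1: 6.0, 2: 12.0, 3: 5.0, 4: 14.0, 5: 17.0}
critical_range = {2: 5.4749249, 3: 6.6214244, 4: 7.3120942, 5: 7.8071501}


def snk_groups(means, critical_range):
    """Step-down SNK grouping, given the critical range for each number of means p."""
    order = sorted(means, key=means.get, reverse=True)  # largest mean first
    k = len(order)
    same = []  # (start, end) index ranges found NOT significantly different
    for p in range(k, 1, -1):  # widest ranges first
        for start in range(k - p + 1):
            end = start + p - 1
            # A range inside an already non-significant range is not tested.
            if any(s <= start and end <= e for s, e in same):
                continue
            if means[order[start]] - means[order[end]] <= critical_range[p]:
                same.append((start, end))
    # Levels not covered by any non-significant range form their own group.
    covered = {i for s, e in same for i in range(s, e + 1)}
    same += [(i, i) for i in range(k) if i not in covered]
    return [[order[i] for i in range(s, e + 1)] for s, e in sorted(same)]


print(snk_groups(means, critical_range))  # [[5, 4, 2], [1, 3]]
```

The key step is the range of three means 17 - 12 = 5, which is below the critical range 6.62, so brokers 5, 4 and 2 share a group even though each pairwise difference is not tested separately.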
Comparisons to Control
Dunnett Procedure
Designed specifically for comparing several
“treatments” to a “control.”
Example: in our broker data, broker 1 is treated as the CONTROL (R = 6 trades per broker).

Broker    1 (CONTROL)    2     3     4     5
Mean      6             12     5    14    17
Comparisons significant at the 0.05 level are indicated by ***.

                Difference     Simultaneous 95%
BROKER          Between        Confidence
Comparison      Means          Limits
5 - 1           11.000          4.070   17.930   ***
4 - 1            8.000          1.070   14.930   ***
2 - 1            6.000         -0.930   12.930
3 - 1           -1.000         -7.930    5.930
- Brokers (columns) 4 and 5 differ significantly from the control (broker 1).
- Brokers 2 and 3 are not significantly different from the control.
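The “***” rule amounts to checking whether each simultaneous interval excludes 0. A minimal Python sketch, assuming the half-width 6.930 copied from the SAS output (Dunnett's critical value itself is not computed here, since the standard library has no Dunnett distribution):

```python
from math import sqrt

means = {1: 6.0, 2: 12.0, 3: 5.0, 4: 14.0, 5: 17.0}
control = 1
half_width = 6.930  # copied from the SAS output: each interval is difference +/- 6.930
mse, n = 21.2, 6    # MSE from the ANOVA table; trades per broker

rows = []
for broker in (5, 4, 2, 3):
    diff = means[broker] - means[control]
    lower, upper = diff - half_width, diff + half_width
    significant = lower > 0 or upper < 0  # the interval excludes 0
    rows.append((broker, diff, lower, upper, significant))
    print(f"{broker}-{control}  {diff:7.3f}  {lower:7.3f}  {upper:7.3f}  {'***' if significant else ''}")

# Critical value implied by the half-width: d * sqrt(2 * MSE / n) = 6.930
implied_d = half_width / sqrt(2 * mse / n)
print(f"implied Dunnett critical value: {implied_d:.2f}")
```

The implied critical value (about 2.61) is larger than a single two-sided t quantile would be, which is how Dunnett's procedure keeps the 95% confidence level for all four comparisons at once.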
Contrast
Question 1: Broker 1 vs. the others
Question 2: Brokers 1 and 2 are more experienced than the others. Do experienced brokers differ from less experienced brokers?
SAS Output for Question 1

Contrast                 DF   Contrast SS   Mean Square   F Value   Pr > F
BROKER 1 VS THE OTHERS    1   172.8000000   172.8000000      8.15   0.0085
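The contrast sum of squares can be checked by hand. For coefficients $c_j$ summing to zero and n observations per level, SS = (Σ c_j Ȳ_j)² / Σ(c_j²/n), with 1 degree of freedom, so F = SS/MSE. In this Python sketch the Question 1 coefficients (4, -1, -1, -1, -1) reproduce the SAS output; the Question 2 coefficients (3, 3, -2, -2, -2) are our own assumption, since the SAS code for Question 2 is not shown.

```python
means = {1: 6.0, 2: 12.0, 3: 5.0, 4: 14.0, 5: 17.0}
n, mse = 6, 21.2  # trades per broker; MSE from the ANOVA table


def contrast_ss(coefs, means, n):
    """Sum of squares for the contrast sum(c_j * mean_j), equal group sizes n."""
    if abs(sum(coefs.values())) > 1e-12:
        raise ValueError("contrast coefficients must sum to 0")
    estimate = sum(c * means[j] for j, c in coefs.items())
    return estimate ** 2 / sum(c ** 2 / n for c in coefs.values())


q1 = {1: 4, 2: -1, 3: -1, 4: -1, 5: -1}  # broker 1 vs. the other four
q2 = {1: 3, 2: 3, 3: -2, 4: -2, 5: -2}   # brokers 1, 2 vs. 3, 4, 5 (assumed coding)

results = {}
for name, coefs in (("Q1", q1), ("Q2", q2)):
    ss = contrast_ss(coefs, means, n)
    results[name] = (ss, ss / mse)  # 1 df, so Mean Square = SS
    print(f"{name}: SS = {ss:.4f}, F = {ss / mse:.2f}")
```

Any rescaling of the coefficients (for example 1, -1/4, -1/4, -1/4, -1/4 for Question 1) gives the same SS and F.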
KRUSKAL-WALLIS TEST
(Nonparametric Alternative)
H0: The probability distributions are identical for each level of the factor
H1: Not all the distributions are the same
Example: Life Insurance Amount
State: 1 = CA, 2 = KA, 3 = CO

 CA     KA     CO
 90     80    165
200    140    160
225    150    140
100    140    160
170    150    175
300    300    155
250    280    180
[Residual plot for the insurance data: predicted values (horizontal axis) vs. residuals (vertical axis)]
SAS Output: Kruskal-Wallis Test

Chi-Square   DF   Pr > Chi-Square
    1.0791    2            0.5830
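SAS Code
* Read STATE (character) and AMOUNT. The trailing @@ holds the line
  so several STATE AMOUNT pairs can be read from each data line;
DATA INSURANCE;
  INPUT STATE $ AMOUNT @@;
  DATALINES;
CA 90 CA 200 CA 225 CA 100 CA 170 CA 300 CA 250
KA 80 KA 140 KA 150 KA 140 KA 150 KA 300 KA 280
CO 165 CO 160 CO 140 CO 160 CO 175 CO 155 CO 180
;
** NON-PARAMETRIC TEST;
* With more than two groups, the WILCOXON option reports the Kruskal-Wallis test;
PROC NPAR1WAY DATA=INSURANCE WILCOXON;
  TITLE "NONPARAMETRIC TEST TO COMPARE STATES";
  CLASS STATE;
  VAR AMOUNT;
RUN;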
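To see where the Chi-Square value comes from, here is a minimal Python sketch of the Kruskal-Wallis statistic with the usual correction for ties (the function name `kruskal_wallis` is our own). All 21 amounts are ranked together, tied values share the average of their ranks, and H = 12/(N(N+1)) Σ R_j²/n_j - 3(N+1) is divided by the tie correction. The p-value uses exp(-H/2), which is the exact chi-square upper tail only for 2 degrees of freedom, as in this example.

```python
from math import exp

amounts = {
    "CA": [90, 200, 225, 100, 170, 300, 250],
    "KA": [80, 140, 150, 140, 150, 300, 280],
    "CO": [165, 160, 140, 160, 175, 155, 180],
}


def kruskal_wallis(groups):
    """Kruskal-Wallis H statistic, corrected for ties."""
    pooled = sorted(y for ys in groups.values() for y in ys)
    n = len(pooled)
    # Average rank for each distinct value (ties share the mean of their ranks).
    rank, i = {}, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    h = 12 / (n * (n + 1)) * sum(
        sum(rank[y] for y in ys) ** 2 / len(ys) for ys in groups.values()
    ) - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tied groups of size t
    counts = {}
    for y in pooled:
        counts[y] = counts.get(y, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction


h = kruskal_wallis(amounts)
df = len(amounts) - 1
p = exp(-h / 2)  # chi-square upper tail; this closed form holds only for df = 2
print(f"H = {h:.4f}, DF = {df}, p = {p:.4f}")  # H = 1.0791, DF = 2, p = 0.5830
```

The rank sums are 89.5 (CA), 65.5 (KA) and 76 (CO); with a p-value of about 0.58, there is no evidence that the distributions of insurance amounts differ among the three states.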