Solution 9
1. a)
[Figure: Matrix Plot for the Data for Exercise 4.12-4.14. Scatterplot matrix of Y, X1, X2, X3, X4, X5, and X6.]
From the matrix plot:
1) The linearity assumption seems reasonable.
2) The assumption about the measurement errors cannot be checked at this stage.
3) The assumption about the predictor variables seems to be violated, since there is strong collinearity between some predictor variables, e.g., between X3 and X6 and between X1 and X6.
4) The assumption about the observations may be violated; there seem to be some outliers.
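The collinearity judgment in point 3) is made by eye from the matrix plot. A minimal sketch of a numerical companion check is given here: it computes pairwise Pearson correlations and flags pairs above a threshold. The toy columns and the 0.9 threshold are assumptions made for illustration only; they are not the exercise data and not a rule stated in the solution.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    The threshold is a hypothetical choice, not taken from the solution.
    """
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical toy columns, NOT the data of Exercises 4.12-4.14.
data = {
    "X1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "X3": [2.0, 1.0, 4.0, 3.0, 5.0],
    "X6": [2.1, 3.9, 6.2, 8.0, 9.9],
}

print(collinear_pairs(data))
```

With the real data, the same function could be applied to the six predictor columns to see whether the pairs singled out in the text also stand out numerically.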
7/17/2015
ST3131 Solution 9
1
Residual Plots for Ex 4.12-4.14
[Figure: four-panel residual plot. Normal Plot of Residuals (Residual vs. Normal Score); I Chart of Residuals (Residual vs. Observation Number, with UCL=3.159, Mean=0.003609, LCL=-3.152); Histogram of Residuals (Frequency vs. Residual); Residuals vs. Fits (Residual vs. Fit).]
The assumptions about the measurement errors may be checked via the residual plots.
a). From the normal probability plot and the histogram of the standardized residuals, it seems that the normality assumption is violated.
b). From the index plot of the standardized residuals, it seems that the homogeneity (constant-variance) assumption is slightly violated, since the variances at the left end of the plot appear smaller than those at the right end.
c). The mean-zero assumption is never checked.
d). The independence assumption seems fine, judging from the index plot of the standardized residuals. However, we cannot be 100% sure based on the picture alone.
From the index plot, it seems that Observations 34 and 38 are outliers. Thus, the assumption that the observations are equally reliable is violated.
b). The table is omitted here.
c). The plots are shown below.
[Figure: index plots of SRES1 and COOK1 against Index.]
From the index plot of SRES, we can see that observations 34 and 38 are outliers.
From the index plot of Cook's distance, we can see that observations 34 and 38 are influential points. However, the cutoff value 4(p+1)/(n-p-1) = 4*7/(40-6-1) = .8485 fails to identify any influential points.
[Figure: index plots of DFIT1 and HAi against Index.]
From the index plot of DFIT, we can see that observations 34 and 38 are influential points. Here the cutoff value 2((p+1)/(n-p-1))^{1/2} = 2(7/33)^{1/2} = .9211 works.
From the index plot of the Hadi measure, we fail to detect any influential points.
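The two cutoff values quoted in part c) can be reproduced with a few lines of arithmetic. The sketch below uses n = 40 and p = 6, as in the solution; the function names are hypothetical and the formulas are the ones written in the text, not a general recommendation.

```python
import math

def cook_cutoff(n, p):
    """Cook's distance cutoff as written in the solution: 4(p+1)/(n-p-1)."""
    return 4 * (p + 1) / (n - p - 1)

def dfits_cutoff(n, p):
    """DFITS cutoff as written in the solution: 2*sqrt((p+1)/(n-p-1))."""
    return 2 * math.sqrt((p + 1) / (n - p - 1))

n, p = 40, 6  # 40 observations, 6 predictors (values used in the solution)
print(round(cook_cutoff(n, p), 4))   # 0.8485
print(round(dfits_cutoff(n, p), 4))  # 0.9211
```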
The potential-residuals plot
[Figure: potential-residual plot of DDi (vertical axis) against HHi (horizontal axis).]
From the HHi-axis, it seems that observations 8, 9, and 15 should be identified as high leverage points, but they are not outliers.
From the DDi-axis, it seems that observations 34 and 38 are outliers, but they are not high leverage points.
d). Observations 34 and 38 are outliers (in the Y-direction) but not high leverage points. Observations 8, 9, and 15 are high leverage points but not outliers in the Y-direction.
2.
Regression Analysis: Y versus X1, X2, X3
The regression equation is
Y = 61.9 + 1.64 X1 + 2.18 X2 + 2.02 X3

Predictor      Coef   SE Coef       T       P
Constant      61.93     18.16    3.41   0.002
X1           1.6365    0.2208    7.41   0.000
X2           2.1769    0.2028   10.73   0.000
X3           2.0173    0.2398    8.41   0.000

S = 31.63   R-Sq = 94.1%   R-Sq(adj) = 93.6%
(a). Sum(u_i v_i) = 35089.3 and Sum(v_i^2) = 17394.3. Thus,
Beta3 = Sum(u_i v_i)/Sum(v_i^2) = 35089.3/17394.3 = 2.01729, verified.
(b). SE(Beta3) = S/(Sum(v_i^2))^{1/2} = 31.63/17394.3^{1/2} = .239826, as desired.
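The arithmetic in (a) and (b) can be checked directly from the reported sums. The sketch below only redoes that arithmetic; it assumes, as the notation suggests but the solution does not restate, that u_i and v_i are the residuals of Y and of X3 after each is regressed on X1 and X2. The variable names are hypothetical.

```python
import math

# Values reported in the solution.
sum_uv = 35089.3   # Sum(u_i v_i)
sum_vv = 17394.3   # Sum(v_i^2)
s = 31.63          # residual standard error S of Y on X1, X2, X3

beta3 = sum_uv / sum_vv               # coefficient of X3
se_beta3 = s / math.sqrt(sum_vv)      # its standard error
t_beta3 = beta3 / se_beta3            # t-ratio, for comparison with the table

print(round(beta3, 5), round(se_beta3, 6), round(t_beta3, 2))
```

The resulting t-ratio agrees with the T value printed for X3 in the regression table, which is a useful consistency check on both sums.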
3.
From the SRES-axis, we can see that observations 7 and 18 are outliers, but observation 18 is not a high leverage point.
From the Pii-axis, we can see that observations 7 and 11 are high leverage points, but observation 11 is not an outlier in the Y-direction.
4. a) The added-variable plot for X4 is shown below, followed by the fitted results. From the F-test in the ANOVA table, the overall fit is highly significant, with p-value .001. It follows that we should add X4 to the model.
[Figure: Regression Plot (added-variable plot) of R(YoX123) against R(X4oX123), with fitted line R(YoX123) = -0.0000000 + 3.22952 R(X4oX123); S = 26.8000, R-Sq = 24.2 %, R-Sq(adj) = 22.2 %.]
Regression Analysis: R(YoX123) versus R(X4oX123)
The regression equation is
R(YoX123) = -0.0000000 + 3.22952 R(X4oX123)
S = 26.8000   R-Sq = 24.2 %   R-Sq(adj) = 22.2 %

Analysis of Variance
Source        DF        SS        MS         F       P
Regression     1    8727.0   8726.95   12.1504   0.001
Error         38   27293.2    718.24
Total         39   36020.1
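An added-variable plot regresses the residuals of Y (given the current predictors) on the residuals of the candidate predictor (given the same predictors). The slope of that line equals the coefficient the candidate would receive in the enlarged model. The sketch below illustrates this on a tiny hypothetical data set with a single base predictor, so that everything fits in plain Python; the data and function names are assumptions for illustration, not the exercise data or the Minitab procedure.

```python
def residuals(y, x):
    """Residuals from the least-squares fit of y on x with an intercept."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def added_variable_slope(y, x_base, x_new):
    """Slope of residuals(y | x_base) on residuals(x_new | x_base)."""
    u = residuals(y, x_base)
    v = residuals(x_new, x_base)
    return sum(a * b for a, b in zip(u, v)) / sum(b * b for b in v)

def full_model_coef(y, x_base, x_new):
    """Coefficient of x_new in the fit of y on (1, x_base, x_new),
    obtained from the centered 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x_base) / n, sum(x_new) / n, sum(y) / n
    c1 = [a - m1 for a in x_base]
    c2 = [a - m2 for a in x_new]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    return (s11 * s2y - s12 * s1y) / det

# Hypothetical toy data, NOT the exercise data.
x_base = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x_new = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 8.2, 8.8, 13.1, 13.9]

print(added_variable_slope(y, x_base, x_new))
print(full_model_coef(y, x_base, x_new))
```

The two printed numbers agree up to floating-point error. By the same reasoning, the slope 3.22952 in the output above would be the coefficient of X4 in the regression of Y on X1, X2, X3, and X4.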
4. b) The added-variable plot for X5 is shown below. The fitted line is almost flat, and the F-test for the overall fit is not significant, with p-value .434 (from the ANOVA table below). It follows that we should not add X5 to the model.
[Figure: Regression Plot (added-variable plot) of R(YoX1234) against R(X5oX1234), with fitted line R(YoX1234) = -0.0000000 - 0.657458 R(X5oX1234); S = 26.5825, R-Sq = 1.6 %, R-Sq(adj) = 0.0 %.]
Regression Analysis: R(YoX1234) versus R(X5oX1234)
The regression equation is
R(YoX1234) = -0.0000000 - 0.657458 R(X5oX1234)
S = 26.5825   R-Sq = 1.6 %   R-Sq(adj) = 0.0 %

Analysis of Variance
Source        DF        SS        MS          F       P
Regression     1     441.3   441.251   0.624444   0.434
Error         38   26851.9   706.630
Total         39   27293.2
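The summary figures in the ANOVA output for X5 follow from the sums of squares and degrees of freedom alone. The sketch below recomputes F, R-Sq, and S from the reported values; the helper name is hypothetical, and the p-value is not recomputed because it requires the F distribution, which is outside the standard library.

```python
def anova_summary(ss_reg, ss_err, df_reg, df_err):
    """Recompute F, R-squared, and S from an ANOVA table's SS and DF."""
    ms_reg = ss_reg / df_reg
    ms_err = ss_err / df_err
    return {
        "F": ms_reg / ms_err,                  # overall F statistic
        "R2": ss_reg / (ss_reg + ss_err),      # proportion of variation explained
        "S": ms_err ** 0.5,                    # residual standard error
    }

# Values reported in the ANOVA table for R(YoX1234) versus R(X5oX1234).
x5 = anova_summary(ss_reg=441.251, ss_err=26851.9, df_reg=1, df_err=38)
print(round(x5["F"], 4), round(100 * x5["R2"], 1), round(x5["S"], 4))
```

Small differences in the last digits relative to the printed output are expected, since the table entries are themselves rounded.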
4. c) The added-variable plot for X6 is shown below. The fitted line is almost flat, and the F-test for the overall fit is not significant, with p-value .849 (from the ANOVA table below). It follows that we should not add X6 to the model.
[Figure: Regression Plot (added-variable plot) of R(YoX1234) against R(X6oX1234), with fitted line R(YoX1234) = -0.0000000 - 0.958279 R(X6oX1234); S = 26.7871, R-Sq = 0.1 %, R-Sq(adj) = 0.0 %.]
Regression Analysis: R(YoX1234) versus R(X6oX1234)
The regression equation is
R(YoX1234) = -0.0000000 - 0.958279 R(X6oX1234)
S = 26.7871   R-Sq = 0.1 %   R-Sq(adj) = 0.0 %

Analysis of Variance
Source        DF        SS        MS          F       P
Regression     1      26.4    26.418   3.68E-02   0.849
Error         38   27266.8   717.547
Total         39   27293.2

4. d). Since neither X5 nor X6 should be added to the model, the best model contains at most 4 predictor variables. Since all coefficients of the model Y vs X1, X2, X3, and X4, except the intercept, are significant, the best model should be Y vs X1, X2, X3, and X4.