Transcript Document

Lecture 11
Review of Lecture 10
Prediction and Prediction Intervals
More Examples about Model Comparison
7/18/2015
ST3131, Lecture 11
1
Steps for Model Comparison:
H0: The Reduced Model (RM) is adequate vs H1: The Full Model (FM) is adequate
Step 1: Fit the FM and get SSE (in the ANOVA table),
df (in the ANOVA table), and
R_sq (under the Coefficient Table).
Step 2: Fit the RM and get SSE, df, and R_sq.
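Step 3: Compute the F-statistic:
$$F = \frac{(SSE_R - SSE_F)/(df_R - df_F)}{SSE_F/df_F} \sim F(df_R - df_F,\ df_F),$$
or, in terms of R_sq,
$$F = \frac{(R_F^2 - R_R^2)/r}{(1 - R_F^2)/(n - p - 1)} \sim F(r,\ n - p - 1),$$
where r = df_R - df_F and df_F = n - p - 1.
Step 4: Conclusion: Reject H0 if F > F(r, df_F, alpha), the upper-alpha critical value of the F(r, df_F) distribution.
H0 cannot be rejected otherwise.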
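The two forms of the F-statistic in Step 3 can be checked against each other numerically. The sketch below is illustrative only: the function names and the numbers (SST = 100, SSE_F = 40, SSE_R = 55) are hypothetical and do not come from the lecture. It assumes both models are fitted to the same response, so they share the same SST.

```python
def f_from_sse(sse_r, df_r, sse_f, df_f):
    """Partial F-statistic from the error sums of squares of the RM and the FM."""
    return ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)


def f_from_r2(r2_r, r2_f, r, df_f):
    """The same statistic written with R_sq; r = df_r - df_f."""
    return ((r2_f - r2_r) / r) / ((1.0 - r2_f) / df_f)


# Hypothetical numbers, for illustration only (not from the lecture).
sst = 100.0
sse_f, df_f = 40.0, 20   # full model
sse_r, df_r = 55.0, 23   # reduced model

f_sse = f_from_sse(sse_r, df_r, sse_f, df_f)
f_r2 = f_from_r2(1 - sse_r / sst, 1 - sse_f / sst, df_r - df_f, df_f)
print(f_sse, f_r2)  # the two forms agree up to floating-point rounding
```

The resulting F would then be compared with a tabulated critical value F(r, df_F, alpha), as in Step 4.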
Special Case: ANOVA Table (Analysis of Variance)
RM H0: $Y = \beta_0 + \varepsilon$
FM H1: $Y = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \varepsilon$
$$F = \frac{(SSE_R - SSE_F)/p}{SSE_F/(n - p - 1)} = \frac{MSR_F}{MSE_F} \sim F(p,\ n - p - 1)$$
$$F = \frac{R_F^2/p}{(1 - R_F^2)/(n - p - 1)} \sim F(p,\ n - p - 1)$$
Source        Sum of Squares   df       Mean Square        F-test       P-value
Regression    SSR              p        MSR=SSR/p          F=MSR/MSE
Residuals     SSE              n-p-1    MSE=SSE/(n-p-1)
Total         SST              n-1
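As a rough sketch of how the ANOVA table entries fit together, the helper below fills in the table from SSR, SSE, n and p. The function name `anova_table` and the input numbers are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def anova_table(ssr, sse, n, p):
    """ANOVA quantities for a model with p predictors fitted to n observations."""
    df_reg, df_res = p, n - p - 1
    msr = ssr / df_reg          # MSR = SSR / p
    mse = sse / df_res          # MSE = SSE / (n - p - 1)
    sst = ssr + sse             # SST = SSR + SSE
    return {
        "SST": sst,
        "df": (df_reg, df_res, n - 1),
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
        "R_sq": ssr / sst,
    }


# Hypothetical inputs, for illustration only.
table = anova_table(ssr=60.0, sse=40.0, n=25, p=4)
for name, value in table.items():
    print(name, value)
```

With these inputs the F entry can also be recovered from R_sq alone, which is the second form of the statistic given above.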
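Predictions: Recall the prediction for the SLR model:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$
For any given NEW $x_0$, both the Prediction of the response $y_{x_0}$
and the Estimation of the expected response $\mu_{x_0}$ are $\hat{y}_{x_0} = \hat{\beta}_0 + \hat{\beta}_1 x_0$.
Standard Errors
$$s.e.(\hat{y}_{x_0} - y_{x_0}) = \hat{\sigma} \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
$$s.e.(\hat{y}_{x_0} - \mu_{x_0}) = \hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
A 100(1-α)% PI for $y_{x_0}$ is
$$\hat{y}_{x_0} \pm t_{(n-2,\ \alpha/2)}\ s.e.(\hat{y}_{x_0} - y_{x_0})$$
A 100(1-α)% CI for $\mu_{x_0}$ is
$$\hat{\mu}_{x_0} \pm t_{(n-2,\ \alpha/2)}\ s.e.(\hat{\mu}_{x_0}),$$
where $\hat{\mu}_{x_0} = \hat{y}_{x_0}$ and $s.e.(\hat{\mu}_{x_0}) = s.e.(\hat{y}_{x_0} - \mu_{x_0})$.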
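The SLR prediction and confidence intervals can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not part of the lecture: the function name `slr_intervals` and the five data points are made up, and the critical value 3.182 is the tabulated t(3, 0.025), supplied by hand because the standard library has no t quantile function.

```python
import math


def slr_intervals(x, y, x0, t_crit):
    """Point forecast, standard errors, PI and CI at x0 for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # least-squares slope
    b0 = y_bar - b1 * x_bar              # least-squares intercept
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x0
    h = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    se_pred = sigma_hat * math.sqrt(1.0 + h)   # s.e. for a new response
    se_mean = sigma_hat * math.sqrt(h)         # s.e. for the expected response
    return {
        "y_hat": y_hat,
        "se_pred": se_pred,
        "se_mean": se_mean,
        "PI": (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred),
        "CI": (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean),
    }


# Made-up data, for illustration only; n = 5, so the t critical value has 3 df.
res = slr_intervals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3, t_crit=3.182)
print(res)
```

Both intervals are centred at the same point forecast; the PI is always wider because of the extra "1 +" under the square root.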
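Prediction: for MLR Model
$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$
For any given NEW $x_0 = (x_{01}, \dots, x_{0p})'$, both the Prediction of the response $y_{x_0}$
and the Estimation of the expected response $\mu_{x_0}$ are
$$\hat{y}_{x_0} = \hat{\beta}_0 + \hat{\beta}_1 x_{01} + \dots + \hat{\beta}_p x_{0p}$$
Standard Errors
$$s.e.(\hat{y}_{x_0} - y_{x_0}) = \hat{\sigma} \sqrt{1 + \frac{1}{n} + (x_0 - \bar{x})' S^{-1} (x_0 - \bar{x})}$$
$$s.e.(\hat{\mu}_{x_0} - \mu_{x_0}) = \hat{\sigma} \sqrt{\frac{1}{n} + (x_0 - \bar{x})' S^{-1} (x_0 - \bar{x})}$$
A 100(1-α)% PI for $y_{x_0}$ is
$$\hat{y}_{x_0} \pm t_{(n-p-1,\ \alpha/2)}\ s.e.(\hat{y}_{x_0} - y_{x_0})$$
A 100(1-α)% CI for $\mu_{x_0}$ is
$$\hat{\mu}_{x_0} \pm t_{(n-p-1,\ \alpha/2)}\ s.e.(\hat{\mu}_{x_0} - \mu_{x_0})$$
where $S = (n-1) Var(X)$ is the matrix of centered sums of squares and cross-products of the predictors, and $S^{-1}$ is its inverse.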
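As a consistency check, the MLR standard errors should reduce to the SLR ones when p = 1. The short derivation below reads S as (n-1)Var(X), which for a single predictor is the scalar sum of squared deviations; that reading is an assumption made so that the two sets of formulas agree.

```latex
S = (n-1)\,\mathrm{Var}(X) = \sum_{i=1}^{n} (x_i - \bar{x})^2
\;\Longrightarrow\;
(x_0 - \bar{x})'\, S^{-1}\, (x_0 - \bar{x})
  = \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
t_{(n-p-1,\ \alpha/2)} = t_{(n-2,\ \alpha/2)} .
```

Substituting this into the MLR formulas gives back the SLR standard errors and intervals.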
Problem 3.5 (Page 76, textbook) Table 3.11 shows the regression output, with some
numbers erased, when a simple regression model relating a response variable Y to a
predictor variable X1 is fitted based on 20 observations. Complete the 13 missing
numbers, then compute Var(Y) and Var(X1).
ANOVA Table
Source        Sum of Squares   df      Mean Square   F-test
Regression    1848.76          ____    ____          ____
Residual      ____             ____    ____
Total         ____             ____

Coefficient Table
Variable      Coefficients   s.e.     T-test   P-value
Constant      -23.4325       12.74    ____     .0824
X1            ____           .1528    8.32     <.0001

n= ____   R^2= ____   Ra^2= ____   S= ____   df= ____
$n = 20,\ p = 1,\ df = n - p - 1 = 20 - 1 - 1 = 18,\ SSR = 1848.76$
$\hat{\beta}_0 = -23.4325,\ s.e.(\hat{\beta}_0) = 12.74,\ T_0 = \hat{\beta}_0 / s.e.(\hat{\beta}_0) = -23.4325 / 12.74 = -1.839$
$s.e.(\hat{\beta}_1) = .1528,\ T_1 = 8.32,\ \hat{\beta}_1 = T_1 \times s.e.(\hat{\beta}_1) = 8.32 \times .1528 = 1.2713$
$$F = \frac{SSR/1}{SSE/(n-2)} = T_1^2 \ \Rightarrow\ F = 8.32^2 = 69.22$$
$\hat{\sigma}^2 = SSE/(n-2) = SSR / T_1^2 = 1848.76 / 8.32^2 = 26.707$
$MSE = \hat{\sigma}^2 = 26.707,\ \hat{\sigma} = \sqrt{26.707} = 5.1679$
$SSE = (n-2)\hat{\sigma}^2 = 18 \times 26.707 = 480.73$
$SST = SSR + SSE = 1848.76 + 480.73 = 2329.49$
$R^2 = SSR / SST = 1848.76 / 2329.49 = 79.36\%$
$$R_a^2 = 1 - \frac{SSE/(n-2)}{SST/(n-1)} = 1 - \frac{480.73/18}{2329.49/19} = 78.22\%$$
$$s.e.(\hat{\beta}_1) = \hat{\sigma} \Big/ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \hat{\sigma} \Big/ \sqrt{(n-1) Var(X)}$$
$\Rightarrow Var(X) = (\hat{\sigma} / s.e.(\hat{\beta}_1))^2 / (n-1) = 26.707 / (.1528^2) / 19 = 1143.87 / 19 = 60.20$
$Var(Y) = SST / (n-1) = 2329.49 / 19 = 122.60$
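The chain of calculations for Problem 3.5 can be replayed as a short script, starting only from the numbers that survive in Table 3.11. The variable names are illustrative; the relations used (F = T1^2 in simple regression, MSE = SSR / F, SST = SSR + SSE) are the ones in the solution above.

```python
import math

# Given in Table 3.11
n, p = 20, 1
ssr = 1848.76
b0, se_b0 = -23.4325, 12.74
se_b1, t1 = 0.1528, 8.32

df_res = n - p - 1                   # residual degrees of freedom
t0 = b0 / se_b0                      # T-test for the constant
b1 = t1 * se_b1                      # slope recovered from its T-test
f_stat = t1 ** 2                     # F = T1^2 in simple regression
mse = ssr / f_stat                   # F = (SSR / 1) / MSE
sigma_hat = math.sqrt(mse)
sse = df_res * mse
sst = ssr + sse
r_sq = ssr / sst
r_sq_adj = 1 - (sse / df_res) / (sst / (n - 1))
var_x = (sigma_hat / se_b1) ** 2 / (n - 1)
var_y = sst / (n - 1)

for name, value in [("T0", t0), ("b1", b1), ("F", f_stat), ("MSE", mse),
                    ("S", sigma_hat), ("SSE", sse), ("SST", sst),
                    ("R^2", r_sq), ("Ra^2", r_sq_adj),
                    ("Var(X)", var_x), ("Var(Y)", var_y)]:
    print(f"{name:7s} {value:.4f}")
```

Because the script carries full precision through every step, its last digits can differ slightly from hand calculations that round intermediate values.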
Problem 3.12 (Page 78, textbook) Table 3.14 shows the regression output of a MLR
model relating the beginning salaries in dollars of employees in a given company to the
following predictor variables:
Sex (X1): An indicator variable (man = 1, woman = 0)
Education (X2): Years of schooling at the time of hire
Experience (X3): Number of months of previous work experience
Months (X4): Number of months with the company
In (a)-(b) below, specify the null and alternative hypotheses, the test used, and your
conclusion using a 5% level of significance.
Table 3.14 ANOVA Table
Source        Sum of Squares   df    Mean Square   F-test
Regression    23665352         4     5916338       22.98
Residual      22657938         88    257477
Total         46323290         92

Coefficient Table
Variable      Coefficients   s.e.     T-test   P-value
Constant      3526.4         327.7    10.76    .000
Sex           722.5          117.8    6.13     .000
Education     90.02          24.69    3.65     .000
Experience    1.2690         .5877    2.16     .034
Months        23.406         5.201    4.50     .000

n=93   R^2=.515   Ra^2=.489   S=507.4   df=88
(a) Conduct the F-test for the overall fit of the regression (F(4,88,.05)<2.53)
Test H0: ______ vs H1: ______
Statistic F= ______, df=( ___ , ___ )
Conclusion: ______ H0, the overall fit is ______ significant.
(b) Is there a positive linear relationship between Salary and Experience, after
accounting for the effect of the variables Sex, Education, and Months?
Test H0: ______ vs H1: ______
Statistic T= ______, P-value= ______
Conclusion: ______ H0. The positive relationship is ______ significant at the 5%
significance level.
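One possible way to carry out the arithmetic for (a) and (b) from the Table 3.14 output is sketched below. Two assumptions are worth flagging: the only critical-value information used for (a) is the slide's bound F(4,88,.05) < 2.53, and for (b) the one-sided P-value is taken as half of the reported two-sided P-value, which is appropriate only because the T statistic has the sign predicted by H1.

```python
# Values read from Table 3.14
msr, mse = 5916338, 257477
coef_exp, se_exp = 1.2690, 0.5877
p_two_sided_exp = 0.034

# (a) Overall fit: H0: beta1 = ... = beta4 = 0 vs H1: at least one beta_j != 0
f_stat = msr / mse
f_crit_upper_bound = 2.53            # the slide states F(4,88,.05) < 2.53
reject_overall = f_stat > f_crit_upper_bound

# (b) Experience: H0: beta3 = 0 vs H1: beta3 > 0 (one-sided)
t_exp = coef_exp / se_exp
if t_exp > 0:
    p_one_sided = p_two_sided_exp / 2
else:
    p_one_sided = 1 - p_two_sided_exp / 2
reject_experience = p_one_sided < 0.05

print(f"F = {f_stat:.2f}, reject H0: {reject_overall}")
print(f"T = {t_exp:.2f}, one-sided p = {p_one_sided:.3f}, reject H0: {reject_experience}")
```

Since F exceeds 2.53, it also exceeds the exact critical value, so the bound is enough to reach a conclusion in (a).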
(c) What salary would you forecast for a man with 12 years of Education, 10 months of
Experience, and 15 months with the company?
(d) What salary would you forecast, on average, for a man with 12 years of Education,
10 months of Experience, and 15 months with the company?
(e) What salary would you forecast, on average, for a woman with 12 years of
Education, 10 months of Experience, and 15 months with the company?
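The point forecasts in (c)-(e) come straight from the fitted coefficients in Table 3.14. The sketch below uses a hypothetical helper `forecast_salary`. Note that (c) and (d) share the same point forecast: as in the prediction slides, forecasting an individual and estimating the average differ only in their standard errors, which cannot be computed from the output shown here because they require S^{-1}.

```python
# Fitted coefficients from Table 3.14
coef = {
    "Constant": 3526.4,
    "Sex": 722.5,
    "Education": 90.02,
    "Experience": 1.2690,
    "Months": 23.406,
}


def forecast_salary(sex, education, experience, months):
    """Point forecast of beginning salary (dollars) from the fitted MLR equation."""
    return (coef["Constant"]
            + coef["Sex"] * sex
            + coef["Education"] * education
            + coef["Experience"] * experience
            + coef["Months"] * months)


man = forecast_salary(sex=1, education=12, experience=10, months=15)     # (c) and (d)
woman = forecast_salary(sex=0, education=12, experience=10, months=15)   # (e)
print(f"man: {man:.2f}, woman: {woman:.2f}, difference: {man - woman:.2f}")
```

The difference between the two forecasts is exactly the Sex coefficient, since the other predictor values are identical.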
Problem 3.13 (Page 79, textbook) Consider the regression model that generated the output
in Table 3.14 to be the Full Model. Now consider the Reduced Model in which Salary is
regressed on Education only. The ANOVA table obtained when fitting this model is
shown in Table 3.15. Conduct a single test to compare the Full and Reduced Models.
What conclusion can be drawn from the result of the test? (Use the 5% significance level.)
Table 3.15 ANOVA Table
Source        Sum of Squares   df    Mean Square   F-test
Regression    7862535          1     7862535       18.60
Residual      38460756         91    422646
Total         46323291         92

Test H0: ______ vs H1: ______
Statistic SSE(R)= ______, SSE(F)= ______, df(R)= ______, df(F)= ______
F= ______, df=( ___ , ___ ).
Conclusion: ______ H0. The Reduced Model is ______ significant.
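The comparison in Problem 3.13 follows the four model-comparison steps, using the Residual rows of Table 3.14 (Full Model) and Table 3.15 (Reduced Model). The sketch below is one way to do the arithmetic. The critical value 2.71 is an approximate 5% point of F(3, 88) taken from standard tables; it is an assumption, not a number given in the lecture.

```python
# Residual rows: Reduced Model from Table 3.15, Full Model from Table 3.14
sse_r, df_r = 38460756, 91
sse_f, df_f = 22657938, 88

r = df_r - df_f                                   # number of coefficients set to zero under H0
f_stat = ((sse_r - sse_f) / r) / (sse_f / df_f)   # partial F-statistic

f_crit = 2.71   # approximate F(3, 88, .05); assumed table value, not from the lecture
reject_h0 = f_stat > f_crit

print(f"F = {f_stat:.2f} on ({r}, {df_f}) df, reject H0: {reject_h0}")
```

Here H0 says the coefficients of Sex, Experience, and Months are all zero, so rejecting H0 means the Reduced Model with Education alone is not adequate.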
After-class Questions:
1. Why can the ANOVA table be used to test whether R_sq = 0?
2. Why can the F-test be used to test whether the effect of a predictor variable is
significant?