Transcript: What Is the Sum of Squared Errors?
The Simple Linear Regression Model
Simple Linear Regression Model
$y = \beta_0 + \beta_1 x + \varepsilon$
Simple Linear Regression Equation
$E(y) = \beta_0 + \beta_1 x$
Estimated Simple Linear Regression Equation
$\hat{y} = b_0 + b_1 x$
Slide 1
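Least Squares Line (Best Prediction Line)
Among all lines through the scatter plot of the data points, the one that minimizes the sum of squared prediction errors is called the least squares line, and the procedure for finding it is called the least squares method.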
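What Is the Sum of Squared Errors?
Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be $n$ data points. If the line $\hat{y} = b_0 + b_1 x$ is used to predict Y from X, then at $X = x_1$ the difference between the observed value $y_1$ and the predicted value $\hat{y}_1 = b_0 + b_1 x_1$, namely $y_1 - \hat{y}_1$, is called the prediction error. The sum of squared errors is defined as
$$f(b_0, b_1) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$
The values of $b_0$ and $b_1$ that minimize $f$ are found with the calculus method for locating maxima and minima.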
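The definition of the sum of squared errors can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `sum_squared_errors` is hypothetical, the data are the six Reed Auto observations used later in this transcript, and the second candidate line is an arbitrary alternative chosen only for comparison.

```python
# Reed Auto data from the example later in this transcript.
tv_ads = [1, 3, 2, 1, 3, 2]
cars_sold = [14, 24, 18, 17, 27, 22]


def sum_squared_errors(b0, b1, xs, ys):
    """Return f(b0, b1) = sum over i of (y_i - b0 - b1 * x_i)^2."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


# Candidate A is the estimated line from the example (10.333 + 5x).
# Candidate B (12 + 4x) is an arbitrary line chosen for illustration.
sse_a = sum_squared_errors(31 / 3, 5.0, tv_ads, cars_sold)
sse_b = sum_squared_errors(12.0, 4.0, tv_ads, cars_sold)

print(f"SSE for 10.333 + 5x: {sse_a:.3f}")
print(f"SSE for 12 + 4x:     {sse_b:.3f}")
```

The line with the smaller value of f fits the data better in the least squares sense; the least squares line is the one for which no other choice of b0 and b1 gives a smaller value.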
Slide 2
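Least Squares Line
Solving the simultaneous equations
$$\frac{\partial f(b_0, b_1)}{\partial b_0} = 0, \qquad \frac{\partial f(b_0, b_1)}{\partial b_1} = 0$$
gives
$$b_1 = \frac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$
Hence the least squares line is $\hat{y} = b_0 + b_1 x = \bar{y} - b_1 \bar{x} + b_1 x = \bar{y} + b_1 (x - \bar{x})$.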
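The step from the two partial-derivative conditions to the closed-form solution can be filled in as follows. This is the standard calculus derivation, sketched under the assumption that the $x_i$ are not all equal, so that the denominator of $b_1$ is nonzero.

```latex
\begin{aligned}
\frac{\partial f}{\partial b_0} &= -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
  &&\Rightarrow\quad \sum y_i = n b_0 + b_1 \sum x_i \\
\frac{\partial f}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0
  &&\Rightarrow\quad \sum x_i y_i = b_0 \sum x_i + b_1 \sum x_i^2
\end{aligned}
```

Dividing the first equation by $n$ gives $b_0 = \bar{y} - b_1 \bar{x}$. Substituting this into the second equation and solving for $b_1$ gives the ratio of $\sum x_i y_i - (\sum x_i \sum y_i)/n$ to $\sum x_i^2 - (\sum x_i)^2/n$.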
Slide 3
Example: Reed Auto Sales
Simple Linear Regression
Reed Auto periodically has a special week-long sale.
As part of the advertising campaign Reed runs one or
more television commercials during the weekend
preceding the sale. Data from a sample of 6 previous
sales are shown below.
Number of TV Ads    Number of Cars Sold
1                   14
3                   24
2                   18
1                   17
3                   27
2                   22
Slide 4
Example: Reed Auto Sales
Slope for the Estimated Regression Equation
$$b_1 = \frac{264 - (12)(122)/6}{28 - (12)^2/6} = \frac{20}{4} = 5$$
y-Intercept for the Estimated Regression Equation
$b_0 = 20.333 - 5(2) = 10.333$
Estimated Regression Equation
$\hat{y} = 10.333 + 5x$
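The slope and intercept above can be reproduced from the summary sums of the six observations. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not from the slides.

```python
# Reed Auto data: number of TV ads (x) and number of cars sold (y).
tv_ads = [1, 3, 2, 1, 3, 2]
cars_sold = [14, 24, 18, 17, 27, 22]
n = len(tv_ads)  # 6 observations

# Summary sums used in the closed-form least squares solution.
sum_x = sum(tv_ads)
sum_y = sum(cars_sold)
sum_xy = sum(x * y for x, y in zip(tv_ads, cars_sold))
sum_x2 = sum(x * x for x in tv_ads)

# b1 = [sum(xy) - sum(x)sum(y)/n] / [sum(x^2) - (sum(x))^2/n]
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
# b0 = y_bar - b1 * x_bar
b0 = sum_y / n - b1 * sum_x / n

print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
```

Note that the divisor in both the numerator and the denominator is the sample size, n = 6.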
Slide 5
Example: Reed Auto Sales
Scatter Diagram
[Figure: scatter diagram of Cars Sold (vertical axis, 0 to 30) against TV Ads (horizontal axis, 0 to 4)]
Slide 6
The Coefficient of Determination
Relationship Among SST, SSR, SSE
SST = SSR + SSE
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
Coefficient of Determination
$r^2 = SSR/SST$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Slide 7
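Coefficient of Determination
Definition: $r^2 = SSR/SST$
It gives the proportion of the variation in Y that is explained by X.
• The larger $r^2$ is, the better the least squares line fits the data.
• $1 - r^2$ is the remaining proportion of the total variation (SST) that X cannot explain.
Example: Reed Auto Sales
• $r^2 = SSR/SST = 100/117.333 = .852273$
• About 85.2% of the variation in car sales can be explained by the number of TV ads, while 14.8% cannot.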
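The three sums of squares and the coefficient of determination for the Reed Auto data can be checked with a short script. This is a sketch under the assumption that the estimated coefficients are b0 = 10.333 (entered as 31/3) and b1 = 5, as in the example; the variable names are illustrative.

```python
# Reed Auto data.
tv_ads = [1, 3, 2, 1, 3, 2]
cars_sold = [14, 24, 18, 17, 27, 22]

# Estimated coefficients from the example (b0 = 10.333..., b1 = 5).
b0, b1 = 31 / 3, 5.0

y_bar = sum(cars_sold) / len(cars_sold)
y_hat = [b0 + b1 * x for x in tv_ads]

# Total, regression, and error sums of squares.
sst = sum((y - y_bar) ** 2 for y in cars_sold)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(cars_sold, y_hat))

r_squared = ssr / sst

print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"r^2 = {r_squared:.6f}")
```

The script also makes it easy to confirm numerically that SST equals SSR plus SSE for the least squares line.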
Slide 8
The Correlation Coefficient
Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1) \sqrt{\text{Coefficient of Determination}}$
$r_{xy} = (\text{sign of } b_1) \sqrt{r^2}$
where:
$b_1$ = the slope of the estimated regression
equation $\hat{y} = b_0 + b_1 x$
Slide 9
Example: Reed Auto Sales
Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1) \sqrt{r^2}$
The sign of $b_1$ in the equation $\hat{y} = 10.333 + 5x$ is “+”.
$r_{xy} = +\sqrt{0.852273}$
$r_{xy} = +.923186$
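As a cross-check, the correlation obtained from the sign of b1 and the square root of r² can be compared with the usual direct formula for the sample correlation, $S_{xy} / \sqrt{S_{xx} S_{yy}}$. The sketch below uses only the standard library; the names `s_xy`, `s_xx`, and `s_yy` are illustrative shorthand for the centered sums of products and squares.

```python
import math

# Reed Auto data.
tv_ads = [1, 3, 2, 1, 3, 2]
cars_sold = [14, 24, 18, 17, 27, 22]
n = len(tv_ads)

x_bar = sum(tv_ads) / n
y_bar = sum(cars_sold) / n

# Centered sums of products and squares.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
s_xx = sum((x - x_bar) ** 2 for x in tv_ads)
s_yy = sum((y - y_bar) ** 2 for y in cars_sold)

b1 = s_xy / s_xx                       # slope of the estimated line
r_squared = s_xy ** 2 / (s_xx * s_yy)  # equals SSR/SST in simple regression

# Route 1: (sign of b1) * sqrt(r^2), as on the slide.
r_from_r2 = math.copysign(math.sqrt(r_squared), b1)
# Route 2: direct sample correlation formula.
r_direct = s_xy / math.sqrt(s_xx * s_yy)

print(f"r (from r^2) = {r_from_r2:.6f}, r (direct) = {r_direct:.6f}")
```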
Slide 10
Model Assumptions
Assumptions About the Error Term
• The error $\varepsilon$ is a random variable with mean of zero.
• The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
• The values of $\varepsilon$ are independent.
• The error $\varepsilon$ is a normally distributed random variable.
Slide 11
Testing for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.
Two tests are commonly used
• t Test
• F Test
Both tests require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.
Slide 12
Testing for Significance
An Estimate of $\sigma^2$
The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.
$s^2 = MSE = SSE/(n-2)$
where:
$$SSE = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$$
Slide 13
Testing for Significance
An Estimate of $\sigma$
• To estimate $\sigma$ we take the square root of $s^2$.
• The resulting $s$ is called the standard error of the estimate.
$$s = \sqrt{MSE} = \sqrt{\frac{SSE}{n-2}}$$
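For the Reed Auto data, the mean square error and the standard error of the estimate follow directly from SSE and n. A minimal sketch, assuming the estimated coefficients b0 = 10.333 (entered as 31/3) and b1 = 5 from the example:

```python
import math

# Reed Auto data and the estimated coefficients from the example.
tv_ads = [1, 3, 2, 1, 3, 2]
cars_sold = [14, 24, 18, 17, 27, 22]
b0, b1 = 31 / 3, 5.0
n = len(tv_ads)

# SSE = sum of squared residuals.
sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(tv_ads, cars_sold))

# s^2 = MSE = SSE / (n - 2); two degrees of freedom are used by b0 and b1.
mse = sse / (n - 2)
# s = standard error of the estimate.
s = math.sqrt(mse)

print(f"SSE = {sse:.3f}, MSE = {mse:.3f}, s = {s:.4f}")
```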
Slide 14
Testing for Significance: t Test
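Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Test Statistic
$$t = \frac{b_1}{s_{b_1}} \quad \text{where} \quad s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$$
Rejection Rule
Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
where $t_{\alpha/2}$ is based on a t distribution with $n - 2$ degrees of freedom.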
Slide 15
Example: Reed Auto Sales
t Test
• Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
• Rejection Rule
For $\alpha = .05$ and d.f. = 4, $t_{.025} = 2.776$
Reject $H_0$ if $|t| > 2.776$
• Test Statistic
$t = 5/1.0408 = 4.804$
• Conclusion
Reject $H_0$
• P-value
$2P\{T > 4.804\} = 0.0086 < 0.05$
Reject $H_0$
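The t test above can be reproduced step by step. The sketch below is an illustration only: the critical value 2.776 is copied from the slide rather than computed, and the helper `t_cdf_df4` is a hypothetical name for the closed-form CDF of a t distribution with exactly 4 degrees of freedom, so it does not generalize to other sample sizes.

```python
import math

# Reed Auto data.
tv_ads = [1, 3, 2, 1, 3, 2]
cars_sold = [14, 24, 18, 17, 27, 22]
n = len(tv_ads)

x_bar = sum(tv_ads) / n
y_bar = sum(cars_sold) / n
s_xx = sum((x - x_bar) ** 2 for x in tv_ads)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate, then standard error of b1.
sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(tv_ads, cars_sold))
s = math.sqrt(sse / (n - 2))
s_b1 = s / math.sqrt(s_xx)

t_stat = b1 / s_b1


def t_cdf_df4(t):
    """CDF of Student's t with exactly 4 degrees of freedom (closed form)."""
    u = t / math.sqrt(1 + t * t / 4)
    return 0.5 + 0.375 * u * (1 - u * u / 12)


T_CRIT = 2.776  # t_.025 with 4 d.f., taken from the slide
reject_h0 = abs(t_stat) > T_CRIT
p_value = 2 * (1 - t_cdf_df4(abs(t_stat)))  # two-tailed p-value

print(f"s_b1 = {s_b1:.4f}, t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

For other degrees of freedom, a statistics library's t distribution would be the natural replacement for the df = 4 helper.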
Slide 16
Confidence Interval for $\beta_1$
We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.
$H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.
Slide 17
Confidence Interval for $\beta_1$
The form of a confidence interval for $\beta_1$ is:
$b_1 \pm t_{\alpha/2} s_{b_1}$
where
$b_1$ is the point estimate
$t_{\alpha/2} s_{b_1}$ is the margin of error
$t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with $n - 2$ degrees of freedom
Slide 18
Example: Reed Auto Sales
Rejection Rule
Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.
95% Confidence Interval for $\beta_1$
$b_1 \pm t_{\alpha/2} s_{b_1} = 5 \pm 2.776(1.0408) = 5 \pm 2.89$
or 2.11 to 7.89
Conclusion
Reject H0
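The interval arithmetic is simple enough to sketch directly. The helper name `confidence_interval` is hypothetical, and the three inputs (b1 = 5, the standard error 1.0408, and the critical value 2.776) are taken from the slides rather than recomputed.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


# Values from the Reed Auto example.
b1 = 5.0
s_b1 = 1.0408
t_crit = 2.776  # t_.025 with n - 2 = 4 d.f.

lower, upper = confidence_interval(b1, s_b1, t_crit)

# H0: beta_1 = 0 is rejected when 0 lies outside the interval.
reject_h0 = not (lower <= 0.0 <= upper)

print(f"95% CI for beta_1: {lower:.2f} to {upper:.2f}; reject H0: {reject_h0}")
```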
Slide 19
Testing for Significance: F Test
Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Test Statistic
$F = MSR/MSE$
Rejection Rule
Reject $H_0$ if $F > F_\alpha$
where $F_\alpha$ is based on an F distribution with 1 d.f. in the numerator and $n - 2$ d.f. in the denominator.
Slide 20
Example: Reed Auto Sales
F Test
• Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
• Rejection Rule
For $\alpha = .05$ and d.f. = 1, 4: $F_{.05} = 7.709$
Reject $H_0$ if $F > 7.709$.
• Test Statistic
$F = MSR/MSE = 100/4.333 = 23.077$
• Conclusion
We can reject $H_0$.
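The F statistic for the example can be recomputed from the data as a check. This is a minimal sketch: the critical value 7.709 is copied from the slide, the coefficients are the estimates from the example, and the variable names are illustrative.

```python
# Reed Auto data and the estimated coefficients from the example.
tv_ads = [1, 3, 2, 1, 3, 2]
cars_sold = [14, 24, 18, 17, 27, 22]
b0, b1 = 31 / 3, 5.0
n = len(tv_ads)

y_bar = sum(cars_sold) / n
y_hat = [b0 + b1 * x for x in tv_ads]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(cars_sold, y_hat))

msr = ssr / 1        # one independent variable, so 1 numerator d.f.
mse = sse / (n - 2)  # n - 2 denominator d.f.
f_stat = msr / mse

F_CRIT = 7.709  # F_.05 with (1, 4) d.f., taken from the slide
reject_h0 = f_stat > F_CRIT

print(f"MSR = {msr:.3f}, MSE = {mse:.3f}, F = {f_stat:.3f}, reject H0: {reject_h0}")
```

With a single independent variable, the F statistic equals the square of the t statistic (4.804² ≈ 23.08), so the two tests lead to the same conclusion.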
Slide 21
Some Cautions about the
Interpretation of Significance Tests
Rejecting $H_0: \beta_1 = 0$ and
concluding that the relationship
between x and y is significant
does not enable us to conclude
that a cause-and-effect
relationship is present between x
and y.
Just because we are able to reject
$H_0: \beta_1 = 0$ and demonstrate
statistical significance does not
enable us to conclude that there
is a linear relationship between x
and y.
Slide 22
Using the Estimated Regression Equation
for Estimation and Prediction
Confidence Interval Estimate of $E(y_p)$
$\hat{y}_p \pm t_{\alpha/2} s_{\hat{y}_p}$
Prediction Interval Estimate of $y_p$
$\hat{y}_p \pm t_{\alpha/2} s_{ind}$
where the confidence coefficient is $1 - \alpha$ and
$t_{\alpha/2}$ is based on a t distribution with $n - 2$ d.f.
$s_{\hat{y}_p}$ is the standard error of the estimate of $E(y_p)$
$s_{ind}$ is the standard error of an individual estimate of $y_p$
Slide 23
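Standard Errors of Estimate of $E(y_p)$ and $y_p$
$$s_{\hat{y}_p} = s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
$$s_{ind} = s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$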
Slide 24
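Variances of the Estimators of $E(y_p)$ and $y_p$
Variance of $\bar{y}$: $\sigma^2 / n$
Variance of $b_1 (x_0 - \bar{x})$: $\sigma^2 (x_0 - \bar{x})^2 / \sum (x_i - \bar{x})^2$
Variance of $\varepsilon$: $\sigma^2$
Variance of the estimator of $E(y_p) = \beta_0 + \beta_1 x_0$:
$$\mathrm{Var}(\hat{y}) = \mathrm{Var}[\bar{y} + b_1 (x_0 - \bar{x})] = \frac{\sigma^2}{n} + \frac{\sigma^2 (x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}, \quad \text{where } \sigma_{b_1}^2 = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}$$
Variance of the estimator of $y_p = \beta_0 + \beta_1 x_0 + \varepsilon$:
$$\mathrm{Var}(\hat{y}) + \sigma^2$$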
Slide 25
Example: Reed Auto Sales
Point Estimation
If 3 TV ads are run prior to a sale, we expect the
mean number of cars sold to be:
$\hat{y} = 10.333 + 5(3) = 25.333$ cars
Confidence Interval for E(yp)
95% confidence interval estimate of the mean number
of cars sold when 3 TV ads are run is:
$25.333 \pm 3.730$ = 21.603 to 29.063 cars
Prediction Interval for yp
95% prediction interval estimate of the number of
cars sold in one particular week when 3 TV ads are
run is:
$25.333 \pm 6.878$ = 18.455 to 32.211 cars
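Both intervals at 3 TV ads can be reproduced from the data and the standard error formulas for $E(y_p)$ and $y_p$. The sketch below is illustrative: the critical value 2.776 is taken from the earlier t test slide, and the variable names are not from the original.

```python
import math

# Reed Auto data.
tv_ads = [1, 3, 2, 1, 3, 2]
cars_sold = [14, 24, 18, 17, 27, 22]
n = len(tv_ads)

x_bar = sum(tv_ads) / n
y_bar = sum(cars_sold) / n
s_xx = sum((x - x_bar) ** 2 for x in tv_ads)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(tv_ads, cars_sold))
s = math.sqrt(sse / (n - 2))  # standard error of the estimate

x0 = 3                 # number of TV ads for the prediction
y_hat_p = b0 + b1 * x0  # point estimate

# Standard errors for the mean response and for an individual response.
leverage = 1 / n + (x0 - x_bar) ** 2 / s_xx
s_yhat = s * math.sqrt(leverage)
s_ind = s * math.sqrt(1 + leverage)

T_CRIT = 2.776  # t_.025 with 4 d.f., taken from the slides
ci = (y_hat_p - T_CRIT * s_yhat, y_hat_p + T_CRIT * s_yhat)
pi = (y_hat_p - T_CRIT * s_ind, y_hat_p + T_CRIT * s_ind)

print(f"point estimate: {y_hat_p:.3f}")
print(f"95% CI for E(y_p): {ci[0]:.3f} to {ci[1]:.3f}")
print(f"95% PI for y_p:    {pi[0]:.3f} to {pi[1]:.3f}")
```

The prediction interval is wider than the confidence interval because it adds the variance of an individual error term to the variance of the estimated mean.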
Slide 26