FC5e_Ch12 - Webster in china
Transcript FC5e_Ch12 - Webster in china
Business Statistics:
A First Course
Fifth Edition
Chapter 12
Simple Linear Regression
Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.
Learning Objectives
In this chapter, you learn:
How to use regression analysis to predict the value of a dependent variable based on an independent (explanatory) variable
The meaning of the regression coefficients b0 and b1
How to evaluate the assumptions of regression analysis and what to do if the assumptions are violated
How to make inferences about the slope b1 and the correlation coefficient
How to estimate mean values and predict individual values of the dependent variable
Correlation vs. Regression
A scatter plot can be used to show the relationship between two variables
Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
Correlation is only concerned with the strength of the relationship
No causal effect is implied by correlation
Introduction to Regression Analysis
Regression analysis is used to:
Predict the value of a dependent variable based on the value of at least one independent variable
Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to predict or explain the dependent variable
Simple Linear Regression Model
Only one independent variable, X
The relationship between X and Y is described by a linear function
Changes in Y are assumed to be related to changes in X
Types of Relationships
[Figure: scatter plots of Y against X. Panels: linear relationships; curvilinear relationships]
Types of Relationships (continued)
[Figure: scatter plots of Y against X. Panels: strong relationships; weak relationships]
Types of Relationships (continued)
[Figure: scatter plots of Y against X showing no relationship]
Simple Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where:
$Y_i$ = dependent variable
$\beta_0$ = population Y intercept
$\beta_1$ = population slope coefficient
$X_i$ = independent variable
$\varepsilon_i$ = random error term
Linear component: $\beta_0 + \beta_1 X_i$
Random error component: $\varepsilon_i$
Simple Linear Regression Model (continued)
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
[Figure: population regression line of Y against X with intercept $\beta_0$ and slope $\beta_1$. At a given $X_i$, the plot marks the observed value of Y, the predicted value of Y on the line, and the random error $\varepsilon_i$ for this $X_i$ value as the vertical distance between them]
Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line
$\hat{Y}_i = b_0 + b_1 X_i$
where:
$\hat{Y}_i$ = estimated (or predicted) Y value for observation i
$b_0$ = estimate of the regression intercept
$b_1$ = estimate of the regression slope
$X_i$ = value of X for observation i
The Least Squares Method
b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared differences between Y and Ŷ:
$\min \sum (Y_i - \hat{Y}_i)^2 = \min \sum (Y_i - (b_0 + b_1 X_i))^2$
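For reference, the minimization above has a standard closed-form solution, obtained by setting the partial derivatives with respect to b0 and b1 to zero. The name SSXY and the bar notation for sample means are introduced here for illustration; SSX is the sum of squared deviations of X used later in the chapter.

```latex
b_1 = \frac{SSXY}{SSX}
    = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
b_0 = \bar{Y} - b_1 \bar{X}
```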
Finding the Least Squares
Equation
The coefficients b0 and b1, and other regression results in this chapter, will be found using Excel or Minitab
Interpretation of the
Slope and the Intercept
b0 is the estimated mean value of Y when the value of X is zero
b1 is the estimated change in the mean value of Y as a result of a one-unit change in X
Simple Linear Regression
Example
A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
A random sample of 10 houses is selected
Dependent variable (Y) = house price in $1000s
Independent variable (X) = square feet
Simple Linear Regression
Example: Data
House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
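The chapter obtains b0 and b1 from Excel or Minitab. As a rough cross-check, the sketch below fits the same ten observations with the textbook closed-form least-squares formulas in plain Python. The function name `fit_line` and the variable names are made up for this illustration and are not part of the course material.

```python
# House data from the example: size in square feet (X), price in $1000s (Y)
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x.

    Hypothetical helper: uses b1 = SSXY / SSX and b0 = y_bar - b1 * x_bar.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products of deviations and sum of squared X deviations
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(square_feet, price)
print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")
```

The printed coefficients should agree, to the displayed precision, with the intercept 98.24833 and slope 0.10977 that the chapter reports for this data set.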
Simple Linear Regression Example: Scatter Plot
[Figure: "House price model: Scatter Plot". Y axis: House Price ($1000s), 0 to 450; X axis: Square Feet, 0 to 3000]
Simple Linear Regression Example:
Using Excel
Simple Linear Regression Example: Excel Output
The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000

               Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept      98.24833        58.03348          1.69296    0.12892    -35.57720    232.07386
Square Feet    0.10977         0.03297           3.32938    0.01039    0.03374      0.18580
Simple Linear Regression Example: Minitab Output
The regression equation is
Price = 98.2 + 0.110 Square Feet

Predictor      Coef       SE Coef    T       P
Constant       98.25      58.03      1.69    0.129
Square Feet    0.10977    0.03297    3.33    0.010

S = 41.3303   R-Sq = 58.1%   R-Sq(adj) = 52.8%

Analysis of Variance
Source            DF    SS       MS       F        P
Regression        1     18935    18935    11.08    0.010
Residual Error    8     13666    1708
Total             9     32600

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)
Simple Linear Regression Example: Graphical Representation
[Figure: "House price model: Scatter Plot and Prediction Line". Y axis: House Price ($1000s), 0 to 450; X axis: Square Feet, 0 to 3000. Annotations: Intercept = 98.248; Slope = 0.10977]
house price = 98.24833 + 0.10977 (square feet)
Simple Linear Regression
Example: Interpretation of b0
house price = 98.24833 + 0.10977 (square feet)
b0 is the estimated mean value of Y when the value of X is zero (if X = 0 is in the range of observed X values)
Because a house cannot have a square footage of 0, b0 has no practical application
Simple Linear Regression
Example: Interpreting b1
house price = 98.24833 + 0.10977 (square feet)
b1 estimates the change in the mean value of Y as a result of a one-unit increase in X
Here, b1 = 0.10977 tells us that the mean price of a house increases by 0.10977($1000) = $109.77, on average, for each additional square foot of size
Simple Linear Regression
Example: Making Predictions
Predict the price for a house with 2000 square feet:
house price = 98.25 + 0.1098 (sq.ft.)
= 98.25 + 0.1098(2000)
= 317.85
The predicted price for a house with 2000 square feet is 317.85($1,000s) = $317,850
Simple Linear Regression
Example: Making Predictions
When using a regression model for prediction, only predict within the relevant range of data
[Figure: scatter plot with prediction line. Y axis: House Price ($1000s), 0 to 450; X axis: Square Feet, 0 to 3000. The span of observed X values is marked "Relevant range for interpolation"]
Do not try to extrapolate beyond the range of observed X's
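The advice to predict only within the relevant range can be built into a prediction helper. The sketch below is one possible way to do that, assuming the rounded coefficients 98.25 and 0.1098 from the prediction slide; the function name `predict_price` and the choice to raise an error on extrapolation are illustrative decisions, not something the slides prescribe.

```python
# Rounded coefficients as used on the prediction slide
B0 = 98.25
B1 = 0.1098

# Observed sizes (square feet) from the sample of 10 houses
SQUARE_FEET = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def predict_price(sq_ft, observed_x=SQUARE_FEET):
    """Predict house price in $1000s, refusing to extrapolate.

    Hypothetical helper: the relevant range is taken to be the minimum
    and maximum of the observed X values.
    """
    lo, hi = min(observed_x), max(observed_x)
    if not lo <= sq_ft <= hi:
        raise ValueError(
            f"{sq_ft} sq. ft. is outside the relevant range [{lo}, {hi}]"
        )
    return B0 + B1 * sq_ft


# 2000 sq. ft. lies inside the observed range of 1100 to 2450
print(round(predict_price(2000), 2))
```

Calling `predict_price(3000)` would raise a `ValueError`, since 3000 square feet lies beyond the largest house in the sample.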
Measures of Variation
Total variation is made up of two parts:
$SST = SSR + SSE$
Total Sum of Squares: $SST = \sum (Y_i - \bar{Y})^2$
Regression Sum of Squares: $SSR = \sum (\hat{Y}_i - \bar{Y})^2$
Error Sum of Squares: $SSE = \sum (Y_i - \hat{Y}_i)^2$
where:
$\bar{Y}$ = mean value of the dependent variable
$Y_i$ = observed value of the dependent variable
$\hat{Y}_i$ = predicted value of Y for the given $X_i$ value
Measures of Variation (continued)
SST = total sum of squares (Total Variation)
Measures the variation of the Yi values around their mean $\bar{Y}$
SSR = regression sum of squares (Explained Variation)
Variation attributable to the relationship between X and Y
SSE = error sum of squares (Unexplained Variation)
Variation in Y attributable to factors other than X
Measures of Variation (continued)
[Figure: regression line of Y against X with a horizontal line at $\bar{Y}$. At a given $X_i$, vertical distances illustrate $SST = \sum (Y_i - \bar{Y})^2$, $SSR = \sum (\hat{Y}_i - \bar{Y})^2$, and $SSE = \sum (Y_i - \hat{Y}_i)^2$]
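To make the decomposition SST = SSR + SSE concrete, the following sketch computes the three sums of squares for the house price data. The fit uses the standard closed-form least-squares formulas, and all variable names are chosen for this illustration only.

```python
# House data from the example: size in square feet (X), price in $1000s (Y)
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

# Least-squares slope and intercept (closed form)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price)) / sum(
    (x - x_bar) ** 2 for x in square_feet
)
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in square_feet]

# Total, regression (explained), and error (unexplained) sums of squares
sst = sum((y - y_bar) ** 2 for y in price)
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(price, predicted))

print(f"SST = {sst:.4f}")
print(f"SSR = {ssr:.4f}")
print(f"SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
print(f"SSR / SST = {ssr / sst:.5f}")
```

The three sums should match the SS column of the ANOVA table in the chapter's Excel output, and the ratio SSR / SST is the R Square value reported there.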
Coefficient of Determination, $r^2$
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
The coefficient of determination is also called r-squared and is denoted as $r^2$
$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$
note: $0 \le r^2 \le 1$
Examples of $r^2$ Values
[Figure: scatter plots of Y against X illustrating $r^2 = 1$]
$r^2 = 1$
Perfect linear relationship between X and Y: 100% of the variation in Y is explained by variation in X
Examples of $r^2$ Values
[Figure: scatter plots of Y against X illustrating $0 < r^2 < 1$]
$0 < r^2 < 1$
Weaker linear relationships between X and Y: some but not all of the variation in Y is explained by variation in X
Examples of $r^2$ Values
[Figure: scatter plot of Y against X illustrating $r^2 = 0$]
$r^2 = 0$
No linear relationship between X and Y: the value of Y does not depend on X (none of the variation in Y is explained by variation in X)
Simple Linear Regression Example: Coefficient of Determination, $r^2$ in Excel
$r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$
58.08% of the variation in house prices is explained by variation in square feet

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000
Simple Linear Regression Example: Coefficient of Determination, $r^2$ in Minitab
$r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$
58.08% of the variation in house prices is explained by variation in square feet

S = 41.3303   R-Sq = 58.1%   R-Sq(adj) = 52.8%

Analysis of Variance
Source            DF    SS       MS       F        P
Regression        1     18935    18935    11.08    0.010
Residual Error    8     13666    1708
Total             9     32600
Standard Error of Estimate
The standard deviation of the variation of observations around the regression line is estimated by
$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$
where:
SSE = error sum of squares
n = sample size
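As a quick numerical check of the formula, the sketch below plugs in the SSE and sample size from the house price example. The function name `standard_error_of_estimate` is a label chosen for this illustration.

```python
import math


def standard_error_of_estimate(sse, n):
    """Return S_YX = sqrt(SSE / (n - 2)) for a simple linear regression."""
    return math.sqrt(sse / (n - 2))


# SSE and n as reported for the house price example
s_yx = standard_error_of_estimate(sse=13665.5652, n=10)
print(f"S_YX = {s_yx:.5f}")
```

The result should agree with the Standard Error entry in the chapter's Excel regression statistics (41.33032, in $1000s).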
Simple Linear Regression Example: Standard Error of Estimate in Excel
$S_{YX} = 41.33032$

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10
Simple Linear Regression Example: Standard Error of Estimate in Minitab
$S_{YX} = 41.33032$

S = 41.3303   R-Sq = 58.1%   R-Sq(adj) = 52.8%
Comparing Standard Errors
SYX is a measure of the variation of observed Y values from the regression line
[Figure: two scatter plots of Y against X with fitted lines. Panels: small SYX; large SYX]
The magnitude of SYX should always be judged relative to the size of the Y values in the sample data
e.g., SYX = $41.33K is moderately small relative to house prices in the $200K - $400K range
Assumptions of Regression: L.I.N.E
Linearity
The relationship between X and Y is linear
Independence of Errors
Error values are statistically independent
Normality of Error
Error values are normally distributed for any given value of X
Equal Variance (also called homoscedasticity)
The probability distribution of the errors has constant variance
Residual Analysis
$e_i = Y_i - \hat{Y}_i$
The residual for observation i, ei, is the difference between its observed and predicted value
Check the assumptions of regression by examining the residuals
Examine for linearity assumption
Evaluate independence assumption
Evaluate normal distribution assumption
Examine for constant variance for all levels of X (homoscedasticity)
Graphical Analysis of Residuals
Can plot residuals vs. X
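The residuals themselves are straightforward to compute once the line is fitted. The sketch below does this for the house price data and prints each residual next to its X value, which is the raw material for a residuals vs. X plot. It uses the standard closed-form least-squares fit; the variable names are assumptions made for this illustration.

```python
# House data from the example: size in square feet (X), price in $1000s (Y)
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

# Least-squares slope and intercept (closed form)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price)) / sum(
    (x - x_bar) ** 2 for x in square_feet
)
b0 = y_bar - b1 * x_bar

# Residual e_i = observed Y_i minus predicted Y_i
residuals = [y - (b0 + b1 * x) for x, y in zip(square_feet, price)]

for x, e in sorted(zip(square_feet, residuals)):
    print(f"{x:>5}  {e:>10.4f}")

# With an intercept in the model, least-squares residuals sum to zero
# (up to floating-point rounding)
print(f"sum of residuals = {sum(residuals):.6f}")
```

Sorting by X makes it easier to scan the listing for a trend or a funnel shape before drawing the actual plot.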
Residual Analysis for Linearity
[Figure: plots of Y vs. x with the corresponding plots of residuals vs. x. Panels: Not Linear; Linear]
Residual Analysis for Independence
[Figure: plots of residuals vs. X. Panels: Not Independent; Independent]
Checking for Normality
Examine the Stem-and-Leaf Display of the Residuals
Examine the Boxplot of the Residuals
Examine the Histogram of the Residuals
Construct a Normal Probability Plot of the Residuals
Residual Analysis for Normality
When using a normal probability plot, normal errors will approximately display in a straight line
[Figure: normal probability plot. Y axis: Percent, 0 to 100; X axis: Residual, -3 to 3]
Residual Analysis for Equal Variance
[Figure: plots of Y vs. x with the corresponding plots of residuals vs. x. Panels: Non-constant variance; Constant variance]
Simple Linear Regression Example: Excel Residual Output
RESIDUAL OUTPUT
Observation    Predicted House Price    Residuals
1              251.92316                -6.923162
2              273.87671                38.12329
3              284.85348                -5.853484
4              304.06284                3.937162
5              218.99284                -19.99284
6              268.38832                -49.38832
7              356.20251                48.79749
8              367.17929                -43.17929
9              254.6674                 64.33264
10             284.85348                -29.85348
[Figure: "House Price Model Residual Plot". Y axis: Residuals, -60 to 80; X axis: Square Feet, 0 to 3000]
Does not appear to violate any regression assumptions
Inferences About the Slope
The standard error of the regression slope coefficient (b1) is estimated by
$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$
where:
$S_{b_1}$ = estimate of the standard error of the slope
$S_{YX} = \sqrt{\frac{SSE}{n-2}}$ = standard error of the estimate
Inferences About the Slope: t Test
t test for a population slope
Is there a linear relationship between X and Y?
Null and alternative hypotheses
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)
Test statistic
$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}}$, d.f. = n - 2
where:
b1 = regression slope coefficient
β1 = hypothesized slope
Sb1 = standard error of the slope
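The standard error of the slope and the t statistic can be worked through numerically. The sketch below assumes the slope 0.10977 and standard error of the estimate 41.33032 reported for the house price example, computes SSX from the sample X values, and compares the statistic with 2.3060, the two-tail critical value the chapter uses for 8 degrees of freedom at α = 0.05. Variable names are illustrative.

```python
import math

# Observed sizes (square feet) from the sample of 10 houses
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

b1 = 0.10977        # estimated slope from the regression output
s_yx = 41.33032     # standard error of the estimate
beta1_h0 = 0.0      # hypothesized slope under H0
t_critical = 2.3060 # two-tail critical value, 8 d.f., alpha = 0.05 (from the slides)

n = len(square_feet)
x_bar = sum(square_feet) / n
ssx = sum((x - x_bar) ** 2 for x in square_feet)

# Standard error of the slope and the test statistic
s_b1 = s_yx / math.sqrt(ssx)
t_stat = (b1 - beta1_h0) / s_b1

print(f"S_b1 = {s_b1:.5f}")
print(f"t_STAT = {t_stat:.3f} with {n - 2} d.f.")
print("Reject H0" if abs(t_stat) > t_critical else "Do not reject H0")
```

Because b1 is entered here to five decimal places, the computed statistic can differ from the software output in the fourth or fifth decimal.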
Inferences About the Slope: t Test Example
House Price in $1000s (y)    Square Feet (x)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
Estimated Regression Equation:
house price = 98.25 + 0.1098 (sq.ft.)
The slope of this model is 0.1098
Is there a relationship between the square footage of the house and its sales price?
Inferences About the Slope: t Test Example
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
               Coefficients    Standard Error    t Stat     P-value
Intercept      98.24833        58.03348          1.69296    0.12892
Square Feet    0.10977         0.03297           3.32938    0.01039
From Minitab output:
Predictor      Coef       SE Coef    T       P
Constant       98.25      58.03      1.69    0.129
Square Feet    0.10977    0.03297    3.33    0.010
In both outputs, b1 = 0.10977 and Sb1 = 0.03297 (the Square Feet row)
$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$
Inferences About the Slope: t Test Example
H0: β1 = 0
H1: β1 ≠ 0
Test Statistic: tSTAT = 3.329
d.f. = 10 - 2 = 8
[Figure: t distribution with rejection regions of area α/2 = .025 in each tail. Critical values: -tα/2 = -2.3060 and tα/2 = 2.3060. The test statistic 3.329 falls in the upper rejection region]
Decision: Reject H0
There is sufficient evidence that square footage affects house price
Inferences About the Slope: t Test Example
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
               Coefficients    Standard Error    t Stat     P-value
Intercept      98.24833        58.03348          1.69296    0.12892
Square Feet    0.10977         0.03297           3.32938    0.01039
From Minitab output:
Predictor      Coef       SE Coef    T       P
Constant       98.25      58.03      1.69    0.129
Square Feet    0.10977    0.03297    3.33    0.010
The p-value for the slope is 0.01039 (Excel) or 0.010 (Minitab)
Decision: Reject H0, since p-value < α
There is sufficient evidence that square footage affects house price. The p-value approach leads to the same conclusion as the critical value approach.
F Test for Significance
F Test statistic: $F_{STAT} = \frac{MSR}{MSE}$
where
$MSR = \frac{SSR}{1}$
$MSE = \frac{SSE}{n-2}$
where FSTAT follows an F distribution with 1 numerator and (n - 2) denominator degrees of freedom
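A short numerical sketch of the F statistic follows, assuming the SSR and SSE values from the ANOVA table of the house price example and the critical value 5.32 that the chapter quotes for 1 and 8 degrees of freedom at α = 0.05. The variable names are illustrative. The last lines also show a general property of simple linear regression: the F statistic equals the square of the slope's t statistic.

```python
# Sums of squares and sample size from the house price ANOVA table
ssr = 18934.9348
sse = 13665.5652
n = 10

f_critical = 5.32  # F(0.05; 1, 8) as quoted in the slides

# Mean squares: one independent variable, n - 2 residual degrees of freedom
msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse

print(f"MSR = {msr:.4f}, MSE = {mse:.4f}")
print(f"F_STAT = {f_stat:.4f} with 1 and {n - 2} d.f.")
print("Reject H0" if f_stat > f_critical else "Do not reject H0")

# In simple linear regression, F_STAT equals t_STAT squared
t_stat = 3.32938  # slope t statistic from the regression output
print(f"t_STAT^2 = {t_stat ** 2:.4f}")
```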
F-Test for Significance: Excel Output
$F_{STAT} = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848$, with 1 and 8 degrees of freedom
p-value for the F-Test: Significance F = 0.01039

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000
F-Test for Significance: Minitab Output
$F_{STAT} = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848$, with 1 and 8 degrees of freedom
p-value for the F-Test: P = 0.010

Analysis of Variance
Source            DF    SS       MS       F        P
Regression        1     18935    18935    11.08    0.010
Residual Error    8     13666    1708
Total             9     32600
F Test for Significance (continued)
H0: β1 = 0
H1: β1 ≠ 0
α = .05
df1 = 1, df2 = 8
Critical Value: Fα = 5.32
Test Statistic: $F_{STAT} = \frac{MSR}{MSE} = 11.08$
[Figure: F distribution with a rejection region of area α = .05 to the right of F.05 = 5.32. Do not reject H0 to the left of the critical value; reject H0 to the right]
Decision: Reject H0 at α = 0.05
Conclusion: There is sufficient evidence that house size affects selling price
Confidence Interval Estimate for the Slope
Confidence Interval Estimate of the Slope:
$b_1 \pm t_{\alpha/2} S_{b_1}$, d.f. = n - 2
Excel Printout for House Prices:
               Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept      98.24833        58.03348          1.69296    0.12892    -35.57720    232.07386
Square Feet    0.10977         0.03297           3.32938    0.01039    0.03374      0.18580
At 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)
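The interval can be reproduced by hand from the printed slope and standard error. The sketch below assumes b1 = 0.10977 and Sb1 = 0.03297 from the Excel printout and the critical value 2.3060 that the chapter uses for 8 degrees of freedom at the 95% level; the variable names are illustrative.

```python
b1 = 0.10977         # estimated slope
s_b1 = 0.03297       # standard error of the slope
t_critical = 2.3060  # t(0.025; 8 d.f.) as used in the slides

# Confidence interval: b1 plus or minus t * S_b1
margin = t_critical * s_b1
lower = b1 - margin
upper = b1 + margin

print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")

# House price is in $1000s, so convert the bounds to dollars per square foot
print(f"In dollars per square foot: ({lower * 1000:.2f}, {upper * 1000:.2f})")
```

Because the inputs are rounded to five decimals, the bounds may differ from the Lower 95% and Upper 95% columns in the last displayed digit.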
Confidence Interval Estimate for the Slope (continued)
               Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept      98.24833        58.03348          1.69296    0.12892    -35.57720    232.07386
Square Feet    0.10977         0.03297           3.32938    0.01039    0.03374      0.18580
Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.74 and $185.80 per square foot of house size
This 95% confidence interval does not include 0.
Conclusion: There is a significant relationship between house price and square feet at the .05 level of significance
Estimating Mean Values and Predicting Individual Values
Goal: Form intervals around Y to express uncertainty about the value of Y for a given Xi
[Figure: prediction line $\hat{Y} = b_0 + b_1 X_i$ of Y against X. At a given Xi, the plot marks the confidence interval for the mean of Y, given Xi, and the prediction interval for an individual Y, given Xi]
Confidence Interval for the Average Y, Given X
Confidence interval estimate for the mean value of Y given a particular Xi
Confidence interval for $\mu_{Y|X=X_i}$:
$\hat{Y} \pm t_{\alpha/2} S_{YX} \sqrt{h_i}$
Size of interval varies according to distance away from the mean, $\bar{X}$
$h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX} = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}$
Prediction Interval for an Individual Y, Given X
Prediction interval estimate for an individual value of Y given a particular Xi
Prediction interval for $Y_{X=X_i}$:
$\hat{Y} \pm t_{\alpha/2} S_{YX} \sqrt{1 + h_i}$
The extra term 1 under the square root adds to the interval width to reflect the added uncertainty for an individual case
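Both interval formulas can be evaluated side by side for a 2,000 square-foot house. The sketch below refits the line with the standard closed-form least-squares formulas and assumes the critical value 2.3060 that the chapter uses for 8 degrees of freedom at the 95% level. The function name `intervals_at` and the variable names are made up for this illustration.

```python
import math

# House data from the example: size in square feet (X), price in $1000s (Y)
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
T_CRITICAL = 2.3060  # t(0.025; 8 d.f.) as used in the slides

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
ssx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price)) / ssx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, price))
s_yx = math.sqrt(sse / (n - 2))


def intervals_at(x_new):
    """Return (y_hat, confidence interval, prediction interval) at x_new."""
    y_hat = b0 + b1 * x_new
    h = 1 / n + (x_new - x_bar) ** 2 / ssx
    ci_margin = T_CRITICAL * s_yx * math.sqrt(h)      # mean of Y
    pi_margin = T_CRITICAL * s_yx * math.sqrt(1 + h)  # individual Y
    return (
        y_hat,
        (y_hat - ci_margin, y_hat + ci_margin),
        (y_hat - pi_margin, y_hat + pi_margin),
    )


y_hat, ci, pi = intervals_at(2000)
print(f"predicted price: {y_hat:.2f}")
print(f"95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for an individual house: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is always wider than the confidence interval at the same X, since the extra 1 under the square root accounts for the scatter of an individual house around the mean. Results here use unrounded coefficients, so they can differ slightly in the last digit from figures derived from rounded values.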
Estimation of Mean Values: Example
Confidence Interval Estimate for $\mu_{Y|X=X_i}$
Find the 95% confidence interval for the mean price of 2,000 square-foot houses
Predicted Price $\hat{Y}_i$ = 317.85 ($1,000s)
$\hat{Y} \pm t_{0.025} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 37.12$
The confidence interval endpoints are 280.66 and 354.90, or from $280,660 to $354,900
(The endpoints reflect the unrounded prediction of about 317.78; 317.85 comes from the rounded coefficients 98.25 and 0.1098.)
Estimation of Individual Values: Example
Prediction Interval Estimate for $Y_{X=X_i}$
Find the 95% prediction interval for an individual house with 2,000 square feet
Predicted Price $\hat{Y}_i$ = 317.85 ($1,000s)
$\hat{Y} \pm t_{0.025} S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 102.28$
The prediction interval endpoints are 215.50 and 420.07, or from $215,500 to $420,070
(The endpoints reflect the unrounded prediction of about 317.78; 317.85 comes from the rounded coefficients 98.25 and 0.1098.)
Finding Confidence and
Prediction Intervals in Excel
(continued)
[Figure: Excel worksheet. Callouts: Input values; predicted Y; Confidence Interval Estimate for μY|X=Xi; Prediction Interval Estimate for YX=Xi]
Finding Confidence and
Prediction Intervals in Minitab
Predicted Values for New Observations
New Obs    Fit      SE Fit    95% CI            95% PI
1          317.8    16.1      (280.7, 354.9)    (215.5, 420.1)
Values of Predictors for New Observations
New Obs    Square Feet
1          2000
In this output, Square Feet = 2000 is the input value, Fit is the predicted Y, the 95% CI is the confidence interval estimate for μY|X=Xi, and the 95% PI is the prediction interval estimate for YX=Xi
Chapter Summary
Introduced types of regression models
Reviewed assumptions of regression and
correlation
Discussed determining the simple linear
regression equation
Described measures of variation
Discussed residual analysis
Chapter Summary
(continued)
Described inference about the slope
Discussed correlation -- measuring the strength
of the association
Addressed estimation of mean values and
prediction of individual values
Discussed possible pitfalls in regression and
recommended strategies to avoid them