FC5e_Ch12 - Webster in china

Business Statistics:
A First Course
Fifth Edition
Chapter 12
Simple Linear Regression
Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc.
Learning Objectives
In this chapter, you learn:
- How to use regression analysis to predict the value of a dependent variable based on an independent (explanatory) variable
- The meaning of the regression coefficients b0 and b1
- How to evaluate the assumptions of regression analysis and what to do if the assumptions are violated
- How to make inferences about the slope b1 and the correlation coefficient
- How to estimate mean values and predict individual values of the dependent variable

Correlation vs. Regression
- A scatter plot can be used to show the relationship between two variables
- Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
  - Correlation is only concerned with the strength of the relationship
  - No causal effect is implied by correlation

Introduction to Regression Analysis
- Regression analysis is used to:
  - Predict the value of a dependent variable based on the value of at least one independent variable
  - Explain the impact of changes in an independent variable on the dependent variable
- Dependent variable: the variable we wish to predict or explain
- Independent variable: the variable used to predict or explain the dependent variable

Simple Linear Regression Model
- Only one independent variable, X
- The relationship between X and Y is described by a linear function
- Changes in Y are assumed to be related to changes in X

Types of Relationships
[Figure: scatter plots of Y against X, panels labeled "Linear relationships" and "Curvilinear relationships"]

Types of Relationships (continued)
[Figure: scatter plots of Y against X, panels labeled "Strong relationships" and "Weak relationships"]

Types of Relationships (continued)
[Figure: scatter plots of Y against X, labeled "No relationship" (no linear correlation)]

Simple Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where:
- $Y_i$ = dependent variable
- $\beta_0$ = population Y intercept
- $\beta_1$ = population slope coefficient
- $X_i$ = independent variable
- $\varepsilon_i$ = random error term
$\beta_0 + \beta_1 X_i$ is the linear component; $\varepsilon_i$ is the random error component.

Simple Linear Regression Model (continued)
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
[Figure: the regression line plotted against X, with Intercept = $\beta_0$ and Slope = $\beta_1$. At a given $X_i$ the figure marks the observed value of Y for $X_i$, the predicted value of Y for $X_i$, and the random error $\varepsilon_i$ for this $X_i$ value]

Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:
$\hat{Y}_i = b_0 + b_1 X_i$
where:
- $\hat{Y}_i$ = estimated (or predicted) Y value for observation i
- $b_0$ = estimate of the regression intercept
- $b_1$ = estimate of the regression slope
- $X_i$ = value of X for observation i

The Least Squares Method
b0 and b1 are obtained by finding the values that minimize the sum of the squared differences between Y and Ŷ (the fitted residuals, Y - Ŷ):
$\min \sum (Y_i - \hat{Y}_i)^2 = \min \sum (Y_i - (b_0 + b_1 X_i))^2$

Finding the Least Squares Equation
- The coefficients b0 and b1, and other regression results in this chapter, will be found using Excel or Minitab

Interpretation of the Slope and the Intercept
- b0 is the estimated mean value of Y when the value of X is zero
- b1 is the estimated change in the mean value of Y as a result of a one-unit change in X

Simple Linear Regression Example
- A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
- A random sample of 10 houses is selected
  - Dependent variable (Y) = house price in $1000s
  - Independent variable (X) = square feet

Simple Linear Regression Example: Data
House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700

Simple Linear Regression Example: Scatter Plot
[Figure: "House price model: Scatter Plot", House Price ($1000s), axis 0 to 450, against Square Feet, axis 0 to 3000]

Simple Linear Regression Example: Using Excel

Simple Linear Regression Example: Excel Output
The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000

               Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept      98.24833        58.03348          1.69296    0.12892    -35.57720    232.07386
Square Feet    0.10977         0.03297           3.32938    0.01039    0.03374      0.18580

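The slides obtain b0 and b1 from Excel or Minitab. As a cross-check, the following is a minimal sketch that applies the standard closed-form least-squares solution (slope = sum of cross-products of deviations divided by the sum of squared X deviations) to the ten houses in the example. The function name `least_squares` and the variable names are illustrative assumptions, not part of the course material.

```python
# House data from the example: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def least_squares(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products of deviations and sum of squared X deviations.
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx           # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1


b0, b1 = least_squares(square_feet, price)
print(f"house price = {b0:.5f} + {b1:.5f} (square feet)")
```

Rounded to five decimals, the coefficients should agree with the Intercept and Square Feet rows of the Excel output.
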
Simple Linear Regression Example: Minitab Output
The regression equation is
Price = 98.2 + 0.110 Square Feet

Predictor      Coef       SE Coef    T       P
Constant       98.25      58.03      1.69    0.129
Square Feet    0.10977    0.03297    3.33    0.010

S = 41.3303   R-Sq = 58.1%   R-Sq(adj) = 52.8%

Analysis of Variance
Source            DF    SS       MS       F        P
Regression        1     18935    18935    11.08    0.010
Residual Error    8     13666    1708
Total             9     32600

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Simple Linear Regression Example: Graphical Representation
[Figure: "House price model: Scatter Plot and Prediction Line", House Price ($1000s) against Square Feet, with the fitted line annotated Intercept = 98.248 and Slope = 0.10977]
house price = 98.24833 + 0.10977 (square feet)

Simple Linear Regression Example: Interpretation of b0
house price = 98.24833 + 0.10977 (square feet)
- b0 is the estimated mean value of Y when the value of X is zero (if X = 0 is in the range of observed X values)
- Because a house cannot have a square footage of 0, b0 has no practical application

Simple Linear Regression Example: Interpreting b1
house price = 98.24833 + 0.10977 (square feet)
- b1 estimates the change in the mean value of Y as a result of a one-unit increase in X
- Here, b1 = 0.10977 tells us that the mean value of a house increases by 0.10977($1000) = $109.77, on average, for each additional square foot of size

Simple Linear Regression Example: Making Predictions
Predict the price for a house with 2000 square feet:
house price = 98.25 + 0.1098 (sq. ft.)
            = 98.25 + 0.1098(2000)
            = 317.85
The predicted price for a house with 2000 square feet is 317.85 ($1000s) = $317,850

Simple Linear Regression Example: Making Predictions
- When using a regression model for prediction, only predict within the relevant range of data, that is, the range of observed X values
[Figure: scatter plot and prediction line of House Price ($1000s) against Square Feet, with the relevant range for interpolation marked]
- Do not try to extrapolate beyond the range of observed X's

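The prediction step and the warning against extrapolation can be combined in a small helper. This is only a sketch: the function name `predict_price` and the choice to raise an error outside the sample range are assumptions made for illustration. The coefficients are the rounded values used on the slide, and the range limits are the smallest and largest square footage in the sample data.

```python
# Rounded coefficients and observed X range taken from the house price example.
B0 = 98.25                   # intercept, in $1000s
B1 = 0.1098                  # slope, in $1000s per square foot
X_MIN, X_MAX = 1100, 2450    # smallest and largest square footage in the sample


def predict_price(square_feet):
    """Predict house price in $1000s, refusing to extrapolate."""
    if not X_MIN <= square_feet <= X_MAX:
        raise ValueError(
            f"{square_feet} sq. ft. is outside the observed range "
            f"[{X_MIN}, {X_MAX}]; the model should not be extrapolated"
        )
    return B0 + B1 * square_feet


print(round(predict_price(2000), 2))  # prediction in $1000s for a 2000 sq. ft. house
```

Calling `predict_price(3000)` raises a `ValueError`, because 3000 square feet lies beyond the largest house in the sample.
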
Measures of Variation
- Total variation is made up of two parts:
$SST = SSR + SSE$
- Total sum of squares: $SST = \sum (Y_i - \bar{Y})^2$
- Regression sum of squares: $SSR = \sum (\hat{Y}_i - \bar{Y})^2$
- Error sum of squares: $SSE = \sum (Y_i - \hat{Y}_i)^2$
where:
$\bar{Y}$ = mean value of the dependent variable
$Y_i$ = observed value of the dependent variable
$\hat{Y}_i$ = predicted value of Y for the given $X_i$ value

Measures of Variation (continued)
- SST = total sum of squares (Total Variation)
  - Measures the variation of the Yi values around their mean Ȳ
- SSR = regression sum of squares (Explained Variation)
  - Variation attributable to the relationship between X and Y
- SSE = error sum of squares (Unexplained Variation)
  - Variation in Y attributable to factors other than X

Measures of Variation (continued)
[Figure: at a given $X_i$, the observed $Y_i$, the predicted $\hat{Y}_i$ on the regression line, and the mean $\bar{Y}$, with the vertical distances labeled $SSE = \sum (Y_i - \hat{Y}_i)^2$, $SST = \sum (Y_i - \bar{Y})^2$ and $SSR = \sum (\hat{Y}_i - \bar{Y})^2$]

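To make the decomposition SST = SSR + SSE concrete, the sketch below computes the three sums of squares for the house price data. The coefficients are refitted with the standard closed-form least-squares solution, which is an assumption of this example since the slides take them from Excel or Minitab; all variable names are illustrative.

```python
# House data from the example: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

# Least-squares coefficients (standard closed form).
ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
ssx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = ssxy / ssx
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in square_feet]

# Total, explained and unexplained variation.
sst = sum((y - y_bar) ** 2 for y in price)
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(price, predicted))

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```

The three values should match the SS column of the ANOVA table in the Excel output, up to rounding.
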
Coefficient of Determination, r²
- The coefficient of determination is the portion of the total variation in the dependent variable (SST) that is explained by variation in the independent variable (SSR)
- The coefficient of determination is also called r-squared and is denoted as r²
$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$
note: $0 \le r^2 \le 1$

Examples of r² Values
[Figure: scatter plots of Y against X, each labeled r² = 1]
r² = 1
Perfect linear relationship between X and Y: 100% of the variation in Y is explained by variation in X

Examples of r² Values
[Figure: scatter plots of Y against X, labeled 0 < r² < 1]
0 < r² < 1
Weaker linear relationships between X and Y: some but not all of the variation in Y is explained by variation in X

Examples of r² Values
[Figure: scatter plot of Y against X, labeled r² = 0]
r² = 0
No linear relationship between X and Y: the value of Y does not depend on X (none of the variation in Y is explained by variation in X)

Simple Linear Regression Example: Coefficient of Determination, r² in Excel
$r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$
In the Excel output, r² is reported as R Square = 0.58082; SSR and SST are the Regression and Total entries of the SS column in the ANOVA table.
58.08% of the variation in house prices is explained by variation in square feet

Simple Linear Regression Example: Coefficient of Determination, r² in Minitab
$r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$
In the Minitab output, r² is reported as R-Sq = 58.1%; SSR and SST are the Regression and Total entries of the SS column in the Analysis of Variance table.
58.08% of the variation in house prices is explained by variation in square feet

Standard Error of Estimate
- The standard deviation of the variation of observations around the regression line is estimated by
$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$
where
SSE = error sum of squares
n = sample size

Simple Linear Regression Example: Standard Error of Estimate in Excel
$S_{YX} = 41.33032$
In the Excel output, S_YX is the Standard Error entry under Regression Statistics.

Simple Linear Regression Example: Standard Error of Estimate in Minitab
$S_{YX} = 41.33032$
In the Minitab output, S_YX is reported as S = 41.3303.

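Both r² and S_YX follow directly from the sums of squares in the ANOVA table. The short sketch below recomputes them from the values printed in the Excel output; the variable names are illustrative assumptions.

```python
import math

# Sums of squares and sample size as reported in the ANOVA table of the example.
ssr = 18934.9348   # regression sum of squares
sse = 13665.5652   # error sum of squares
sst = 32600.5000   # total sum of squares
n = 10             # number of houses

r_squared = ssr / sst              # share of the variation in Y explained by X
s_yx = math.sqrt(sse / (n - 2))    # standard error of the estimate

print(f"r^2 = {r_squared:.5f}, S_YX = {s_yx:.5f}")
```

Rounded to five decimals, the results should agree with R Square and Standard Error in the Excel output.
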
Comparing Standard Errors
- S_YX is a measure of the variation of observed Y values from the regression line
[Figure: two scatter plots of Y against X with fitted lines, labeled "small S_YX" and "large S_YX"]
- The magnitude of S_YX should always be judged relative to the size of the Y values in the sample data
  e.g., S_YX = $41.33K is moderately small relative to house prices in the $200K - $400K range

Assumptions of Regression: L.I.N.E.
- Linearity
  - The relationship between X and Y is linear
- Independence of Errors
  - Error values are statistically independent
- Normality of Error
  - Error values are normally distributed for any given value of X
- Equal Variance (also called homoscedasticity)
  - The probability distribution of the errors has constant variance

Residual Analysis
$e_i = Y_i - \hat{Y}_i$
- The residual for observation i, $e_i$, is the difference between its observed and predicted value
- Check the assumptions of regression by examining the residuals
  - Examine for linearity assumption
  - Evaluate independence assumption
  - Evaluate normal distribution assumption
  - Examine for constant variance for all levels of X (homoscedasticity)
- Graphical Analysis of Residuals
  - Can plot residuals vs. X

Residual Analysis for Linearity
[Figure: Y vs. x and residuals vs. x panels, labeled "Not Linear" and "Linear"]

Residual Analysis for Independence
[Figure: residuals vs. X panels, labeled "Not Independent" and "Independent"]

Checking for Normality
- Examine the Stem-and-Leaf Display of the Residuals
- Examine the Boxplot of the Residuals
- Examine the Histogram of the Residuals
- Construct a Normal Probability Plot of the Residuals

Residual Analysis for Normality
When using a normal probability plot, normal errors will approximately display in a straight line
[Figure: normal probability plot, Percent (0 to 100) against Residual (-3 to 3)]

Residual Analysis for Equal Variance
[Figure: Y vs. x and residuals vs. x panels, labeled "Non-constant variance" and "Constant variance"]

Simple Linear Regression Example: Excel Residual Output
RESIDUAL OUTPUT
Observation    Predicted House Price    Residuals
1              251.92316                -6.923162
2              273.87671                38.12329
3              284.85348                -5.853484
4              304.06284                3.937162
5              218.99284                -19.99284
6              268.38832                -49.38832
7              356.20251                48.79749
8              367.17929                -43.17929
9              254.6674                 64.33264
10             284.85348                -29.85348
[Figure: "House Price Model Residual Plot", Residuals (-60 to 80) against Square Feet (0 to 3000)]
Does not appear to violate any regression assumptions

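The residual column can be reproduced from the definition e_i = Y_i - Ŷ_i. The following is a minimal sketch, assuming the coefficients are refitted with the standard closed-form least-squares solution rather than read from Excel; variable names are illustrative.

```python
# House data from the example: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

# Least-squares coefficients (standard closed form).
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price)) / sum(
    (x - x_bar) ** 2 for x in square_feet
)
b0 = y_bar - b1 * x_bar

# Residual = observed price minus predicted price.
predicted = [b0 + b1 * x for x in square_feet]
residuals = [y - y_hat for y, y_hat in zip(price, predicted)]

for i, (x, y_hat, e) in enumerate(zip(square_feet, predicted, residuals), start=1):
    print(f"{i:2d}  X = {x:4d}  predicted = {y_hat:9.5f}  residual = {e:10.5f}")
```

Plotting `residuals` against `square_feet` with any plotting tool gives the residual plot used to check the L.I.N.E. assumptions. With an intercept in the model, least-squares residuals sum to zero up to floating-point error.
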
Inferences About the Slope
- The standard error of the regression slope coefficient (b1) is estimated by
$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$
where:
$S_{b_1}$ = estimate of the standard error of the slope
$S_{YX} = \sqrt{\frac{SSE}{n-2}}$ = standard error of the estimate

Inferences About the Slope: t Test
- t test for a population slope
  - Is there a linear relationship between X and Y?
- Null and alternative hypotheses
  - H0: β1 = 0 (no linear relationship)
  - H1: β1 ≠ 0 (linear relationship does exist)
- Test statistic
$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}}$, d.f. = n - 2
where:
b1 = regression slope coefficient
β1 = hypothesized slope
S_b1 = standard error of the slope

Inferences About the Slope: t Test Example
Estimated Regression Equation (from the house price and square feet data of the example):
house price = 98.25 + 0.1098 (sq. ft.)
The slope of this model is 0.1098. Is there a linear relationship between the square footage of the house and its sales price?

Inferences About the Slope: t Test Example
H0: β1 = 0
H1: β1 ≠ 0
From Excel output (Square Feet row): b1 = 0.10977 (Coefficients), S_b1 = 0.03297 (Standard Error), t Stat = 3.32938, P-value = 0.01039
From Minitab output (Square Feet row): b1 = 0.10977 (Coef), S_b1 = 0.03297 (SE Coef), T = 3.33, P = 0.010
$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$

Inferences About the Slope: t Test Example
H0: β1 = 0
H1: β1 ≠ 0
Test Statistic: tSTAT = 3.329
d.f. = 10 - 2 = 8
[Figure: t distribution with α/2 = .025 rejection regions in each tail, critical values -tα/2 = -2.3060 and tα/2 = 2.3060; tSTAT = 3.329 falls in the upper "Reject H0" region]
Decision: Reject H0
There is sufficient evidence that square footage affects house price

Inferences About the Slope: t Test Example
H0: β1 = 0
H1: β1 ≠ 0
From Excel output, the p-value for Square Feet is 0.01039; from Minitab output, P = 0.010.
Decision: Reject H0, since p-value < α
There is sufficient evidence that square footage affects house price. The p-value approach leads to the same conclusion as the critical-value approach.

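The t test for the slope can be traced step by step from the raw data. This is a sketch under two assumptions: the coefficients are refitted with the standard closed-form least-squares solution, and the critical value 2.3060 (α/2 = .025, 8 d.f.) is copied from the slide rather than computed. Variable names are illustrative.

```python
import math

# House data from the example: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
ssx = sum((x - x_bar) ** 2 for x in square_feet)

# Least-squares coefficients (standard closed form).
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price)) / ssx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate and standard error of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, price))
s_yx = math.sqrt(sse / (n - 2))
s_b1 = s_yx / math.sqrt(ssx)

HYPOTHESIZED_SLOPE = 0.0   # beta_1 under H0
T_CRITICAL = 2.3060        # t value for alpha/2 = .025 with 8 d.f., from the slide

t_stat = (b1 - HYPOTHESIZED_SLOPE) / s_b1
reject_h0 = abs(t_stat) > T_CRITICAL   # two-tail test

print(f"S_b1 = {s_b1:.5f}, t_STAT = {t_stat:.5f}, reject H0: {reject_h0}")
```

The computed S_b1 and t_STAT should agree with the Standard Error and t Stat entries in the Square Feet row of the Excel output.
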
F Test for Significance
- F test statistic: $F_{STAT} = \frac{MSR}{MSE}$
where
$MSR = \frac{SSR}{1}$
$MSE = \frac{SSE}{n-2}$
where $F_{STAT}$ follows an F distribution with 1 numerator and (n - 2) denominator degrees of freedom

F-Test for Significance: Excel Output
$F_{STAT} = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848$
With 1 and 8 degrees of freedom
p-value for the F-Test: Significance F = 0.01039

F-Test for Significance: Minitab Output
$F_{STAT} = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848$
With 1 and 8 degrees of freedom
p-value for the F-Test: P = 0.010

F Test for Significance (continued)
H0: β1 = 0
H1: β1 ≠ 0
α = .05
df1 = 1, df2 = 8
Critical Value: Fα = 5.32
[Figure: F distribution with the α = .05 "Reject H0" region to the right of F.05 = 5.32 and the "Do not reject H0" region to its left]
Test Statistic: $F_{STAT} = \frac{MSR}{MSE} = 11.08$
Decision: Reject H0 at α = 0.05
Conclusion: There is sufficient evidence that house size affects selling price

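The F statistic can be recomputed from the ANOVA table in a few lines. The sketch below uses the sums of squares from the Excel output and the critical value 5.32 from the slide; the variable names are illustrative assumptions. It also prints the square of the slope's t statistic, because with a single independent variable the F statistic equals t², which is why both tests report the same p-value.

```python
# Values from the ANOVA table of the house price example.
ssr = 18934.9348   # regression sum of squares
sse = 13665.5652   # error sum of squares
n = 10             # sample size

msr = ssr / 1          # one independent variable, so 1 numerator d.f.
mse = sse / (n - 2)    # n - 2 denominator d.f.
f_stat = msr / mse

F_CRITICAL = 5.32      # F value for alpha = .05 with 1 and 8 d.f., from the slide
T_STAT = 3.32938       # t statistic for the slope, from the Excel output

print(f"F_STAT = {f_stat:.4f}, reject H0: {f_stat > F_CRITICAL}")
print(f"t_STAT squared = {T_STAT ** 2:.4f}")
```
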
Confidence Interval Estimate for the Slope
Confidence Interval Estimate of the Slope:
$b_1 \pm t_{\alpha/2} S_{b_1}$, d.f. = n - 2
Excel Printout for House Prices:
               Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept      98.24833        58.03348          1.69296    0.12892    -35.57720    232.07386
Square Feet    0.10977         0.03297           3.32938    0.01039    0.03374      0.18580
At the 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)

Confidence Interval Estimate for the Slope (continued)
In the Excel printout, the Lower 95% and Upper 95% entries for Square Feet are 0.03374 and 0.18580.
Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.74 and $185.80 per square foot of house size
This 95% confidence interval does not include 0.
Conclusion: There is a significant relationship between house price and square feet at the .05 level of significance

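The interval b1 ± tα/2 S_b1 is a one-line calculation once the pieces are read off the output. This sketch uses the rounded slope and standard error from the Excel printout and the critical value 2.3060 (8 d.f.) from the t test slide; the variable names are illustrative assumptions.

```python
B1 = 0.10977          # slope estimate, in $1000s per square foot
S_B1 = 0.03297        # standard error of the slope
T_CRITICAL = 2.3060   # t value for alpha/2 = .025 with 8 d.f.

margin = T_CRITICAL * S_B1
lower, upper = B1 - margin, B1 + margin

print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
# Price is measured in $1000s, so multiply by 1000 to get dollars per square foot.
print(f"${lower * 1000:.2f} to ${upper * 1000:.2f} per square foot")
# The interval excludes 0 exactly when the two-tail t test rejects H0 at the same alpha.
print(f"interval excludes 0: {lower > 0 or upper < 0}")
```
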
Estimating Mean Values and Predicting Individual Values
Goal: Form intervals around Y to express uncertainty about the value of Y for a given Xi
[Figure: the prediction line Ŷ = b0 + b1Xi plotted against X, marking at a given Xi the confidence interval for the mean of Y and the prediction interval for an individual Y]

Confidence Interval for the Average Y, Given X
Confidence interval estimate for the mean value of Y given a particular Xi
Confidence interval for $\mu_{Y|X=X_i}$:
$\hat{Y} \pm t_{\alpha/2} S_{YX} \sqrt{h_i}$
The size of the interval varies according to the distance of Xi from the mean, X̄
$h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX} = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}$

Prediction Interval for an Individual Y, Given X
Prediction interval estimate for an individual value of Y given a particular Xi
Prediction interval for $Y_{X=X_i}$:
$\hat{Y} \pm t_{\alpha/2} S_{YX} \sqrt{1 + h_i}$
The extra term (the 1 under the square root) adds to the interval width, compared with the confidence interval for the mean of Y, to reflect the added uncertainty for an individual case

Estimation of Mean Values: Example
Confidence Interval Estimate for $\mu_{Y|X=X_i}$
Find the 95% confidence interval for the mean price of 2,000 square-foot houses
Predicted Price $\hat{Y}_i$ = 317.85 ($1,000s)
$\hat{Y} \pm t_{0.025} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 37.12$
The confidence interval endpoints are 280.66 and 354.90, or from $280,660 to $354,900
(The endpoints are computed from the unrounded prediction, 317.78; the value 317.85 comes from the rounded coefficients 98.25 and 0.1098.)

Estimation of Individual Values: Example
Prediction Interval Estimate for $Y_{X=X_i}$
Find the 95% prediction interval for an individual house with 2,000 square feet
Predicted Price $\hat{Y}_i$ = 317.85 ($1,000s)
$\hat{Y} \pm t_{0.025} S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 102.28$
The prediction interval endpoints are 215.50 and 420.07, or from $215,500 to $420,070
(The endpoints are computed from the unrounded prediction, 317.78; the value 317.85 comes from the rounded coefficients 98.25 and 0.1098.)

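Both intervals share the same ingredients and differ only in the extra 1 under the square root. The sketch below computes them for a 2,000 square-foot house. It assumes the coefficients are refitted with the standard closed-form least-squares solution (so the unrounded prediction is used) and takes the critical value 2.3060 (8 d.f.) from the t test slide; the function name `intervals` is a hypothetical helper.

```python
import math

# House data from the example: size in square feet (X), price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
ssx = sum((x - x_bar) ** 2 for x in square_feet)

# Least-squares coefficients (standard closed form) and standard error of the estimate.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price)) / ssx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, price))
s_yx = math.sqrt(sse / (n - 2))

T_CRITICAL = 2.3060   # t value for alpha/2 = .025 with 8 d.f., from the slide


def intervals(x_new):
    """Return (y_hat, confidence interval, prediction interval) at x_new."""
    y_hat = b0 + b1 * x_new
    h = 1 / n + (x_new - x_bar) ** 2 / ssx               # grows with distance from x_bar
    ci_half = T_CRITICAL * s_yx * math.sqrt(h)           # mean of Y
    pi_half = T_CRITICAL * s_yx * math.sqrt(1 + h)       # individual Y
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


y_hat, ci, pi = intervals(2000)
print(f"predicted price = {y_hat:.2f}")
print(f"95% CI for the mean price: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for an individual price: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The endpoints should agree with the slide values to within rounding of the critical value.
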
Finding Confidence and Prediction Intervals in Excel (continued)
[Figure: Excel worksheet with callouts for the input values, Ŷ, the confidence interval estimate for μY|X=Xi, and the prediction interval estimate for YX=Xi]

Finding Confidence and Prediction Intervals in Minitab
Predicted Values for New Observations
New Obs    Fit      SE Fit    95% CI            95% PI
1          317.8    16.1      (280.7, 354.9)    (215.5, 420.1)

Values of Predictors for New Observations
New Obs    Square Feet
1          2000

In this output, Square Feet = 2000 is the input value, Fit is Ŷ, the 95% CI is the confidence interval estimate for μY|X=Xi, and the 95% PI is the prediction interval estimate for YX=Xi.

Chapter Summary
- Introduced types of regression models
- Reviewed assumptions of regression and correlation
- Discussed determining the simple linear regression equation
- Described measures of variation
- Discussed residual analysis

Chapter Summary (continued)
- Described inference about the slope
- Discussed correlation -- measuring the strength of the association
- Addressed estimation of mean values and prediction of individual values
- Discussed possible pitfalls in regression and recommended strategies to avoid them