
Chapter 13
Nonlinear and Multiple Regression
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc.
13.1 Aptness of the Model and Model Checking
Standardized Residuals
The standardized residuals are given by

$$e_i^* = \frac{y_i - \hat{y}_i}{s\sqrt{1 - \dfrac{1}{n} - \dfrac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}}}, \qquad i = 1, \ldots, n$$
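As an illustration, here is a minimal sketch (not from the slides) of computing the standardized residuals for a simple linear regression fit, assuming NumPy is available; the data values are hypothetical.

```python
# Sketch: standardized residuals for a simple linear regression fit.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.8])
n = len(x)

# Least-squares fit y-hat = b0 + b1*x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)                      # ordinary residuals

s = np.sqrt(np.sum(e ** 2) / (n - 2))      # s^2 = SSE/(n - 2)
Sxx = np.sum((x - x.mean()) ** 2)

# e_i* = e_i / (s * sqrt(1 - 1/n - (x_i - xbar)^2 / Sxx))
e_star = e / (s * np.sqrt(1 - 1 / n - (x - x.mean()) ** 2 / Sxx))
print(e_star)
```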
Diagnostic Plots
The basic plots for an assessment of model validity and usefulness are:

1. $e_i^*$ (or $e_i$) on the vertical axis vs. $x_i$ on the horizontal axis.
2. $e_i^*$ (or $e_i$) on the vertical axis vs. $\hat{y}_i$ on the horizontal axis.

(These two plots are called residual plots.)
3. $\hat{y}_i$ on the vertical axis vs. $y_i$ on the horizontal axis.
4. A normal probability plot of the standardized residuals.
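A sketch of how these four plots could be produced, assuming NumPy, Matplotlib, and SciPy are available; the data are the same hypothetical values as above.

```python
# Sketch: the four diagnostic plots for a simple linear regression fit.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.8])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)               # slope, intercept
y_hat = b0 + b1 * x
e = y - y_hat
s = np.sqrt(np.sum(e ** 2) / (n - 2))
Sxx = np.sum((x - x.mean()) ** 2)
e_star = e / (s * np.sqrt(1 - 1 / n - (x - x.mean()) ** 2 / Sxx))

fig, ax = plt.subplots(2, 2, figsize=(8, 6))
ax[0, 0].scatter(x, e_star)                 # 1. e* vs x
ax[0, 0].set(xlabel="x", ylabel="e*")
ax[0, 1].scatter(y_hat, e_star)             # 2. e* vs y-hat
ax[0, 1].set(xlabel="y-hat", ylabel="e*")
ax[1, 0].scatter(y, y_hat)                  # 3. y-hat vs y
ax[1, 0].set(xlabel="y", ylabel="y-hat")
stats.probplot(e_star, plot=ax[1, 1])       # 4. normal probability plot
plt.tight_layout()
plt.show()
```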
Difficulties in the Plots
1. A nonlinear probabilistic relationship between x and y is appropriate.
2. The variance of $\varepsilon$ (and of Y) is not a constant $\sigma^2$ but depends on x.
3. The selected model fits well except for a few outlying data values, which may have greatly influenced the choice of the best-fit function.
4. The error term $\varepsilon$ does not have a normal distribution.
5. When the subscript i indicates the time order of the observations, the $\varepsilon_i$'s exhibit dependence over time.
6. One or more relevant independent variables have been omitted from the model.
Abnormality in Data (and Remedies)

- Nonlinear relationship (use a nonlinear model)
- Discrepant observation
- Non-constant variance (use weighted least squares)
- Observation with large influence (omit the value, or use MAD)
- Dependence in errors (transform the y's, or use a model that incorporates time)
- Variable omitted (use a multiple regression model that includes the omitted variable)
13.2 Regression With Transformed Variables
Intrinsically Linear – Function
A function relating y to x is intrinsically linear if, by means of a transformation on x and/or y, the function can be expressed as $y' = \beta_0 + \beta_1 x'$, where $x'$ and $y'$ are the transformed independent and dependent variables, respectively.
Intrinsically Linear Functions
- Exponential, $y = \alpha e^{\beta x}$: transformation $y' = \ln(y)$; linear form $y' = \ln(\alpha) + \beta x$.
- Power, $y = \alpha x^{\beta}$: transformation $y' = \log(y)$ and $x' = \log(x)$; linear form $y' = \log(\alpha) + \beta x'$.
- $y = \alpha + \beta \log(x)$: transformation $x' = \log(x)$; linear form $y = \alpha + \beta x'$.
- Reciprocal, $y = \alpha + \beta(1/x)$: transformation $x' = 1/x$; linear form $y = \alpha + \beta x'$.
Intrinsically Linear – Probabilistic Model

A probabilistic model relating Y to x is intrinsically linear if, by means of a transformation on Y and/or x, it can be reduced to a linear probabilistic model

$$Y' = \beta_0 + \beta_1 x' + \varepsilon'$$
Intrinsically Linear Probabilistic Models

- Exponential: $Y = \alpha e^{\beta x} \cdot \varepsilon$, so $\ln(Y) = Y' = \ln(\alpha) + \beta x + \ln(\varepsilon)$.
- Power: $Y = \alpha x^{\beta} \cdot \varepsilon$, so $\log(Y) = Y' = \log(\alpha) + \beta \log(x) + \log(\varepsilon)$.
- $Y = \alpha + \beta \log(x) + \varepsilon$: setting $x' = \log(x)$ yields the model $Y = \alpha + \beta x' + \varepsilon$.
- Reciprocal: $Y = \alpha + \beta(1/x) + \varepsilon$: setting $x' = 1/x$ yields the model $Y = \alpha + \beta x' + \varepsilon$.
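For instance, here is a minimal sketch (not from the text) of fitting the exponential model by least squares on the transformed data $(x, \ln y)$; the data values are hypothetical.

```python
# Sketch: fit Y = alpha * e^(beta*x) * eps via the transformed model
# ln(Y) = ln(alpha) + beta*x + ln(eps).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.7, 7.2, 20.5, 54.0, 148.0])

b1, b0 = np.polyfit(x, np.log(y), 1)   # least squares on (x, ln y)
alpha_hat = np.exp(b0)                 # transform back: alpha-hat = e^(b0)
beta_hat = b1
print(alpha_hat, beta_hat)
```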
Analyzing Transformed Data
1. Estimating $\beta_0$ and $\beta_1$ and then transforming back to obtain estimates of the original parameters is not equivalent to using the principle of least squares on the original model.
2. If the chosen model is not intrinsically linear, least squares would have to be applied to the untransformed model.
3. When a transformed model satisfies the assumptions of Chapter 12, the method of least squares yields the best estimates of the transformed parameters; the estimates of the original parameters may not be the best.
4. After a transformation on y, in order to use the standard formulas to test hypotheses or construct CIs, the $\varepsilon_i$'s should be at least approximately normally distributed.
5. When y is transformed, the $r^2$ value from the resulting regression refers to variation in the $y_i'$'s explained by the transformed regression model.
Logit Function

An example where we have a function of $\beta_0 + \beta_1 x$:

$$p(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
Logistic Regression

Logistic regression means assuming that p(x) is related to x by the logit function. Algebra yields

$$e^{\beta_0 + \beta_1 x} = \frac{p(x)}{1 - p(x)}$$

The quantity $p(x)/[1 - p(x)]$ is called the odds ratio.
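A small sketch (not from the text) verifying the logit/odds relationship numerically; the coefficient values are hypothetical.

```python
# Sketch: the logit function and the odds ratio p(x)/(1 - p(x)).
import numpy as np

beta0, beta1 = -4.0, 0.8   # hypothetical coefficients

def p(x):
    """p(x) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))"""
    z = beta0 + beta1 * x
    return np.exp(z) / (1 + np.exp(z))

x = 5.0
odds = p(x) / (1 - p(x))
print(odds, np.exp(beta0 + beta1 * x))   # the two values agree
```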
13.3 Polynomial Regression
Polynomial Regression Model
The kth-degree polynomial regression model equation is

$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \varepsilon$$

where $\varepsilon$ is normally distributed with $\mu_\varepsilon = 0$ and $\sigma_\varepsilon^2 = \sigma^2$.
Regression Models

[Figure: graphs of a quadratic model and a cubic model.]
Estimation of Parameters Using Least Squares

The k + 1 normal equations:

$$b_0 n + b_1 \sum x_i + b_2 \sum x_i^2 + \cdots + b_k \sum x_i^k = \sum y_i$$

$$b_0 \sum x_i + b_1 \sum x_i^2 + \cdots + b_k \sum x_i^{k+1} = \sum x_i y_i$$

$$\vdots$$

$$b_0 \sum x_i^k + b_1 \sum x_i^{k+1} + \cdots + b_k \sum x_i^{2k} = \sum x_i^k y_i$$

Solve this system for the estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$.
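In matrix form the normal equations are $(X^TX)\,b = X^Ty$. A minimal sketch (hypothetical data, k = 2), assuming NumPy:

```python
# Sketch: solve the normal equations for a quadratic (k = 2) fit.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.1, 4.8, 10.3, 17.0, 26.2])
k = 2

X = np.vander(x, k + 1, increasing=True)   # columns: 1, x, x^2
b = np.linalg.solve(X.T @ X, X.T @ y)      # (X^T X) b = X^T y
print(b)                                   # estimates of beta0, beta1, beta2
```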
Estimate of $\sigma^2$ and $R^2$

$$\hat{\sigma}^2 = s^2 = \frac{SSE}{n - (k+1)} = MSE$$

Coefficient of multiple determination:

$$R^2 = 1 - \frac{SSE}{SST}$$

Adjusted $R^2$:

$$R_a^2 = 1 - \frac{n-1}{n-(k+1)} \cdot \frac{SSE}{SST} = \frac{(n-1)R^2 - k}{n-1-k}$$
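A sketch of these quantities, continuing the hypothetical quadratic example:

```python
# Sketch: sigma-hat^2, R^2, and adjusted R^2 for a quadratic fit.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.1, 4.8, 10.3, 17.0, 26.2])
k, n = 2, 6

X = np.vander(x, k + 1, increasing=True)
b = np.linalg.solve(X.T @ X, X.T @ y)
SSE = np.sum((y - X @ b) ** 2)
SST = np.sum((y - y.mean()) ** 2)

s2 = SSE / (n - (k + 1))                          # MSE
R2 = 1 - SSE / SST
R2_adj = 1 - (n - 1) / (n - (k + 1)) * SSE / SST
print(s2, R2, R2_adj)
```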
Confidence Interval and Test

A $100(1-\alpha)\%$ CI for $\beta_i$, the coefficient of $x^i$ in the polynomial regression function, is

$$\hat\beta_i \pm t_{\alpha/2,\, n-(k+1)} \cdot s_{\hat\beta_i}$$

A test of $H_0\!: \beta_i = \beta_{i0}$ is based on the following t statistic value and $n - (k+1)$ df:

$$t = \frac{\hat\beta_i - \beta_{i0}}{s_{\hat\beta_i}}$$
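A sketch with hypothetical data, using the standard error $s_{\hat\beta_i} = s\sqrt{[(X^TX)^{-1}]_{ii}}$ (a standard identity, not stated on the slide), assuming NumPy and SciPy:

```python
# Sketch: 95% CI and t statistic for the coefficient of x^2 (i = 2).
import numpy as np
from scipy import stats

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.1, 4.8, 10.3, 17.0, 26.2])
k, n = 2, 6

X = np.vander(x, k + 1, increasing=True)
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ (X.T @ y)
s = np.sqrt(np.sum((y - X @ b) ** 2) / (n - (k + 1)))

i = 2
se = s * np.sqrt(XtX_inv[i, i])                 # s_beta-hat_i
t_crit = stats.t.ppf(1 - 0.05 / 2, n - (k + 1))
print(b[i] - t_crit * se, b[i] + t_crit * se)   # CI endpoints
print(b[i] / se)                                # t for H0: beta_i = 0
```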
CI for $\mu_{Y \cdot x^*}$

Let $x^*$ denote a particular value of x. A $100(1-\alpha)\%$ CI for $\mu_{Y \cdot x^*}$ is

$$\hat\mu_{Y \cdot x^*} \pm t_{\alpha/2,\, n-(k+1)} \cdot \big(\text{estimated SD of } \hat\mu_{Y \cdot x^*}\big)$$

With $\hat\mu_{Y \cdot x^*} = \hat\beta_0 + \hat\beta_1 x^* + \cdots + \hat\beta_k (x^*)^k$ and $s_{\hat Y}$ the estimated standard deviation, this becomes

$$\hat{y} \pm t_{\alpha/2,\, n-(k+1)} \cdot s_{\hat Y}$$
PI

A $100(1-\alpha)\%$ PI for a future y value is

$$\hat\mu_{Y \cdot x^*} \pm t_{\alpha/2,\, n-(k+1)} \cdot \left\{ s^2 + \big(\text{estimated SD of } \hat\mu_{Y \cdot x^*}\big)^2 \right\}^{1/2} = \hat{y} \pm t_{\alpha/2,\, n-(k+1)} \cdot \sqrt{s^2 + s_{\hat Y}^2}$$
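A sketch of both intervals at a point $x^*$, using $s_{\hat Y} = s\sqrt{\mathbf{x}^{*T}(X^TX)^{-1}\mathbf{x}^*}$ (an assumption consistent with, but not stated on, the slides); the data are hypothetical:

```python
# Sketch: CI for the mean response and PI for a future y at x* = 2.5.
import numpy as np
from scipy import stats

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.1, 4.8, 10.3, 17.0, 26.2])
k, n = 2, 6

X = np.vander(x, k + 1, increasing=True)
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ (X.T @ y)
s = np.sqrt(np.sum((y - X @ b) ** 2) / (n - (k + 1)))

x_star = np.array([1.0, 2.5, 2.5 ** 2])        # (1, x*, x*^2)
y_hat = x_star @ b
s_yhat = s * np.sqrt(x_star @ XtX_inv @ x_star)
t_crit = stats.t.ppf(0.975, n - (k + 1))

print(y_hat - t_crit * s_yhat, y_hat + t_crit * s_yhat)   # CI
half = t_crit * np.sqrt(s ** 2 + s_yhat ** 2)
print(y_hat - half, y_hat + half)                         # PI
```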
Centering x Values
Let $\bar{x}$ denote the average of the $x_i$'s for which observations are to be taken, and consider the model

$$Y = \beta_0^* + \beta_1^*(x - \bar{x}) + \beta_2^*(x - \bar{x})^2 + \varepsilon$$

In this model $\mu_{Y \cdot x} = \beta_0^* + \beta_1^*(x - \bar{x}) + \beta_2^*(x - \bar{x})^2$, and the parameters describe the behavior of the regression near the center of the data.
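A short sketch of centering with hypothetical data: the quadratic is refit in $x - \bar{x}$, so $\hat\beta_0^*$ estimates the mean response at the center of the data.

```python
# Sketch: fit the centered quadratic model in (x - xbar).
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.1, 4.8, 10.3, 17.0, 26.2])

xc = x - x.mean()                              # centered predictor
Xc = np.vander(xc, 3, increasing=True)         # 1, (x - xbar), (x - xbar)^2
b_star = np.linalg.solve(Xc.T @ Xc, Xc.T @ y)
print(b_star)                                  # b*_0 estimates mu_Y.xbar
```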
13.4 Multiple Regression Analysis
General Additive Multiple Regression Model Equation

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where $E(\varepsilon) = 0$ and $V(\varepsilon) = \sigma^2$. For purposes of testing hypotheses and calculating CIs or PIs, assume $\varepsilon$ is normally distributed.
Models

1. The first-order model:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

2. Second-order no-interaction model:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \varepsilon$$

3. First-order predictors and interaction:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$$

4. Second-order (full quadratic) model:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$$
Models With Predictors for Categorical Variables

Using simple numerical coding, qualitative (categorical) variables can be incorporated into a model. With a dichotomous variable, associate an indicator (dummy) variable x whose possible values are 0 and 1.
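For instance (a hypothetical sketch, with made-up category labels):

```python
# Sketch: an indicator (dummy) variable for a dichotomous predictor.
import numpy as np

brand = np.array(["A", "B", "A", "B", "B"])   # hypothetical categories
x_dummy = (brand == "B").astype(float)        # 1 for brand B, 0 for brand A
print(x_dummy)                                # [0. 1. 0. 1. 1.]
```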
Estimation of Parameters

Normal equations:

$$b_0 n + b_1 \sum x_{1j} + b_2 \sum x_{2j} + \cdots + b_k \sum x_{kj} = \sum y_j$$

$$b_0 \sum x_{1j} + b_1 \sum x_{1j}^2 + b_2 \sum x_{1j} x_{2j} + \cdots + b_k \sum x_{1j} x_{kj} = \sum x_{1j} y_j$$

$$\vdots$$

$$b_0 \sum x_{kj} + b_1 \sum x_{1j} x_{kj} + \cdots + b_{k-1} \sum x_{k-1,j}\, x_{kj} + b_k \sum x_{kj}^2 = \sum x_{kj} y_j$$

Solve this system for the estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$.
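As with polynomial regression, the system is $(X^TX)\,b = X^Ty$; a minimal sketch with two hypothetical predictors:

```python
# Sketch: least-squares estimates for Y = b0 + b1*x1 + b2*x2.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 10.9])

X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)      # normal equations
print(b)                                   # b0, b1, b2
```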
Estimate of $\sigma^2$ and $R^2$

$$\hat{\sigma}^2 = s^2 = \frac{SSE}{n - (k+1)} = MSE$$

Coefficient of multiple determination:

$$R^2 = 1 - \frac{SSE}{SST}$$

Adjusted $R^2$:

$$R_a^2 = 1 - \frac{n-1}{n-(k+1)} \cdot \frac{SSE}{SST} = \frac{(n-1)R^2 - k}{n-1-k}$$
Test

Null hypothesis: $H_0\!: \beta_1 = \cdots = \beta_k = 0$

Alternative hypothesis: $H_a\!:$ at least one $\beta_i \neq 0$

Test statistic:

$$f = \frac{R^2/k}{(1 - R^2)/[n - (k+1)]} = \frac{SSR/k}{SSE/[n - (k+1)]} = \frac{MSR}{MSE}$$

Rejection region: $f \geq F_{\alpha,\, k,\, n-(k+1)}$
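A sketch of the model utility test computed from $R^2$; the values of $R^2$, k, and n are hypothetical:

```python
# Sketch: model utility F test from R^2.
from scipy import stats

R2, k, n = 0.83, 3, 30
f = (R2 / k) / ((1 - R2) / (n - (k + 1)))
p_value = stats.f.sf(f, k, n - (k + 1))    # P(F >= f)
print(f, p_value)
```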
Inferences Based on the Model

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

1. A $100(1-\alpha)\%$ CI for $\beta_i$, the coefficient of $x_i$ in the regression function, is

$$\hat\beta_i \pm t_{\alpha/2,\, n-(k+1)} \cdot s_{\hat\beta_i}$$

Simultaneous CIs for several $\beta_i$'s, for which the simultaneous confidence level is controlled, can be obtained by the Bonferroni technique.
2. A test of $H_0\!: \beta_i = \beta_{i0}$ is based on the following t statistic value and $n - (k+1)$ df. The test is upper-, lower-, or two-tailed according to whether $H_a$ contains the inequality >, <, or $\neq$:

$$t = \frac{\hat\beta_i - \beta_{i0}}{s_{\hat\beta_i}}$$
3. A $100(1-\alpha)\%$ CI for $\mu_{Y \cdot x_1^*, \ldots, x_k^*}$ is

$$\hat\mu_{Y \cdot x_1^*, \ldots, x_k^*} \pm t_{\alpha/2,\, n-(k+1)} \cdot \big(\text{estimated SD of } \hat\mu_{Y \cdot x_1^*, \ldots, x_k^*}\big) = \hat{y} \pm t_{\alpha/2,\, n-(k+1)} \cdot s_{\hat Y}$$

where $\hat\mu_{Y \cdot x_1^*, \ldots, x_k^*} = \hat\beta_0 + \hat\beta_1 x_1^* + \cdots + \hat\beta_k x_k^*$ and $\hat{y}$ is the calculated value of $\hat Y$.
4. A $100(1-\alpha)\%$ PI for a future y value is

$$\hat\mu_{Y \cdot x_1^*, \ldots, x_k^*} \pm t_{\alpha/2,\, n-(k+1)} \cdot \left\{ s^2 + \big(\text{estimated SD of } \hat\mu_{Y \cdot x_1^*, \ldots, x_k^*}\big)^2 \right\}^{1/2} = \hat{y} \pm t_{\alpha/2,\, n-(k+1)} \cdot \sqrt{s^2 + s_{\hat Y}^2}$$
An F Test for a Group of Predictors

$$H_0\!: \beta_{l+1} = \cdots = \beta_k = 0$$

(so that the reduced model $Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_l x_l + \varepsilon$ is correct)

versus

$H_a\!:$ at least one of $\beta_{l+1}, \ldots, \beta_k$ is not 0
Procedure

$SSE_k$ = unexplained variation for the full model (k predictors)
$SSE_l$ = unexplained variation for the reduced model (l predictors)

Test statistic:

$$f = \frac{(SSE_l - SSE_k)/(k - l)}{SSE_k/[n - (k+1)]}$$

Rejection region: $f \geq F_{\alpha,\, k-l,\, n-(k+1)}$
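A sketch of this partial F test with hypothetical SSE values:

```python
# Sketch: F test for a group of predictors (full vs. reduced model).
from scipy import stats

SSE_k, SSE_l = 12.4, 29.8     # hypothetical full- and reduced-model SSE
k, l, n = 5, 2, 40

f = ((SSE_l - SSE_k) / (k - l)) / (SSE_k / (n - (k + 1)))
p_value = stats.f.sf(f, k - l, n - (k + 1))
print(f, p_value)
```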
13.5 Other Issues in Multiple Regression
Transformations in Multiple Regression
Theoretical considerations or
diagnostic plots may suggest a
nonlinear relation between a
dependent variable and two or more
independent variables. Frequently a
transformation will linearize the
model.
Standardizing Variables

Let $\bar{x}_i$ and $s_i$ be the sample average and standard deviation of the $x_{ij}$'s, and let $x_i' = (x_i - \bar{x}_i)/s_i$. Then the coded full second-order model with two independent variables has regression function

$$E(Y) = \beta_0 + \beta_1 x_1' + \beta_2 x_2' + \beta_3 (x_1')^2 + \beta_4 (x_2')^2 + \beta_5 x_1' x_2'$$

Benefits:
• increased numerical accuracy
• more accurate estimation of the parameters
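A sketch of coding the variables before forming the second-order terms; the data are hypothetical:

```python
# Sketch: standardize predictors, then build the full second-order terms.
import numpy as np

x1 = np.array([10.0, 12.0, 15.0, 18.0, 20.0])
x2 = np.array([0.5, 0.7, 0.6, 0.9, 1.1])

z1 = (x1 - x1.mean()) / x1.std(ddof=1)     # coded x1
z2 = (x2 - x2.mean()) / x2.std(ddof=1)     # coded x2

# design matrix columns: 1, z1, z2, z1^2, z2^2, z1*z2
X = np.column_stack([np.ones_like(z1), z1, z2, z1**2, z2**2, z1 * z2])
print(X)
```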
Variable Selection
1. If we examine regressions involving
all possible subsets of the predictors
for which data is available, what
criteria should be used to select a
model?
2. If the number of predictors is too
large to examine all regressions, can a
good model be found by examining a
reduced number of subsets?
Criteria for Variable Selection

1. $R_k^2$, the coefficient of multiple determination for a k-predictor model. Identify k for which $R_k^2$ is nearly as large as the $R^2$ for the model containing all predictors.

2. $MSE_k = SSE_k/(n - k - 1)$, the mean squared error for a k-predictor model. Find the model having minimum $MSE_k$.
3. $C_k = \dfrac{SSE_k}{s^2} + 2(k+1) - n$. A desirable model is specified by a subset of predictors for which $C_k$ is small.
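A sketch of the $C_k$ computation with hypothetical values; here $s^2$ is taken to be the MSE from the model containing all candidate predictors, the usual choice:

```python
# Sketch: the C_k criterion for one candidate subset.
SSE_k = 18.5      # SSE of the k-predictor candidate model (hypothetical)
s2 = 1.2          # MSE from the full model (hypothetical)
k, n = 4, 30

Ck = SSE_k / s2 + 2 * (k + 1) - n
print(Ck)
```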
Stepwise Regression
When the number of predictors is too large
to allow for explicit or implicit
examination of all possible subsets,
alternative selection procedures generally
identify good models. Two of these
methods are the backward elimination
(BE) method and the forward selection
method (FS).
Identifying Influential Observations
In general, the predicted values corresponding to the sample observations can be written

$$\hat{y}_1 = h_{11} y_1 + h_{12} y_2 + \cdots + h_{1n} y_n$$
$$\vdots$$
$$\hat{y}_n = h_{n1} y_1 + h_{n2} y_2 + \cdots + h_{nn} y_n$$

If $h_{jj} > 2(k+1)/n$, the jth observation is potentially influential (some regard $3(k+1)/n$ as the cutoff value).
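A sketch computing the $h_{jj}$'s from the hat matrix $H = X(X^TX)^{-1}X^T$ (a standard identity, not stated on the slide); the hypothetical data include one deliberately extreme x value:

```python
# Sketch: leverage values h_jj and the 2(k+1)/n flag.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])   # note the extreme x = 10
y = np.array([1.1, 1.9, 3.2, 3.9, 10.5])
k, n = 1, len(x)

X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix
h = np.diag(H)
print(h)
print(h > 2 * (k + 1) / n)                 # flags the x = 10 observation
```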