Chapter 7 Polynomial Regression Models

Ray-Bing Chen
Institute of Statistics
National University of Kaohsiung
7.1 Introduction
• The linear regression model y = Xβ + ε is a general model for fitting any relationship that is linear in the unknown parameters β.
• Polynomial regression models are an important special case. For example, the kth-order polynomial model in one variable is
  y = β₀ + β₁x + β₂x² + ⋯ + βₖxᵏ + ε
7.2 Polynomial Models in One Variable
7.2.1 Basic Principles
• A second-order model (quadratic model):
  y = β₀ + β₁x + β₂x² + ε
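A minimal sketch of fitting a quadratic model of this form by ordinary least squares in Python; the x and y values below are made up for illustration and are not from the text.

```python
import numpy as np

# Hypothetical data: response y observed at several x values
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 8.1, 7.7, 6.3, 4.2])

# Model matrix for y = b0 + b1*x + b2*x^2 + error.
# Centering x before squaring reduces ill-conditioning (see the notes below).
xc = x - x.mean()
X = np.column_stack([np.ones_like(xc), xc, xc**2])

# Ordinary least-squares estimates of the polynomial coefficients
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
print("estimates (b0, b1, b2):", beta)
```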
• Polynomial models are useful in situations where the analyst knows that curvilinear effects are present in the true response function.
• Polynomial models are also useful as approximating functions to unknown and possibly very complex nonlinear relationships.
• A polynomial model can be viewed as a Taylor series expansion of the unknown function.
• Several important considerations:
– Order of the model: The order (k) should be kept as low as possible. High-order polynomials (k > 2) should be avoided unless they can be justified for reasons outside the data. In the extreme case it is always possible to pass a polynomial of order n − 1 through n points, so a polynomial of sufficiently high degree can always be found that provides a "good" fit to the data.
– Model-building strategy: Various strategies for choosing the order of an approximating polynomial have been suggested. Two common procedures are forward selection and backward elimination.
• Extrapolation: Extrapolation with polynomial models can be extremely hazardous (see Figure 7.2).
• Ill-Conditioning I: The X'X matrix becomes ill-conditioned as the order of the polynomial increases, which means that the matrix inversion calculations will be inaccurate and considerable error may be introduced into the parameter estimates.
• Ill-Conditioning II: If the values of x are limited to a narrow range, there can be significant ill-conditioning or multicollinearity in the columns of X.
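A small sketch illustrating Ill-Conditioning I and II: when x is confined to a narrow range, the condition number of X'X for a cubic model is enormous, and centering x reduces it substantially. The x values below are made up.

```python
import numpy as np

# Condition number of X'X for a cubic polynomial model,
# before and after centering the regressor.
x = np.linspace(100.0, 110.0, 20)   # x limited to a narrow range

def cond_xtx(z):
    X = np.column_stack([z**0, z, z**2, z**3])
    return np.linalg.cond(X.T @ X)

print("condition number, raw x:      %.3e" % cond_xtx(x))
print("condition number, centered x: %.3e" % cond_xtx(x - x.mean()))
```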
• Hierarchy: The regression model
  y = β₀ + β₁x + β₂x² + β₃x³ + ε
is said to be hierarchical because it contains all terms of order three and lower. Only hierarchical models are invariant under linear transformation.
Example 7.1 The Hardwood Data:
• The strength of kraft paper (y) versus the percentage of hardwood in the pulp (x).
• The data are in Table 7.1.
• A scatter plot is shown in Figure 7.3.
7.2.2 Piecewise Polynomial Fitting (Splines)
• Sometimes a low-order polynomial provides a poor fit to the data, and increasing the order of the polynomial modestly does not substantially improve the situation.
• This problem may occur when the function
behaves differently in different parts of the range
of x.
• A usual approach is to divide the range of x into
segments and fit an appropriate curve in each
segment.
• Spline functions offer a useful way to perform this
type of piecewise polynomial fitting.
• Splines are piecewise polynomials of order k.
• The join points of the pieces are usually called knots.
• Generally the function values and the first k − 1 derivatives agree at the knots, so a spline is a continuous function with k − 1 continuous derivatives.
• Cubic spline: with h knots t₁ < t₂ < ⋯ < tₕ and continuous first and second derivatives, the cubic spline model can be written as
  E(y) = S(x) = β₀₀ + β₀₁x + β₀₂x² + β₀₃x³ + Σᵢ₌₁ʰ βᵢ(x − tᵢ)₊³
where (x − tᵢ)₊ = x − tᵢ if x > tᵢ and 0 otherwise.
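A minimal sketch of fitting a cubic spline by least squares using the truncated power basis 1, x, x², x³, (x − tᵢ)₊³. The data and the knot locations are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 20.0, 40)
y = np.sin(x / 3.0) + 0.05 * x + rng.normal(0, 0.1, x.size)
knots = [6.5, 13.0]                                         # assumed knot locations

def spline_design(x, knots):
    cols = [x**j for j in range(4)]                         # 1, x, x^2, x^3
    cols += [np.clip(x - t, 0.0, None)**3 for t in knots]   # (x - t)_+^3 terms
    return np.column_stack(cols)

X = spline_design(x, knots)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)                # least-squares fit
y_hat = X @ beta
```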
• It is not simple to decide the number and position
of the knots and the order of the polynomial in
each segment.
• Wold (1974) suggests
– There should be as few knots as possible, with at least four or five data points per segment.
– There should be no more than one extreme point and one point of inflection per segment.
• The great flexibility of spline functions makes it
very easy to overfit the data.
• A cubic spline model with h knots and no continuity restrictions can be written as
  E(y) = Σⱼ₌₀³ β₀ⱼxʲ + Σᵢ₌₁ʰ Σⱼ₌₀³ βᵢⱼ(x − tᵢ)₊ʲ
• The fewer continuity restrictions required, the better the fit.
• The more continuity restrictions required, the worse the fit, but the smoother the final curve will be.
• X’X becomes ill-conditioned if there is a large
number of knots.
• Use a different representation of the spline: the cubic B-spline.
Example 7.2 Voltage Drop Data
• The battery voltage drop in a guided missile
motor observed over the time of missile flight is
shown in Table 7.3.
• The scatter plot is shown in Figure 7.6.
• Model the data with a cubic spline using two knots, at 6.5 and 13.
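A sketch of the Example 7.2 fit using scipy's least-squares spline routine with interior knots at 6.5 and 13. The voltage drop values below are stand-ins; the real observations are in Table 7.3.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

# Stand-in data: time of missile flight and a made-up voltage drop curve
t = np.linspace(0.0, 20.0, 41)
volts = 8 + 5 * np.sin(t / 4) - 0.02 * (t - 10)**2

knots = [6.5, 13.0]                              # interior knots from the example
fit = LSQUnivariateSpline(t, volts, knots, k=3)  # least-squares cubic spline

volts_hat = fit(t)                               # fitted values
residuals = volts - volts_hat                    # used for plots like Figures 7.7-7.8
```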
• The ANOVA table for the fitted spline model:
• A plot of the residuals versus the fitted values and a normal probability plot of the residuals are shown in Figures 7.7 and 7.8.
Example 7.3 Piecewise Linear Regression
• An important special case of practical interest is fitting piecewise linear regression models.
• This can be handled easily using linear splines. For example, with a single knot t, the model E(y) = β₀ + β₁x + β₂(x − t)₊ is piecewise linear with a change in slope at the knot (see the sketch below).
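A minimal sketch of piecewise linear regression using the linear spline basis 1, x, (x − t)₊ with a single knot; the data and knot location are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 30)
# Made-up data with a change in slope at x = 5
y = np.where(x < 5, 1 + 2 * x, 11 + 0.5 * (x - 5)) + rng.normal(0, 0.3, x.size)

t = 5.0                                                   # assumed knot location
X = np.column_stack([np.ones_like(x), x, np.clip(x - t, 0.0, None)])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[1] is the slope below the knot; beta[1] + beta[2] is the slope above it.
print(beta)
```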
7.2.3 Polynomial and Trigonometric Terms
• It is sometimes useful to consider models that combine polynomial and trigonometric terms.
• This is natural when the scatter plot suggests some periodicity or cyclic behavior in the data.
• Such a model may require fewer terms than one using polynomial terms alone.
• The model:
  y = β₀ + Σⱼ₌₁ᵈ βⱼxʲ + Σⱼ₌₁ʳ [δⱼ sin(jx) + γⱼ cos(jx)] + ε
• If the regressor x is equally spaced, then the pairs
of terms sin(jx) and cos(jx) are orthogonal.
• Even without exactly equal spacing, the
correlation between these terms will usually be
quite small.
• In Example 7.2
– Rescale the regressor x so that all of the observations are in the interval (0, 2π).
– Fit the model with d = 2 and r = 1
– R2 = 0.9895 and MSRes = 0.0767
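A sketch of fitting the combined polynomial-plus-trigonometric model with d = 2 and r = 1 after rescaling the regressor to (0, 2π); the response values below are made up rather than taken from Example 7.2.

```python
import numpy as np

x_raw = np.linspace(0.0, 20.0, 41)
y = 8 + 5 * np.sin(x_raw / 4) - 0.02 * (x_raw - 10)**2     # stand-in response

# Rescale the regressor so the observations lie in (0, 2*pi)
x = 2 * np.pi * (x_raw - x_raw.min()) / (x_raw.max() - x_raw.min())

# Model: y = b0 + b1*x + b2*x^2 + d1*sin(x) + g1*cos(x) + error
X = np.column_stack([np.ones_like(x), x, x**2, np.sin(x), np.cos(x)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta
ss_res = np.sum((y - y_hat)**2)
ss_t = np.sum((y - y.mean())**2)
print("R^2 =", 1 - ss_res / ss_t)
```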
7.3 Nonparametric Regression
• Nonparametric regression is closely related to piecewise polynomial regression.
• The goal is to develop a model-free basis for predicting the response over the range of the data.
7.3.1 Kernel Regression
• The kernel smoother uses a weighted average of the data:
  ỹᵢ = Σⱼ₌₁ⁿ wᵢⱼyⱼ, with Σⱼ₌₁ⁿ wᵢⱼ = 1
or, in matrix form, ỹ = Sy, where S = [wᵢⱼ] is the smoothing matrix.
• Typically, the weights are chosen such that wᵢⱼ ≅ 0 for all yᵢ's outside of a defined "neighborhood" of the specific location of interest.
• These kernel smoothers use a bandwidth, b, to
define this neighborhood of interest.
• A large value for b results in more of the data
being used to predict the response at the specific
location.
• The resulting plot of predicted values becomes
much smoother as b increases.
• As b decreases, less of the data are used to generate the prediction, and the resulting plot looks more wiggly or bumpy.
• Because the weights come from a kernel function, this approach is called a kernel smoother.
• The weights are computed from a kernel function K as
  wᵢⱼ = K[(xᵢ − xⱼ)/b] / Σₖ₌₁ⁿ K[(xᵢ − xₖ)/b]
where K(t) ≥ 0 for all t, ∫K(t)dt = 1, and K(−t) = K(t).
• See Table 7.5 for common kernel functions.
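A minimal sketch of a kernel smoother with a Gaussian kernel (one of the kernels listed in Table 7.5); the bandwidth and the data are made up.

```python
import numpy as np

def kernel_smooth(x, y, x0, b):
    """Weighted average of y, with Gaussian kernel weights centered at x0."""
    u = (x0 - x) / b
    k = np.exp(-0.5 * u**2)      # kernel values
    w = k / k.sum()              # weights normalized to sum to 1
    return np.sum(w * y)

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, 60))
y = np.sin(x) + rng.normal(0, 0.2, x.size)

b = 0.8                          # larger b -> smoother, smaller b -> wigglier
y_smooth = np.array([kernel_smooth(x, y, x0, b) for x0 in x])
```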
7.3.2 Locally Weighted Regression (Loess)
• Loess is another nonparametric method.
• Like kernel regression, loess uses the data from a neighborhood around the specific location.
• The neighborhood is defined by the span, which is the fraction of the total data points used to form each neighborhood.
• A span of 0.5 indicates that the closest half of the total data points is used as the neighborhood.
• The loess procedure then uses the points in the neighborhood to generate a weighted least-squares estimate of the response at the specific location.
• The weights are based on the distance of the
points used in the estimation from the specific
location of interest.
• Let x0 be the specific location of interest, and let
Δ(x0) be the distance the farthest point in the
neighborhood lies from the specific location of
interest.
• The tri-cube weight function is
  W[(x₀ − xⱼ)/Δ(x₀)], where W(t) = (1 − |t|³)³ for |t| < 1 and W(t) = 0 otherwise.
• The model is then fit at the location of interest by weighted least squares, using these weights for the points in the neighborhood.
• As with kernel smoothing, the vector of fitted values can be written as ỹ = Sy for a smoothing matrix S determined by the loess procedure.
• Since ỹ = Sy, the residual sum of squares is SSRes = y′(I − S)′(I − S)y.
• A common estimate of the error variance is σ̂² = SSRes/[n − trace(S)].
• R² = (SST − SSRes)/SST can be computed and interpreted in the usual way.
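A minimal sketch of a loess-style fit: at each location of interest, take the nearest fraction (span) of the data, weight it with the tri-cube function, and fit a weighted straight line (locally linear). The data are made up, not the windmill data of Example 7.4.

```python
import numpy as np

def tricube(t):
    t = np.abs(t)
    return np.where(t < 1, (1 - t**3)**3, 0.0)

def loess_fit(x, y, x0, span=0.5):
    n = x.size
    m = max(2, int(np.ceil(span * n)))       # number of points in the neighborhood
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:m]               # the m closest points
    delta = dist[idx].max()                  # distance to the farthest neighbor
    w = tricube(dist[idx] / delta)           # tri-cube weights
    X = np.column_stack([np.ones(m), x[idx] - x0])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])   # weighted least squares
    return beta[0]                           # local fit evaluated at x0

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 10, 80))
y = np.sin(x) + rng.normal(0, 0.2, x.size)
y_loess = np.array([loess_fit(x, y, x0, span=0.5) for x0 in x])
```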
Example 7.4 Applying Loess Regression to the Windmill Data
7.3.3 Final Cautions
• Parametric models are guided by appropriate
subject area theory.
• Nonparametric models almost always reflect pure
empiricism.
• One should always prefer a simple parametric
model when it provides a reasonable and
satisfactory fit to the data.
• The model terms often have important
interpretations.
• One should also prefer a simple parametric model based on a transformation of the data, especially when subject area theory supports the transformation used.
• On the other hand, there are many situations
where no simple parametric model yields an
adequate or satisfactory fit to the data, where there
is little or no subject area theory to guide the
analyst, and where no simple transformation
appears appropriate.
• In such cases, nonparametric regression makes a
great deal of sense.
• One is willing to accept the relative complexity and the "black box" nature of the estimation in order to obtain an adequate fit to the data.
7.4 Polynomial Models in Two or More Variables
• For example, the second-order model in two variables is
  y = β₀ + β₁x₁ + β₂x₂ + β₁₁x₁² + β₂₂x₂² + β₁₂x₁x₂ + ε
• Response surface methodology (RSM) is widely
applied in industry for modeling the output
response(s) of a process in terms of the important
controllable variables and then finding the
operating conditions that optimize the response.
• To illustrate, consider fitting a second-order response surface in two variables:
– y : the percent conversion of a chemical process
– T : reaction temperature
– C : reaction concentration
• Figure 7.14 shows a central composite design.
• A second-order model of the form given above is fit to the data (see p. 246).
• The fitted model is
  ŷ = 79.75 + 9.83x₁ + 4.22x₂ − 8.88x₁² − 5.13x₂² − 7.75x₁x₂
• The ANOVA table for this model:
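A sketch of fitting a second-order response surface in two coded variables and locating the stationary point of the fitted surface. The design follows a generic central composite layout, and the response values are made up; the actual data are on p. 246.

```python
import numpy as np

a = np.sqrt(2.0)                                     # axial distance (assumed)
x1 = np.array([-1, 1, -1, 1, -a, a, 0, 0, 0, 0, 0, 0])
x2 = np.array([-1, -1, 1, 1, 0, 0, -a, a, 0, 0, 0, 0])
y = np.array([64, 78, 60, 70, 62, 80, 68, 72, 79, 80, 80, 79], dtype=float)

# Model matrix for y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2 + e
X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Stationary point of the fitted quadratic surface (candidate optimum)
b = beta[1:3]
B = np.array([[beta[3], beta[5] / 2],
              [beta[5] / 2, beta[4]]])
x_stat = -0.5 * np.linalg.solve(B, b)
print("coefficients:", beta)
print("stationary point (coded units):", x_stat)
```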
• R2 and adjusted R2 values for this model are
satisfactory.
• From the response surface plots, the maximum
percent conversion occurs at about 245°C and 20%
concentration.
• The experimenter is interested in predicting the response y or estimating the mean response at a particular point in the process variable space.
7.5 Orthogonal Polynomials
• In fitting a polynomial model in one variable, even if nonessential ill-conditioning is removed by centering, we may still have high levels of multicollinearity.
• Suppose the model is
  yᵢ = α₀P₀(xᵢ) + α₁P₁(xᵢ) + ⋯ + αₖPₖ(xᵢ) + εᵢ,  i = 1, 2, …, n
where Pⱼ(xᵢ) is a jth-order orthogonal polynomial, so that Σᵢ₌₁ⁿ Pⱼ(xᵢ)Pₗ(xᵢ) = 0 for j ≠ l, and P₀(xᵢ) = 1.
• Then X′X is the diagonal matrix
  X′X = diag( n, Σᵢ₌₁ⁿ P₁²(xᵢ), …, Σᵢ₌₁ⁿ Pₖ²(xᵢ) )
• The least-squares estimators are
  α̂ⱼ = [Σᵢ₌₁ⁿ Pⱼ(xᵢ)yᵢ] / [Σᵢ₌₁ⁿ Pⱼ²(xᵢ)],  j = 0, 1, …, k
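A sketch of building orthogonal polynomial regressors numerically by QR-decomposing the centered Vandermonde matrix, so that X'X is diagonal and each coefficient can be estimated independently of the others. The data are made up (equally spaced reorder quantities, as in Example 7.5).

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.arange(50.0, 300.0, 25.0)                      # 10 equally spaced x values
y = 300 + 0.5 * (x - x.mean())**2 / 100 + rng.normal(0, 5, x.size)

k = 2                                                 # order of the polynomial
V = np.vander(x - x.mean(), k + 1, increasing=True)   # columns 1, (x-xbar), (x-xbar)^2
Q, _ = np.linalg.qr(V)                                # orthonormal polynomial columns

alpha = Q.T @ y                                       # each estimate computed separately
y_hat = Q @ alpha
# Adding a higher-order column would leave the lower-order estimates unchanged,
# which is the practical advantage of orthogonal polynomials.
```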
Example 7.5 Orthogonal Polynomials
• The effect of various reorder quantities on the average annual cost of inventory is studied.
• The fitted equation is
  ŷ = 324.30 + 0.7424(2)[(x − 162.5)/25] + 2.7955(1/2){[(x − 162.5)/25]² − (10² − 1)/12}