Transcript Lecture 7
Ch11 Curve Fitting
Dr. Deshi Ye
[email protected]
Outline
The Method of Least Squares
Inferences Based on the Least Squares Estimators
Curvilinear Regression
Multiple Regression
11.1 The Method of Least Squares
We study the case where a dependent variable is to be predicted in terms of a single independent variable. The random variable Y depends on a variable x. The regression curve of Y on x describes the relationship between x and the mean of the corresponding distribution of Y.
Linear regression
Linear regression: for any x, the mean of the distribution of the Y's is given by $\alpha + \beta x$. In general, Y will differ from this mean, and we denote this difference by $\varepsilon$:

$$Y = \alpha + \beta x + \varepsilon$$

Here $\varepsilon$ is a random variable, and we can choose $\alpha$ so that the mean of the distribution of this random variable is equal to zero.
EX
x:  1   2   3   4   5   6   7    8    9    10   11   12
y:  16  35  45  64  86  96  106  124  134  156  164  182
Analysis
Fit a line $\hat{y} = a + bx$ to the data. For each observation, the residual is $e_i = y_i - \hat{y}_i$. We want to choose the line so that the residuals $e_i$, $i = 1, \ldots, n$, are as close as possible to zero.
Principle of least squares
Choose a and b so that

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} [y_i - (a + bx_i)]^2$$

is minimum. The procedure of finding the equation of the line that best fits a given set of paired data is called the method of least squares. Some notation:

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
Least squares estimators

$$b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\bar{x},$$

where $\bar{x}$, $\bar{y}$ are the means of the x's and y's.

Fitted (or estimated) regression line: $\hat{y} = a + bx$.

Residual: observation minus fitted value, $y_i - (a + bx_i)$.

The minimum value of the sum of squares is called the residual sum of squares or error sum of squares. It can be shown that

$$SSE = \text{residual sum of squares} = \sum_{i=1}^{n} (y_i - a - bx_i)^2 = S_{yy} - S_{xy}^2 / S_{xx}$$
EX solution
Applying the estimators to the 12-point data set above gives $\hat{y} = 14.8x + 4.35$.
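As a check, here is a short Python sketch of the estimator formulas applied to the 12-point data set:

```python
# Least-squares fit of y-hat = a + b*x for the 12-point example,
# using the S_xx and S_xy formulas from the slides.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [16, 35, 45, 64, 86, 96, 106, 124, 134, 156, 164, 182]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
S_xx = sum((xi - x_bar) ** 2 for xi in x)
S_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = S_xy / S_xx          # slope
a = y_bar - b * x_bar    # intercept
print(round(b, 1), round(a, 2))  # 14.8 4.35
```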
X-and-Y
Common names for the two axes:
X-axis: independent, predictor, carrier, input
Y-axis: dependent, predicted, response, output
Example
You're a marketing analyst for Hasbro Toys. You gather the following data:

Ad $           1  2  3  4  5
Sales (Units)  1  1  2  2  4

What is the relationship between sales & advertising?
Scattergram
[Scatter plot of Sales (0-4) vs. Advertising (0-5)]
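The slide poses the question without showing the fit; running the same least-squares formulas on this data (a computation of mine, not shown in the lecture) gives $\hat{y} = -0.1 + 0.7x$:

```python
# Least-squares fit for the Hasbro advertising data.
# The fitted coefficients below are computed here; the slide itself
# leaves the question open.
x = [1, 2, 3, 4, 5]          # Ad $
y = [1, 1, 2, 2, 4]          # Sales (units)

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
S_xx = sum((xi - x_bar) ** 2 for xi in x)
S_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = S_xy / S_xx          # slope: sales rise about 0.7 units per ad dollar
a = y_bar - b * x_bar    # intercept
print(round(b, 1), round(a, 1))  # 0.7 -0.1
```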
11.2 Inferences Based on the Least Squares Estimators
We assume that the regression is linear in x and, furthermore, that the n random variables $Y_i$ are independently normally distributed with means $\alpha + \beta x_i$.

Statistical model for straight-line regression:

$$Y_i = \alpha + \beta x_i + \varepsilon_i,$$

where the $\varepsilon_i$ are independent normally distributed random variables having zero mean and common variance $\sigma^2$.
Standard error of estimate
The estimate of $\sigma^2$ based on the deviations about the fitted line is

$$s_e^2 = \frac{1}{n-2} \sum_{i=1}^{n} [y_i - (a + bx_i)]^2$$

This estimate of $\sigma^2$ can also be written as

$$s_e^2 = \frac{S_{yy} - (S_{xy})^2 / S_{xx}}{n-2}$$
Statistics for inferences: based on the assumptions made concerning the distribution of the values of Y, the following theorem holds.

Theorem. The statistics

$$t = \frac{a - \alpha}{s_e} \sqrt{\frac{n S_{xx}}{S_{xx} + n\bar{x}^2}} \quad \text{and} \quad t = \frac{(b - \beta)\sqrt{S_{xx}}}{s_e}$$

are values of random variables having the t distribution with n - 2 degrees of freedom.

Confidence intervals:

$$\alpha: \quad a \pm t_{\alpha/2} \, s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$$

$$\beta: \quad b \pm t_{\alpha/2} \, \frac{s_e}{\sqrt{S_{xx}}}$$
Example
The following data pertain to the number of computer jobs per day and the central processing unit (CPU) time required.

Number of jobs (x)  1  2  3  4  5
CPU time (y)        2  5  4  9  10
EX
1) Obtain a least squares fit of a line to the observations on CPU time.

$$b = \frac{S_{xy}}{S_{xx}} = 2, \qquad a = \bar{y} - b\bar{x} = 0,$$

so the fitted line is $\hat{y} = 2x$.
Example
2) Construct a 95% confidence interval for α.

$$s_e^2 = \frac{S_{yy} - S_{xy}^2 / S_{xx}}{n-2} = \frac{46 - 400/10}{3} = 2$$

With n - 2 = 3 degrees of freedom, $t_{\alpha/2} = t_{0.025} = 3.182$. The 95% confidence interval for α is

$$a \pm t_{\alpha/2} \, s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} = 0 \pm 3.182 \cdot \sqrt{2} \cdot \sqrt{\frac{1}{5} + \frac{9}{10}} = \pm 4.72$$
Example
3) Test the null hypothesis $\beta = 1$ against the alternative hypothesis $\beta > 1$ at the 0.05 level of significance.

Solution: the t statistic is given by

$$t = \frac{(b - \beta)\sqrt{S_{xx}}}{s_e} = \frac{(2-1)\sqrt{10}}{\sqrt{2}} = 2.236$$

Criterion: reject if $t > t_{0.05} = 2.353$ (with 3 degrees of freedom).

Decision: since 2.236 < 2.353, we cannot reject the null hypothesis.
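All three parts of this example can be verified with a short script (the t critical value is hardcoded from the table value quoted on the slides):

```python
import math

# CPU-time example: verify the fitted line, s_e, the 95% CI for alpha,
# and the t statistic for testing beta = 1.
x = [1, 2, 3, 4, 5]
y = [2, 5, 4, 9, 10]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
S_xx = sum((xi - x_bar) ** 2 for xi in x)
S_yy = sum((yi - y_bar) ** 2 for yi in y)
S_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = S_xy / S_xx                                    # 2.0
a = y_bar - b * x_bar                              # 0.0
se = math.sqrt((S_yy - S_xy ** 2 / S_xx) / (n - 2))  # sqrt(2)

t_crit = 3.182   # t_{0.025} with n-2 = 3 degrees of freedom (from tables)
half_width = t_crit * se * math.sqrt(1 / n + x_bar ** 2 / S_xx)
print(round(half_width, 2))   # 4.72

t_stat = (b - 1) * math.sqrt(S_xx) / se
print(round(t_stat, 3))       # 2.236
```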
11.3 Curvilinear Regression
The regression curve may be nonlinear.

Polynomial regression:

$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p$$

If the regression of Y on x is exponential, the mean of the distribution of values of Y is given by $\alpha \beta^x$. Taking logarithms, we have $\log y = \log \alpha + x \log \beta$. Thus, we can estimate $\log \alpha$ and $\log \beta$ by fitting a straight line to the pairs of values $(x_i, \log y_i)$.
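The log-transform trick can be sketched in a few lines. The data here is an invented example generated from $y = 2 \cdot 1.5^x$, not from the lecture, so the fit should recover those values:

```python
import math

# Exponential regression y = alpha * beta**x via the log transform:
# fit a straight line to (x_i, log y_i), then undo the logs.
x = [0, 1, 2, 3, 4]
y = [2 * 1.5 ** xi for xi in x]   # invented, noise-free data

log_y = [math.log(yi) for yi in y]
n = len(x)
x_bar = sum(x) / n
ly_bar = sum(log_y) / n
S_xx = sum((xi - x_bar) ** 2 for xi in x)
S_xy = sum((xi - x_bar) * (lyi - ly_bar) for xi, lyi in zip(x, log_y))

slope = S_xy / S_xx                   # estimates log(beta)
intercept = ly_bar - slope * x_bar    # estimates log(alpha)

alpha = math.exp(intercept)   # ~ 2.0
beta = math.exp(slope)        # ~ 1.5
print(alpha, beta)
```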
Polynomial regression
If there is no clear indication about the functional form of the regression of Y on x, we often assume polynomial regression:

$$Y = a_0 + a_1 x + a_2 x^2 + \cdots + a_k x^k$$
Polynomial Fitting
• Really just a generalization of the previous case
• Exact solution
• Just big matrices
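As a sketch of the "just big matrices" point, a quadratic least-squares fit with NumPy (the data is an invented example generated from $y = 1 + 2x + 3x^2$, so the fit should recover those coefficients):

```python
import numpy as np

# Fit Y = a0 + a1*x + a2*x^2 by least squares.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 1 + 2 * x + 3 * x ** 2   # invented, noise-free data

# np.polyfit returns coefficients from the highest degree down.
coeffs = np.polyfit(x, y, deg=2)
print(np.round(coeffs, 6))   # ~ [3. 2. 1.]
```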
11.4 Multiple Regression
The mean of Y is now expressed in terms of several predictor variables:

$$b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$

Minimize

$$\sum_{i=1}^{n} [y_i - (b_0 + b_1 x_{i1} + \cdots + b_k x_{ik})]^2$$

For two predictor variables (r = 2), we can solve for the estimates from the following normal equations:

$$\sum y = n b_0 + b_1 \sum x_1 + b_2 \sum x_2$$

$$\sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2$$

$$\sum x_2 y = b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2$$
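The three normal equations for r = 2 form a 3x3 linear system and can be solved directly. The data here is an invented example generated from $y = 1 + 2x_1 + 3x_2$, so the solution should recover those coefficients:

```python
import numpy as np

# Two-predictor example: fit y = b0 + b1*x1 + b2*x2 by solving
# the three normal equations from the slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 1 + 2 * x1 + 3 * x2   # invented, noise-free data
n = len(y)

# Coefficient matrix and right-hand side of the normal equations.
A = np.array([
    [n,         x1.sum(),         x2.sum()],
    [x1.sum(),  (x1 ** 2).sum(),  (x1 * x2).sum()],
    [x2.sum(),  (x1 * x2).sum(),  (x2 ** 2).sum()],
])
rhs = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])

b0, b1, b2 = np.linalg.solve(A, rhs)
print(round(b0, 6), round(b1, 6), round(b2, 6))   # ~ 1.0 2.0 3.0
```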
Example
P365.
Multiple Linear Fitting
$X_1(x), \ldots, X_M(x)$ are arbitrary fixed functions of x (they can be nonlinear), called the basis functions. The normal equations of this least squares problem can be put in matrix form and solved.
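A minimal sketch of the matrix form: build a design matrix with one column per basis function and solve the least squares problem. The basis $\{1, x, \sin x\}$ and the data are my own illustrative choices, not from the lecture:

```python
import numpy as np

# General linear least squares with arbitrary basis functions
# X_1(x), ..., X_M(x); here the basis is {1, x, sin(x)}.
basis = [lambda t: np.ones_like(t), lambda t: t, np.sin]

x = np.linspace(0.0, 6.0, 20)
y = 0.5 + 1.5 * x + 2.0 * np.sin(x)   # true coefficients (0.5, 1.5, 2.0)

# Design matrix: one column per basis function.
A = np.column_stack([f(x) for f in basis])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coeffs, 6))   # ~ [0.5 1.5 2. ]
```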
Correlation Models
1. How strong is the linear relationship between two variables?
2. The coefficient of correlation is used to measure this.
The population correlation coefficient is denoted $\rho$; its values range from -1 to +1.
Correlation
Standardized observation:

$$\frac{\text{observation} - \text{sample mean}}{\text{sample standard deviation}} = \frac{x_i - \bar{x}}{s_x}$$

The sample correlation coefficient r:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
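The formula above can be applied directly; as an illustration (a computation of mine, not on the slides), here it is for the CPU-time data from the earlier example:

```python
import math

# Sample correlation coefficient r via the standardized-observation
# formula; equivalently r = S_xy / sqrt(S_xx * S_yy).
x = [1, 2, 3, 4, 5]
y = [2, 5, 4, 9, 10]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)
print(round(r, 3))   # 0.933
```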
Coefficient of Correlation Values
[Scale from -1.0 to +1.0: values toward -1.0 indicate an increasing degree of negative correlation, 0 indicates no correlation, and values toward +1.0 indicate an increasing degree of positive correlation.]