
Linear Least Squares Approximation

By Kristen Bauer, Renee Metzger, Holly Soper, Amanda Unklesbay

Linear Least Squares

It is the line of best fit for a group of points: it seeks to minimize, over all data points, the sum of the squared differences between the function value and the data value.

It is the earliest form of linear regression.

Gauss and Legendre

The method of least squares was first published by Legendre in 1805 and by Gauss in 1809.

Although Legendre’s work was published earlier, Gauss claimed he had used the method since 1795.

Both mathematicians applied the method to determine the orbits of bodies about the sun.

Gauss went on to publish further development of the method in 1821.

Example

Consider the points (1, 2.1), (2, 2.9), (5, 6.1), and (7, 8.3) with the best fit line f(x) = 0.9x + 1.4. The squared errors are:

x₁ = 1, y₁ = 2.1, f(1) = 2.3, e₁ = (2.3 – 2.1)² = 0.04
x₂ = 2, y₂ = 2.9, f(2) = 3.2, e₂ = (3.2 – 2.9)² = 0.09
x₃ = 5, y₃ = 6.1, f(5) = 5.9, e₃ = (5.9 – 6.1)² = 0.04
x₄ = 7, y₄ = 8.3, f(7) = 7.7, e₄ = (7.7 – 8.3)² = 0.36

So the total squared error is 0.04 + 0.09 + 0.04 + 0.36 = 0.53
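As a quick check of this arithmetic, a minimal Mathematica sketch (not from the original slides):

    (* Total squared error for the trial line f(x) = 0.9 x + 1.4 *)
    pts = {{1, 2.1}, {2, 2.9}, {5, 6.1}, {7, 8.3}};
    f[x_] := 0.9 x + 1.4;
    Total[(f[#[[1]]] - #[[2]])^2 & /@ pts]   (* 0.53 *)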

By finding better coefficients of the best fit line, we can make this error smaller…

We want to minimize the vertical distances between the points and the line.

• E = (d₁)² + (d₂)² + (d₃)² + … + (dₙ)² for n data points
• E = [f(x₁) – y₁]² + [f(x₂) – y₂]² + … + [f(xₙ) – yₙ]²
• E = [mx₁ + b – y₁]² + [mx₂ + b – y₂]² + … + [mxₙ + b – yₙ]²
• E = ∑(mxᵢ + b – yᵢ)²

E must be MINIMIZED!

How do we do this?

E = ∑(mxᵢ + b – yᵢ)². Treat x and y as constants, since we are trying to find m and b.

So…PARTIALS!

∂E/∂m = 0 and ∂E/∂b = 0. But how do we know if this will yield maximums, minimums, or saddle points?

[Figures: a minimum point, a maximum point, and a saddle point]

Minimum!

Since the expression E is a sum of squares and is therefore nonnegative (i.e. it looks like an upward-opening paraboloid), we know the solution must be a minimum.

We can prove this by using the Second Partials Test.

Second Partials Test

Suppose the gradient of f at (x₀, y₀) is 0. (An instance of this is ∂E/∂m = ∂E/∂b = 0.) We set

A = ∂²f/∂x²,   B = ∂²f/∂x∂y,   C = ∂²f/∂y²

and form the discriminant D = AC – B².

1) If D < 0, then (x₀, y₀) is a saddle point.
2) If D > 0, then f takes on a local minimum at (x₀, y₀) if A > 0, and a local maximum at (x₀, y₀) if A < 0.

Calculating the Discriminant

A = ∂²E/∂m² = ∂/∂m [ ∑ 2xᵢ(mxᵢ + b – yᵢ) ] = ∑ 2xᵢ² = 2∑xᵢ²

B = ∂²E/∂m∂b = ∂/∂b [ ∑ 2xᵢ(mxᵢ + b – yᵢ) ] = ∑ 2xᵢ = 2∑xᵢ

C = ∂²E/∂b² = ∂/∂b [ ∑ 2(mxᵢ + b – yᵢ) ] = ∑ 2 = 2n

D = AC – B² = (2∑xᵢ²)(2n) – (2∑xᵢ)² = 4[ n∑xᵢ² – (∑xᵢ)² ]

Recall the test: if D < 0, then (x₀, y₀) is a saddle point; if D > 0, then f takes on a local minimum at (x₀, y₀) when A > 0 and a local maximum when A < 0. Now D > 0 by an inductive proof showing that

n∑xᵢ² > (∑xᵢ)²   (sums taken over i = 1 to n)

Those details are not covered in this presentation.

We know A > 0 since A = 2∑xᵢ² is always positive (when not all x’s have the same value).

Therefore…

Setting ∂E/∂m and ∂E/∂b equal to zero will yield the two equations that minimize E, the sum of the squares of the error.

Thus, the linear least squares algorithm (as presented) is valid and we can continue.
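As a concrete illustration (a sketch, not code from the slides), the partials of E for the four example points can be set to zero and solved directly, and A and D can be checked numerically:

    (* Direct minimization and second-derivative check for the example points *)
    pts = {{1, 2.1}, {2, 2.9}, {5, 6.1}, {7, 8.3}};
    e[m_, b_] := Total[(m #[[1]] + b - #[[2]])^2 & /@ pts];    (* E(m, b) *)
    Solve[{D[e[m, b], m] == 0, D[e[m, b], b] == 0}, {m, b}]
      (* m ≈ 1.04, b ≈ 0.94; the total squared error drops from 0.53 to about 0.036 *)
    a  = 2 Total[pts[[All, 1]]^2];                             (* A = 2 Σx² = 158 > 0 *)
    b2 = 2 Total[pts[[All, 1]]];                               (* B = 2 Σx  = 30 *)
    a (2 Length[pts]) - b2^2                                   (* D = AC − B² = 364 > 0, so a minimum *)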

E = ∑(mxᵢ + b – yᵢ)² is minimized (as just shown) when the partial derivatives with respect to each of the variables are zero, i.e. ∂E/∂m = 0 and ∂E/∂b = 0.

Set ∂E/∂b equal to zero:
∂E/∂b = ∑ 2(mxᵢ + b – yᵢ) = 0
m∑xᵢ + ∑b = ∑yᵢ
mSx + bn = Sy

Set ∂E/∂m equal to zero:
∂E/∂m = ∑ 2xᵢ(mxᵢ + b – yᵢ) = 2∑(mxᵢ² + bxᵢ – xᵢyᵢ) = 0
m∑xᵢ² + b∑xᵢ = ∑xᵢyᵢ
mSxx + bSx = Sxy

NOTE: Sx = ∑xᵢ, Sy = ∑yᵢ, Sxx = ∑xᵢ², Sxy = ∑xᵢyᵢ

Next we will solve the system of equations for the unknowns m and b:

mSxx + bSx = Sxy
mSx + bn = Sy

Solving for m…

nmSxx + bnSx = nSxy          (multiply the first equation by n)
mSxSx + bnSx = SySx          (multiply the second equation by Sx)
nmSxx – mSxSx = nSxy – SySx  (subtract)
m(nSxx – SxSx) = nSxy – SySx (factor out m)

m = (nSxy – SySx) / (nSxx – SxSx)

Next we will solve the system of equations for the unknowns m and b:

mSxx + bSx = Sxy
mSx + bn = Sy

Solving for b…

mSxSxx + bSxSx = SxSxy           (multiply the first equation by Sx)
mSxSxx + bnSxx = SySxx           (multiply the second equation by Sxx)
bSxSx – bnSxx = SxySx – SySxx    (subtract)
b(SxSx – nSxx) = SxySx – SySxx   (solve for b)

b = (SxySx – SySxx) / (SxSx – nSxx) = (SxxSy – SxySx) / (nSxx – SxSx)
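A quick symbolic check of these closed-form solutions (a sketch, not from the slides; Sx, Sy, Sxx, Sxy, and n are left as undefined symbols):

    (* Solve the normal equations for m and b symbolically *)
    Solve[{m Sxx + b Sx == Sxy, m Sx + b n == Sy}, {m, b}]
      (* equivalent to m = (n Sxy - Sy Sx)/(n Sxx - Sx^2)
         and b = (Sxx Sy - Sxy Sx)/(n Sxx - Sx^2) *)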

Example: Find the linear least squares approximation to the data: (1,1), (2,4), (3,8)

Use these formulas:

m = (nSxy – SySx) / (nSxx – SxSx)
b = (SxxSy – SxySx) / (nSxx – SxSx)

Sx = 1 + 2 + 3 = 6
Sxx = 1² + 2² + 3² = 14
Sy = 1 + 4 + 8 = 13
Sxy = 1(1) + 2(4) + 3(8) = 33
n = number of points = 3

m = (3·33 – 13·6) / (3·14 – 6·6) = 21/6 = 3.5
b = (14·13 – 33·6) / (3·14 – 6·6) = –16/6 ≈ –2.667

The line of best fit is y = 3.5x – 2.667

Line of best fit: y = 3.5x – 2.667

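The slide shows the three data points plotted with this line on axes running roughly from –1 to 5 in x and –5 to 15 in y; a Mathematica sketch that reproduces the plot (not the presentation's original code):

    (* Data points and fitted line on roughly the same axes as the slide *)
    pts = {{1, 1}, {2, 4}, {3, 8}};
    Show[
      ListPlot[pts, PlotStyle -> PointSize[Large]],
      Plot[3.5 x - 2.667, {x, -1, 5}],
      PlotRange -> {{-1, 5}, {-5, 15}}
    ]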

THE ALGORITHM in Mathematica
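A minimal sketch of the algorithm, implementing the m and b formulas derived above (the presentation's own notebook code may have differed):

    (* Least-squares slope m and intercept b from the Sx, Sy, Sxx, Sxy sums *)
    linearLeastSquares[data_] := Module[{n, sx, sy, sxx, sxy},
      n = Length[data];
      sx = Total[data[[All, 1]]];
      sy = Total[data[[All, 2]]];
      sxx = Total[data[[All, 1]]^2];
      sxy = Total[data[[All, 1]] data[[All, 2]]];
      {(n sxy - sy sx)/(n sxx - sx sx), (sxx sy - sxy sx)/(n sxx - sx sx)}  (* {m, b} *)
    ]

    linearLeastSquares[{{1, 1}, {2, 4}, {3, 8}}]   (* {7/2, -8/3}, i.e. y = 3.5x - 2.667 *)

    Fit[{{1, 1}, {2, 4}, {3, 8}}, {1, x}, x]       (* built-in alternative *)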

Activity

For this activity we are going to use the linear least squares approximation in a real life situation.

You are going to be given a box score from either a baseball or softball game.

With the box score you are given, you are going to write out the points (with the x coordinate being the number of at-bats that player had in the game and the y coordinate being the number of hits that player had in the game).

After doing that you are going to use the linear least squares approximation to find the best fitting line.

The slope of the best-fitting line you find will approximate the team’s batting average for that game.
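For instance (a sketch only, with made-up box-score numbers), one {at-bats, hits} pair per player can be fed to a least-squares fit:

    (* Hypothetical box score: {at-bats, hits} for each player; the numbers are invented *)
    game = {{4, 1}, {5, 2}, {3, 0}, {4, 2}, {3, 1}, {4, 1}, {2, 1}, {4, 2}, {3, 0}};
    Fit[game, {1, x}, x]   (* the coefficient of x is the slope used in the activity *)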

In Conclusion…

E = ∑(mxᵢ + b – yᵢ)² is the sum of the squared error between the set of data points {(x₁, y₁), …, (xᵢ, yᵢ), …, (xₙ, yₙ)} and the line approximating the data, f(x) = mx + b.

By minimizing the error by calculus methods, we get equations for m and b that yield the least squared error:

m = (nSxy – SySx) / (nSxx – SxSx)
b = (SxxSy – SxySx) / (nSxx – SxSx)

Advantages

Many common methods of approximating data seek to minimize a measure of the difference between the approximating function and the given data points. Advantages of using the squares of the differences at each point, rather than just the difference, the absolute value of the difference, or other measures of error, include:

– Positive differences do not cancel negative differences
– Differentiation is not difficult
– Small differences become smaller and large differences become larger

Disadvantages

The algorithm will fail if the data points fall on a vertical line (the denominator nSxx – SxSx is zero in that case).

Linear Least Squares will not be the best fit for data that is not linear.

The End