Transcript Chapter2_Simple Linear Regression_How
Slide 1
Chapter 2 – Simple Linear Regression - How
Here is a perfect scenario of what we want reality to look
like for simple linear regression. Our two variables are
not perfectly related, as we can see, but nonetheless
there is a relationship.
The means of each distribution are connected by a straight line, $\hat{y} = a + bx$. The variables are x and y, where the y-value on the line is currently denoted by $\hat{y}$.
In order to discover what the actual
equation for the straight line is we
need to sample from the population.
And as far as we can tell there is a
scatter of ordered pairs. From these
ordered pairs we need to determine
the equation of the line.
[Figure: scatter plot of the sampled ordered pairs on X and Y axes, with the line $\hat{y} = a + bx$ and labels a and b.]
How do we determine the actual equation of the line?
The formula used to determine the actual line will not be
very informative. So instead I will tell you
what the formula will achieve.
The error is defined as the difference between the observed y-value (the blue dots are plotted at the observed y-values) and the predicted y-value (the predicted y-value is located on the line).
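[Figure: scatter of observations on X and Y axes. The observation $(x_1, y_1)$ is marked at $x_1$, and the predicted point $(x_1, \hat{y}_1)$ lies on the line.]
$\hat{y}_1 = a + b x_1$
$\text{error} = y_1 - \hat{y}_1$
$\text{error}^2 = (y_1 - \hat{y}_1)^2$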
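To make the definition of the error concrete, here is a minimal sketch in Python. The data values and the candidate line (a = 0.5, b = 1.8) are made up for illustration and do not come from the slides; the function names are hypothetical as well.

```python
# Hypothetical sample of ordered pairs (x_i, y_i); values are invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.5, 5.5, 8.0]

# Assumed candidate line y_hat = a + b*x (an arbitrary example, not a fitted line).
a, b = 0.5, 1.8


def predict(x, a, b):
    """Predicted y-value: the point on the line at x."""
    return a + b * x


def errors(xs, ys, a, b):
    """Error for each observation: observed y minus predicted y."""
    return [y - predict(x, a, b) for x, y in zip(xs, ys)]


errs = errors(xs, ys, a, b)
squared = [e ** 2 for e in errs]

for x, y, e, s in zip(xs, ys, errs, squared):
    print(f"x={x}  y={y}  y_hat={predict(x, a, b):.2f}  error={e:+.2f}  error^2={s:.2f}")
```

A positive error means the observed point sits above the line, and a negative error means it sits below. Squaring removes the sign, so points above and below the line both count against it.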
Calculate the error for every observed y-value. Square each result and add them up. The least squares regression line has the property that no other line will have a smaller sum of squared errors.
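The least squares property can be checked numerically. The sketch below uses the standard closed-form formulas for simple linear regression (slope b = Sxy / Sxx, intercept a = mean of y minus b times mean of x); the slides do not show these formulas, so treat them as background. The data values and the nudges applied to the line are invented for illustration.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum over all observations of (observed y - predicted y)^2."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (a, b) for y_hat = a + b*x using the standard closed-form formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # y-intercept
    return a, b


# Hypothetical sample; values are invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)
print(f"least squares line: y_hat = {a:.3f} + {b:.3f} x, sum of squared errors = {best:.4f}")

# Nudge the intercept and slope; every other line should do no better.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.1, -0.05)]
others = [sum_squared_errors(xs, ys, a + da, b + db) for da, db in nudges]
for (da, db), sse in zip(nudges, others):
    print(f"a{da:+.2f}, b{db:+.2f}: sum of squared errors = {sse:.4f}")
```

Trying a handful of nearby lines is only a spot check, not a proof, but it shows what the property means in practice: any change to the slope or intercept increases the sum of squared errors.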
Keep in mind that since we are going to find the linear equation using a sample from our population, the linear equation that we calculate is an approximation of the actual linear equation. In other words, the slope and y-intercept are estimates of the actual slope and y-intercept.
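A small simulation can illustrate why the calculated slope and y-intercept are only estimates. The sketch below assumes a made-up population in which the actual line is y = 2 + 0.5x with normally distributed scatter around it; those numbers, the sample size, and the function names are all assumptions for illustration, not values from the slides.

```python
import random

# Assumed "actual" population line (hypothetical values).
TRUE_A, TRUE_B = 2.0, 0.5


def least_squares_line(xs, ys):
    """Return (a, b) for y_hat = a + b*x using the standard closed-form formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def draw_sample(rng, n):
    """Draw n ordered pairs scattered around the assumed population line."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [TRUE_A + TRUE_B * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = [least_squares_line(*draw_sample(rng, 30)) for _ in range(200)]

# Each sample gives a slightly different line.
for a, b in estimates[:5]:
    print(f"sample estimate: y_hat = {a:.3f} + {b:.3f} x")

mean_a = sum(a for a, _ in estimates) / len(estimates)
mean_b = sum(b for _, b in estimates) / len(estimates)
print(f"average over {len(estimates)} samples: a = {mean_a:.3f}, b = {mean_b:.3f}")
```

Each sample produces its own slope and y-intercept, none exactly equal to the assumed population values, while the averages over many samples land close to them.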