Chapter 2
Simple Linear Regression: What Is It, and Why Do We Do It?
Remember, statistics is an applied branch of mathematics.
When we apply mathematics to describe the world we live in,
we call this mathematical modeling.
Up to this point in your mathematical studies you have been
learning the language of algebra, which involves learning
about the objects that make up this language (such as
functions) and its characteristics, syntax, and grammar.
The big concept in algebra is the relationship between
variables. The simplest mathematical relationship is the
linear relationship. After a while we learn about a
special type of relationship called a function, symbolized by
f(x) = y.
The types of relationships/functions we learn about in algebra are
called deterministic. What does deterministic mean? It means that we
have a perfect relationship between the variables in question. If
we know a particular value of one variable, then knowing the
relationship (that is, knowing the equation) gives us
complete knowledge of the corresponding value of the
other variable.
Here is a simple example. You work at a job that pays you $10.00
per hour. Let h equal the number of hours you work, let d equal the
number of dollars you earn, and let f(h) = d be the symbolic
representation of the relationship between h and d; here f(h) = 10h.
So, if in your first week on the job you work 20 hours, your gross
pay will be $200.00 without question: f(20) = $200.00. The
relationship is exact!
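Here is a minimal sketch of this deterministic relationship in Python. The names HOURLY_RATE and gross_pay are made up for illustration; the only facts used are the $10.00 hourly rate and the 20 hours from the example.

```python
HOURLY_RATE = 10.00  # dollars per hour, from the example above


def gross_pay(hours: float) -> float:
    """Deterministic relationship f(h) = 10 * h: same input, same output."""
    return HOURLY_RATE * hours


print(gross_pay(20))  # 200.0
print(gross_pay(20) == gross_pay(20))  # True: the relationship is exact
```

No matter how many times we call gross_pay(20), we get exactly the same answer.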
If you work 20 hours again the following week, your pay will be
$200.00 again. This is different from a probabilistic model. In a
probabilistic model there is a relationship between two variables, but
that relationship is not perfect. For a particular value of one variable
we have an expectation of the value of the other variable in the
relationship, but we cannot expect to get that exact value.
Here is a simple example. You are a vendor at an outdoor market.
Let h represent the number of hours you work at the outdoor
market, and let d equal the net amount of money you earn from
selling your products. If you work for 8 hours you will earn some
amount of money, but you cannot predict from day to day what that
amount will be. You have an expectation of what you will earn;
otherwise you would not be involved in this endeavor.
But suppose that you worked 8 hours a day for many days at your
stall in the outdoor market. Eventually you would have gathered
enough data to create a distribution of sales and would begin to
see a pattern emerge. Soon you would recognize that this pattern
repeats itself, even though you cannot predict exactly how much
money you will earn in any one week. Depending on the time
of the season or month, you can expect a certain return for your
work. Let's say that sales always seem to be higher at the end of
the month. So there is a relationship between the two variables,
but it is not deterministic (exact); it is probabilistic.
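To see what "a pattern emerges" might look like, here is a small simulation sketch. Everything numeric in it is an assumption made up for illustration: the text gives no dollar figures, so ASSUMED_MEAN and ASSUMED_SPREAD are hypothetical, and a normal distribution is used only as a convenient stand-in for day-to-day variation.

```python
import random
import statistics

# Hypothetical values chosen only for illustration; the text gives no figures.
ASSUMED_MEAN = 120.0    # assumed typical net earnings for an 8-hour day, dollars
ASSUMED_SPREAD = 25.0   # assumed day-to-day variability, dollars


def simulate_days(n_days, seed=0):
    """Simulate net earnings for n_days separate 8-hour days at the market."""
    rng = random.Random(seed)
    return [rng.gauss(ASSUMED_MEAN, ASSUMED_SPREAD) for _ in range(n_days)]


earnings = simulate_days(1000)

# Individual days differ from one another and cannot be predicted exactly...
print("first five days:", [round(e, 2) for e in earnings[:5]])
# ...but the average over many days settles near the expected value.
print("average over all days:", round(statistics.mean(earnings), 2))
```

No single day can be predicted, yet the long-run average is stable. That stable average is the "expectation" the vendor has.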
So here is where we begin talking about the topic of chapter 2,
simple linear regression. The goal is to understand
what relationship exists between two variables under a
probabilistic model.
Before we start, let us cover a couple of preliminary
concepts. As was mentioned, we are dealing with a
probabilistic model, which means our functions will also
be probabilistic. More on what this means in a moment.
Importantly, we will deal only with linear
equations/functions, which makes our task easier.
Now, what do we mean by a probabilistic linear function?
Let x be our independent variable, which most of the time we
will call the explanatory variable. In algebra the letter y
represents the dependent variable, which we now call the
response variable (more on the name change later).
Using algebra symbols, f is the name of the linear function, and
f(x) = y. But since we are dealing with a probabilistic model,
that is, x and y are not perfectly related, it seems that we have the
following situation: f(3) = ? What exactly is the output of a
probabilistic equation that defines the relationship between two
variables?
All right, here it is. Let the variable h, the explanatory variable,
be the number of hours you work at the market selling your
product. Let us say you work 8 hours every Saturday. The
variable d represents the gross number of dollars earned for
working 8 hours on Saturday. The function f represents the
mathematical relationship between the two, so f(8) equals the
expected (mean) gross number of dollars earned. In other
words, f(8) is the mean for the given situation. Thus, f(8)
represents the mean of some distribution.
We will make some assumptions about our
distribution. First, f(8) is the mean dollar value
of the sales. Second, we will assume the
distribution is normal.
There is likewise a value of f associated with every
value of h, the number of hours worked.
Thus f(4) has some value,
f(5) has some value,
f(3.5) has some value, and so on.
The output is always the mean sales in dollars
associated with that number of hours worked, and we will
assume that the distributions are all normal. We also assume
that all of these normal distributions have the same standard deviation.
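Taken together, these assumptions can be sketched in code. This is only an illustration under stated assumptions: INTERCEPT, SLOPE, and SIGMA are hypothetical numbers (the text gives none), and mean_earnings and observe_earnings are made-up names. The point is that f(h) returns a mean, while an actual day's earnings is a draw from a normal distribution centered at that mean, with the same standard deviation for every h.

```python
import random
import statistics

# Hypothetical parameters; none of these numbers come from the text.
INTERCEPT = 10.0  # assumed mean earnings at h = 0, dollars
SLOPE = 15.0      # assumed increase in mean earnings per extra hour, dollars
SIGMA = 20.0      # assumed common standard deviation, dollars


def mean_earnings(hours):
    """f(h): the mean of the earnings distribution for a given h."""
    return INTERCEPT + SLOPE * hours


def observe_earnings(hours, rng):
    """One actual day: a normal draw centered at f(h), same SIGMA for all h."""
    return rng.gauss(mean_earnings(hours), SIGMA)


rng = random.Random(1)
for hours in (3.5, 4, 5, 8):
    sample = [observe_earnings(hours, rng) for _ in range(5000)]
    print(
        f"h = {hours}: f(h) = {mean_earnings(hours):.1f}, "
        f"sample mean = {statistics.mean(sample):.1f}, "
        f"sample sd = {statistics.stdev(sample):.1f}"
    )
```

For each h the sample mean should land close to f(h), and the sample standard deviation should be about the same at every h.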
OK, this is understandable so far.
But what exactly is simple linear
regression?
What is chapter 2 about?
We want to find the equation f(x) = y, or in this example
f(h) = d. What is the equation that explains the
relationship between the variable x and the variable y (in
our example, h and d), where the output represents the mean of a
normal distribution?
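In symbols, the setup so far can be sketched as follows. The Greek letters are standard textbook notation and are assumptions here, not symbols used in these slides: \beta_0 is the intercept of the line, \beta_1 is its slope, and \sigma is the common standard deviation.

```latex
f(h) = \beta_0 + \beta_1 h,
\qquad
d \mid h \;\sim\; N\!\left(f(h),\ \sigma^2\right)
```

Read this as: for a given number of hours h, the earnings d follow a normal distribution whose mean is f(h) and whose standard deviation is the same for every h.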
We will only be considering situations in which f(x) is a linear
equation, that is, a linear relationship. So one task is to determine
whether our relationship is linear, how good the linear
relationship is, and whether it will provide any useful information.
We will assume that for each value of x, f(x) is the mean
of a normal distribution.
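As a preview of what "finding the equation" can look like, here is a hedged sketch. It uses least squares, one common way to estimate a line from data, although the text has not yet introduced any method. The data are simulated from a hypothetical line (TRUE_INTERCEPT, TRUE_SLOPE, and SIGMA are made-up numbers), and fit_line is an illustrative name.

```python
import random

# Hypothetical "true" line and spread, used only to generate example data.
TRUE_INTERCEPT = 10.0
TRUE_SLOPE = 15.0
SIGMA = 20.0


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Simulate 2000 market days with varying hours and noisy earnings.
rng = random.Random(2)
hours = [rng.uniform(2, 10) for _ in range(2000)]
dollars = [rng.gauss(TRUE_INTERCEPT + TRUE_SLOPE * h, SIGMA) for h in hours]

intercept, slope = fit_line(hours, dollars)
print(f"estimated mean earnings: f(h) = {intercept:.1f} + {slope:.1f} * h")
```

The estimated line will not match the hypothetical true line exactly, because the data are noisy, but with enough observations it should come close.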