Regression Inference


Simple Linear Regression and Correlation: Inferential Methods
Chapter 13
AP Statistics
Peck, Olsen and Devore
Topic 2: Summary of Bivariate Data



In Topic 2 we discussed summarizing bivariate data.
Specifically, we were interested in summarizing linear relationships between two measurable characteristics.
We summarized these linear relationships by performing a linear regression using the method of least squares.
Least Squares Regression

Graphically display the data in a scatterplot (form, strength and direction).
Calculate Pearson's correlation coefficient (the strength of the linear association).
Perform the least squares regression: ŷ = a + bx.
Inspect the residual plot; no patterns means the linear model is appropriate.
Determine the coefficient of determination (how good the model is as a prediction tool).
Use the model as a prediction tool.

Interpretation

Pearson's correlation coefficient
Coefficient of determination
Variables in ŷ = a + bx
Standard deviation of the residuals
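These steps can be carried out in any statistics package (the slides use Minitab output). As a rough parallel, here is a minimal Python sketch of the same workflow; the data values and the prediction at x = 10 are made up purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up (x, y) data, purely for illustration
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])

# 1. Scatterplot: form, strength and direction
plt.scatter(x, y)

# 2. Pearson's correlation coefficient: strength of the linear association
r = np.corrcoef(x, y)[0, 1]

# 3. Least squares regression: y-hat = a + b*x
b, a = np.polyfit(x, y, 1)          # slope b, intercept a
y_hat = a + b * x

# 4. Residual plot: no pattern suggests the linear model is appropriate
residuals = y - y_hat
plt.figure()
plt.scatter(x, residuals)
plt.axhline(0)

# 5. Coefficient of determination: how good the model is as a prediction tool
r_squared = r ** 2

# 6. Use the model as a prediction tool (x = 10 is an arbitrary example)
y_pred_at_10 = a + b * 10
print(r, r_squared, a, b, y_pred_at_10)
plt.show()
```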
Minitab Output
Simple Linear Regression Model





'Simple' because we had only one independent variable in ŷ = a + bx.
We interpreted ŷ as a predicted value of y given a specific value of x.
When y = f(x) we can describe this as a deterministic model. That is, the value of y is completely determined by a given value of x.
That wasn't really the case when we used our linear regressions. The value of y was equal to our predicted value +/- some amount. That is, y = a + bx + e.
We call this a probabilistic model.
So, without e, the (x, y) pairs (observed points) would fall on the regression line.
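One way to see the difference between the deterministic and probabilistic models is a small simulation. The sketch below assumes arbitrary values for α, β and σ; without the random deviation e the points fall exactly on the line, and with it they scatter around the line.

```python
import numpy as np

rng = np.random.default_rng(1)

alpha, beta, sigma = 5.0, 2.0, 1.5   # arbitrary population values
x = np.linspace(0, 10, 50)

# Deterministic model: y is completely determined by x
y_deterministic = alpha + beta * x

# Probabilistic model: y = alpha + beta*x + e, with random deviation e
e = rng.normal(loc=0.0, scale=sigma, size=x.size)
y_probabilistic = alpha + beta * x + e

# Without e the points sit exactly on the line; with e they scatter around it
print(np.max(np.abs(y_deterministic - (alpha + beta * x))))   # exactly 0.0
print(np.std(y_probabilistic - (alpha + beta * x)))           # close to sigma
```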
Now consider this …


How did we calculate the coefficients in our linear regression models?
We were actually estimating population parameters using a sample. That is, the simple linear regression y = a + bx + e is an estimate for the population regression line y = α + βx + e.
We can consider a and b estimates for α and β.
Basic Assumptions for the Simple
Linear Regression Model




The distribution of e at any particular value of x has a mean value of 0. That is, μ_e = 0.
The standard deviation of e is the same for any value of x. It is always denoted by σ.
The distribution of e at any value of x is normal.
The random deviations are independent.

Another interpretation of ŷ



Consider y = α + βx + e, where the coefficients are fixed and e is distributed normally. Then the sum of a fixed number (α + βx) and a normally distributed variable (e) is normally distributed (Chapter 7). So y is normally distributed.
Now the mean of y will be equal to α + βx plus the mean of e, which is equal to 0.
So another interpretation of ŷ is the mean y value for a given x value, which equals α + βx.
Distribution of y



Where y = α + βx + e, we can now see that y is distributed normally with a mean of α + βx.
The variance of y is the same as the variance of e, which is σ².
An estimate for σ² is s_e².
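As a sketch of how that estimate is computed, the snippet below finds s_e² from the residuals of a fitted line, using the usual least squares definition s_e² = SSResid / (n − 2); the data are placeholder values.

```python
import numpy as np

# Placeholder sample data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])

b, a = np.polyfit(x, y, 1)
residuals = y - (a + b * x)

ss_resid = np.sum(residuals ** 2)
n = len(x)

s_e_squared = ss_resid / (n - 2)     # estimate of sigma^2
s_e = np.sqrt(s_e_squared)           # standard deviation of the residuals
print(s_e_squared, s_e)
```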
Assumption


The major assumption behind all of this is that the random deviation e is normally distributed.
We'll talk more later about how to check whether this assumption is reasonable.
Inferences about the slope of the
population regression line

Now we are going to make some
inferences about the slope of the
regression line. Specifically, we’ll
construct a confidence interval and then
perform a hypothesis test – a model utility
test for simple linear regression
Just to repeat …

We said the population regression model is y = α + βx + e.
The coefficients of this model are fixed but unknown (parameters), so using the method of least squares we estimate these parameters using a sample of data (statistics) and we get y = a + bx + e.
Sampling distribution of b


We use b as an estimate for the population coefficient β in the simple regression model.
b is therefore a statistic determined by a random sample, and it has a sampling distribution.

Sampling distribution of b

When the four assumptions of the linear regression model are met:

The mean value of the sampling distribution of b is β. That is, μ_b = β.
The standard deviation of the statistic b is σ_b = σ / √Σ(x − x̄)².
The sampling distribution of b is normally distributed.
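A small simulation can make these three properties concrete: repeatedly generating samples from a fixed population line and refitting shows that b centers on β with roughly the stated standard deviation. The population values below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 5.0, 2.0, 1.5        # arbitrary population values
x = np.linspace(0, 10, 30)                # fixed x values

slopes = []
for _ in range(10_000):
    y = alpha + beta * x + rng.normal(0, sigma, x.size)
    b, _a = np.polyfit(x, y, 1)
    slopes.append(b)
slopes = np.array(slopes)

sigma_b_theory = sigma / np.sqrt(np.sum((x - x.mean()) ** 2))
print(slopes.mean())          # close to beta
print(slopes.std())           # close to sigma_b_theory
print(sigma_b_theory)
```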
Estimates for …

The estimate for the standard deviation of b is s_b = s_e / √Σ(x − x̄)².
When we standardize b, it has a t distribution with n − 2 degrees of freedom:
t = (b − β) / s_b
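A minimal sketch of these calculations, again with placeholder data (in practice these values are read off the Minitab output):

```python
import numpy as np

# Placeholder sample data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])
n = len(x)

b, a = np.polyfit(x, y, 1)
residuals = y - (a + b * x)
s_e = np.sqrt(np.sum(residuals ** 2) / (n - 2))

s_b = s_e / np.sqrt(np.sum((x - x.mean()) ** 2))   # estimated SD of b

beta_hyp = 0                                       # hypothesized slope value
t_stat = (b - beta_hyp) / s_b                      # t with n - 2 df
print(b, s_b, t_stat)
```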
Confidence Interval


Sample statistic ± critical value × standard deviation of the statistic
b ± t* · s_b, where t* is based on the t distribution with n − 2 degrees of freedom
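With s_b in hand, the interval is just b ± t*·s_b. A sketch assuming a 95% confidence level and the same placeholder data, with the critical value t* taken from scipy:

```python
import numpy as np
from scipy.stats import t

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])
n = len(x)

b, a = np.polyfit(x, y, 1)
s_e = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))
s_b = s_e / np.sqrt(np.sum((x - x.mean()) ** 2))

t_crit = t.ppf(0.975, df=n - 2)          # t* for a 95% interval
ci = (b - t_crit * s_b, b + t_crit * s_b)
print(ci)
```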
Hypothesis Test


We're normally interested in the null hypothesis H₀: β = 0, because if we reject the null, the data suggest there is a useful linear relationship between our two variables.
We call this the 'Model Utility Test for Simple Linear Regression'.
Summary of the Test



Ho :   0
Test Statistic
HA :   0
b
t
sb
Assumptions are the same four as those for
the simple linear regression model.
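Putting the pieces together, a model utility test could be sketched as below; scipy's linregress reports the same slope, standard error and two-sided p-value, so it serves as a cross-check. The data are placeholders.

```python
import numpy as np
from scipy.stats import t, linregress

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2])
n = len(x)

# By hand: b, s_b, the t statistic and the two-sided p-value for H0: beta = 0
b, a = np.polyfit(x, y, 1)
s_e = np.sqrt(np.sum((y - (a + b * x)) ** 2) / (n - 2))
s_b = s_e / np.sqrt(np.sum((x - x.mean()) ** 2))
t_stat = b / s_b
p_value = 2 * t.sf(abs(t_stat), df=n - 2)
print(t_stat, p_value)

# Cross-check against scipy's built-in simple linear regression
result = linregress(x, y)
print(result.slope, result.stderr, result.pvalue)
```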
Minitab Output