Linear Regression

William P. Wattles, Ph.D.
Psychology 302
Correlation
• Teen birth rate correlated
with our composite religiosity
variable with r = 0.73; 95%
CI (0.56,0.84); n = 49; p <
0.0005. Thus teen birth rate
is very highly correlated with
religiosity at the state level,
with more religious states
having a higher rate of teen
birth. A scatter plot of teen
birth rate as a function of
religiosity is presented in
Figure 1.
Source: http://www.reproductive-healthjournal.com/content/6/1/14
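The interval quoted above can be approximately reproduced with the Fisher z-transformation, a standard way to build a confidence interval for a correlation. Whether the study's authors used this method is an assumption; the sketch takes only r = 0.73 and n = 49 from the text, and the function name `fisher_ci` is hypothetical.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher's z-transformation."""
    z = math.atanh(r)                # move r onto the z scale
    se = 1.0 / math.sqrt(n - 3)      # standard error on the z scale
    lo = math.tanh(z - z_crit * se)  # back-transform the lower bound
    hi = math.tanh(z + z_crit * se)  # back-transform the upper bound
    return lo, hi

low, high = fisher_ci(0.73, 49)
print(round(low, 2), round(high, 2))
```

Rounded to two decimals, the bounds agree with the reported interval (0.56, 0.84).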
• “Victor, when will you stop trying to
remember and start trying to think?” -Helen Boyden
• You can use linear regression to answer the following questions
about the pattern of data points and the significance of a linear
equation:
• 1. Is a pattern evident in a set of data
points?
• 2. Does the equation of a straight line
describe this pattern?
• 3. Are the predictions made from this
equation significant?
• Using Regression to
predict college
performance and
college satisfaction.
Dependent and
Independent Variables
• Dependent variable (or criterion
variable): the variable whose variation
we want to explain.
• Independent variable (or predictor
variable): a variable that is related to or
predicts variation in the dependent
variable.
Examples
• SAT score, college GPA
• Alcohol consumed, score on a driving
test
• Type of car, qualifying speed
• Level of education, income
• Number of boats registered, deaths of
manatees
Correlation
• The relationship between two variables
X and Y.
• In general, are changes in X associated
with changes in Y?
• If so, we say that X and Y covary.
• We can observe correlation by looking
at a scatter plot.
Correlation example
• Is number of beers
consumed
associated with
blood alcohol level?
Beer consumption and
Blood Alcohol Content
Correlation
• Correlation coefficient tells us the
strength and direction of the
relationship between two variables.
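As a minimal sketch of how the correlation coefficient captures strength and direction, the Pearson r can be computed from deviations around the means. The function name and the data values below are hypothetical, invented only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-deviation of X and Y scaled by their spreads."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy values, not real data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)                         # positive: Y tends to rise with X
r_flipped = pearson_r(x, [-v for v in y])   # same strength, opposite direction
print(round(r, 3), round(r_flipped, 3))
```

The sign gives the direction of the relationship and the absolute value (between 0 and 1) gives its strength.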
Prediction
• If two variables are
related then
knowing a value for
one should allow us
to predict the value
of the other.
Regression
• Allows us to predict
one variable based
on the value of
another.
Regression
• Using knowledge of the relationship
between X and Y to predict Y given X.
• X the independent variable (predictor)
used to explain changes in Y
• Y the dependent variable (criterion)
Linear regression
• Regression line: a straight line through
the scatter plot that best describes the
relationship.
• The regression line predicts the value of Y
for a given value of X.
Regression Line
• A straight line that describes how a
dependent variable changes as the
independent variable changes.
Least squares
regression.
• A method of determining the regression
line that minimizes the sum of the squared
errors (residuals).
Least squares
regression
• A residual is the error, or the amount that
an observed value deviates from
the regression line.
• The goal is to find the line that minimizes the
squared residuals.
• Least squares: the smallest possible
sum of the squared residuals.
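To make the "smallest possible sum" idea concrete, the sketch below compares the sum of squared residuals for two candidate lines. The data are hypothetical toy values; for them, the line with intercept 2.2 and slope 0.6 is the least-squares line, and the second line is an arbitrary alternative chosen for comparison.

```python
def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical distances between observed y and the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy values, not real data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

best = sum_squared_residuals(2.2, 0.6, x, y)    # least-squares line for these values
other = sum_squared_residuals(2.0, 0.7, x, y)   # a nearby line that fits slightly worse
print(best, other)
```

Any other choice of intercept and slope gives a sum at least as large as `best`.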
Least squares
regression.
• a is the intercept: the value of Y when
X = 0.
• b is the slope: the change in Y
when X increases by 1.
Regression formula
• a = Ybar - b(Xbar)
• b = (sum of deviation products) / (sum of
squared X deviations)
Berk & Carey page 240
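The two formulas above translate almost line for line into code. This is a minimal sketch; the function name and the data are hypothetical and used only for illustration.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of deviation products: each X deviation times the matching Y deviation.
    sum_dev_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of squared X deviations.
    sum_x_dev_squared = sum((x - x_bar) ** 2 for x in xs)
    b = sum_dev_products / sum_x_dev_squared   # slope
    a = y_bar - b * x_bar                      # intercept
    return a, b

# Hypothetical toy values, not real data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares_fit(x, y)
print(a, b)
```

Because a is computed from the means, the fitted line always passes through the point (Xbar, Ybar).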
Mortality vs. Temperature
Berk & Carey page 303
[Figure: scatter plot of mortality index (y-axis, 50.0 to 110.0) against temperature (x-axis, 30.0 to 55.0), with fitted line y = 2.3577x - 21.795, R² = 0.7654]
Simple Linear
Regression
$y = a + bx$
The Regression
Equation
• x-the independent variable, the
predictor
• y-the dependent variable, what we want
to predict
• a-the intercept
• b-the slope
Calculating the least-squares regression
$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
$$a = \bar{y} - b\bar{x}$$
Population and Sample
• Population: β (beta) is the slope, α (alpha) is the intercept
• Sample: b is the slope, a is the intercept
• Crying and IQ page 600
Relationship
• The scatterplot suggests a relationship
between crying and IQ.
• Can use knowledge of crying to predict
IQ
• What would null
say?
H0
Null says: “It’s nothing
but sampling error.”
Ha
• Babies who cry
easily may be more
easily stimulated
and have higher
IQs.
Steps to Analyze
Regression Data
• Plot and interpret
• Numerical summary
• Mathematical model
Plot and Interpret
• Plot independent
variable on the X
axis
• Plot dependent
variable on the Y
axis.
• Examine form,
direction and
strength of
relationship
Numerical Summary
• Correlation
coefficient tells
direction and
strength of
relationship.
r = +.455
r squared
• r²: percent of
variance in Y
explained by X.
• r² = 21%
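The 21% figure follows from squaring the correlation coefficient r = +.455 reported in the numerical summary. As a worked step:

```latex
r^2 = (0.455)^2 \approx 0.207 \approx 21\%
```

In other words, about one fifth of the variation in Y is accounted for by X in this sample.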
Mathematical Model
• Use model to predict
IQ based on
knowledge of crying
• Least squares
regression line
• Predicted Y: $\hat{y} = a + bx$
• a (the intercept) = 91.27
• b (the slope) = 1.493
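With the intercept and slope from this slide, prediction is a single line of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the crying value of 10 is an invented input, not a data point from the study.

```python
A_INTERCEPT = 91.27   # a, from the slide
B_SLOPE = 1.493       # b, from the slide

def predict_iq(crying):
    """Predicted IQ from the least-squares line: y-hat = a + b*x."""
    return A_INTERCEPT + B_SLOPE * crying

predicted = predict_iq(10)   # hypothetical crying score of 10
print(round(predicted, 2))
```

Each additional unit of crying raises the predicted IQ by the slope, 1.493 points.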
Excel Output
Sample Statistics
• The slope and
intercept are
statistics because
they are calculated
on the sample.
• We are really
interested in
estimating the
population
parameters
Population
Parameter
Sample
Statistic
Residuals
• Residuals: the difference between the
observed value of the dependent
variable and the value predicted by the
regression line.
$\text{residual} = y - \hat{y}$
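A minimal sketch of the residual calculation, observed minus predicted, for each data point. The data are hypothetical toy values, and the line used (intercept 2.2, slope 0.6) is the least-squares line for those values.

```python
def residuals(a, b, xs, ys):
    """Observed y minus predicted y-hat (a + b*x) for every point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical toy values, not real data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

res = residuals(2.2, 0.6, x, y)
print([round(e, 2) for e in res])
print(round(sum(res), 10))
```

Points above the line have positive residuals and points below it have negative ones; for a least-squares line with an intercept, the residuals sum to zero apart from floating-point rounding.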
Coefficient of
determination
• R²: the square of the correlation
coefficient.
• The proportion of the variation in Y that
can be explained by changes in X.
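The two descriptions above agree numerically in simple linear regression: the share of variation explained by the line equals the squared correlation. The sketch below checks this on hypothetical toy values; the function names are illustrative.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

def r_squared(xs, ys):
    """1 - (unexplained variation / total variation in Y)."""
    a, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy values, not real data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2 = r_squared(x, y)
r = pearson_r(x, y)
print(round(r2, 3), round(r ** 2, 3))
```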
Regression and
correlation
• correlation tells us about the
relationship
• regression allows us to predict Y if we
know X
Serotonin
• 5-HT levels predict
mood in healthy
males.
• SSRI, Zoloft, Prozac
Privitera page 531
• Do levels of
serotonin predict
positive mood in
subjects?
Exam 2 as predictor
Exam 1 as predictor
Using the regression
equation
• Exam 1: 84%
• Exam 1 predicted: 80.8%
• Exam 2: 68%
• Exam 2 predicted: 69.3%
Non-exercise activity
and weight gain
• Does appraised
value predict selling
price?
• Page 622.
Francis Marion Univ.
• http://vimeo.com/39111127
Final Exam
• 81 questions
• All multiple choice
• chi square
• independent t-test
• matched pairs
• regression
• single-sample t-test
Time at table
• Does time at the lunch
table predict how much
young children eat?
• Page 629.
Arctic Rivers
• Page 604
• Do the data suggest a change
in discharge over time?
• Page 630: does pine cone
count predict the number of
offspring in squirrels?
Low variability
High Variability
The End