
Regression
Adv. Experimental
Methods & Statistics
PSYC 4310 / COGS 6310
Michael J. Kalsher
Department of
Cognitive Science
PSYC 4310/6310
Advanced Experimental Methods and Statistics
© 2012, Michael Kalsher
Introduction to Regression
• If two variables covary, we should be able to predict
the value of one variable from another.
• Correlation only tells us how much two variables
covary.
• In regression, we construct an equation that uses one
or more variables (the IV(s) or predictor variable(s))
to predict another variable (the DV or outcome
variable).
– Predicting from one IV = Simple Regression
– Predicting from multiple IVs = Multiple Regression
Simple Regression:
The Model
• The general equation:
$\text{Outcome}_i = (\text{model}) + \text{error}_i$
• In regression the model is linear and we
summarize a data set with a straight line.
• The regression line is determined through the
method of least squares.
The Regression Line:
Model = “things that define the line we fit to the data”
• Any straight line can be defined by:
– The slope of the line (b1)
– The point at which the line crosses the ordinate, termed the intercept
of the line (b0)
• The general equation $\text{Outcome}_i = (\text{model}) + \text{error}_i$ becomes
$Y_i = (b_0 + b_1 X_i) + \varepsilon_i$
• b1 and b0 are termed regression coefficients:
– b1 tells us what the model looks like (its shape)
– b0 tells us where the model is in geometric space
– $\varepsilon_i$ is the residual term and represents the difference between
participant i’s predicted and obtained scores.
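A minimal sketch of the "model + error" split for a straight line. The data values and the coefficients b0 = 2.2 and b1 = 0.6 are made up for illustration; they are not from the lecture's data files.

```python
# Hypothetical toy data (not from the lecture).
x = [1, 2, 3, 4, 5]   # predictor scores X_i
y = [2, 4, 5, 4, 5]   # obtained outcome scores Y_i

b0, b1 = 2.2, 0.6     # assumed intercept and slope, for illustration only

# Model part: the score the line predicts for each participant.
predicted = [b0 + b1 * xi for xi in x]

# Error part: obtained minus predicted score, one residual per participant.
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Every obtained score is recovered as (model) + error.
reconstructed = [pi + ei for pi, ei in zip(predicted, residuals)]
```

Each entry of `residuals` plays the role of the residual term for one participant.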
Method of Least Squares:
Finding the line of best fit
[Figure: scatterplot of individual data points with the fitted regression line.
Labels: slope = b1 = dy / dx; intercept (constant) b0; residual (error in
prediction); sum of residuals = 0.]
The method of least squares selects the line (the regression line) that has
the lowest sum of squared differences between the observed data points and
the line, and therefore best represents the observed data. Once we determine
the slope (b1) and intercept (b0) of the line, we can insert different values
of our predictor variable into the model to estimate the value of the
outcome variable.
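The least-squares line for a single predictor can be sketched with the standard closed-form solution. The function names and the toy data below are hypothetical; SPSS reports the same two quantities as the constant and the slope.

```python
def fit_line(x, y):
    """Return (b0, b1) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of the predictor.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # intercept: the line passes through the means
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared differences between observed and predicted scores."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
best = sum_squared_residuals(x, y, b0, b1)
# Any other line, e.g. one with a slightly steeper slope, fits worse.
worse = sum_squared_residuals(x, y, b0, b1 + 0.1)
residual_sum = sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))
```

For the fitted line, `residual_sum` is zero up to rounding, matching the "sum of residuals = 0" label in the figure.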
Assessing Goodness of Fit
• Even the best fitting line can be a lousy fit to the data,
so we need to assess the goodness of fit of the
model against our best estimate--the mean.
• Let’s consider an example (see Field, p. 201):
– A music mogul wants to know how many records her company will
sell if she spends £100,000 on advertising.
– In the absence of a model of the relationship between advertising
and sales, the best guess would be the mean number of record
sales (say 200,000)--regardless of amount of advertising.
– So, as a basic strategy for predicting the outcome, we could use
the mean, because on average it is a good guess.
Assessing Goodness of Fit
• SST
– Represents the total amount of differences present when the most
basic model is applied to the data.
– SST uses the differences between the observed data and the mean
value of Y.
• SSR
– Represents the degree of inaccuracy when the best model is
fitted to the data.
– SSR uses the differences between the observed data and the
regression line.
• SSM
– Shows the reduction in inaccuracy resulting from fitting the
regression model to the data.
– SSM uses the differences between the mean value of Y and the
regression line.
– A large SSM implies the regression model predicts the outcome
variable better than the mean.
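The three sums of squares can be sketched directly from their definitions. The toy data are hypothetical, and b0 = 2.2, b1 = 0.6 are the least-squares values for these particular numbers (an assumption of this example, computed separately).

```python
# Hypothetical toy data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = 2.2, 0.6                      # least-squares line for these toy data
mean_y = sum(y) / len(y)               # the most basic model: the mean of Y
predicted = [b0 + b1 * xi for xi in x]

# SST: observed data versus the mean of Y.
ss_t = sum((yi - mean_y) ** 2 for yi in y)
# SSR: observed data versus the regression line.
ss_r = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
# SSM: regression line versus the mean of Y.
ss_m = sum((pi - mean_y) ** 2 for pi in predicted)
```

For a least-squares line the pieces add up: SST = SSM + SSR, so SSM is the part of the total that the model removes.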
Assessing Goodness of Fit
A large SSM implies the regression model is much better than using
the mean to predict the outcome variable.
How big is big? Assessed in two ways: (1) via $R^2$ and (2) via the F-test
(which assesses the ratio of systematic to unsystematic variance).
$R^2 = \frac{\mathrm{SSM}}{\mathrm{SST}}$
Represents the amount of variance in the outcome explained by the model
relative to how much variance there was to explain.
$F = \frac{\mathrm{SSM} / df_M}{\mathrm{SSR} / df_R} = \frac{\mathrm{MSM}}{\mathrm{MSR}}$
df for SSM ($df_M$) = number of variables in the model
df for SSR ($df_R$) = number of observations minus
number of parameters being estimated.
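A short sketch of both fit statistics computed from sums of squares. The function names are hypothetical, and the input values (SSM = 3.6, SSR = 2.4, SST = 6.0, five observations) are toy numbers, not the record sales results.

```python
def r_squared(ss_m, ss_t):
    """Proportion of the total variation explained by the model."""
    return ss_m / ss_t


def f_ratio(ss_m, ss_r, n_predictors, n_observations):
    """Ratio of the model mean square to the residual mean square."""
    df_m = n_predictors
    # Parameters estimated = one slope per predictor plus the intercept.
    df_r = n_observations - (n_predictors + 1)
    ms_m = ss_m / df_m
    ms_r = ss_r / df_r
    return ms_m / ms_r


# Toy sums of squares for one predictor and five observations.
r2 = r_squared(ss_m=3.6, ss_t=6.0)
f = f_ratio(ss_m=3.6, ss_r=2.4, n_predictors=1, n_observations=5)
```

With one predictor and 200 observations, the same rule gives df = 1 and 198.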
Simple Regression Using SPSS:
Predicting Record Sales (Y) from Advertising Budget (X)
Record1.sav
What’s the overall relationship
between record sales and
advertising budget?
Interpreting a Simple Regression:
Overall Fit of the Model
Advertising expenditure
accounts for 33.5% of the
variation in record sales.
[SPSS ANOVA table, with callouts marking SSM, SSR, SST, MSM, and MSR]
The significant “F” test allows us to conclude that the regression model results in
significantly better prediction of record sales than the mean value of record sales.
F(1, 198) = 99.587
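As a rough cross-check, F can be recovered from R² and the degrees of freedom, because dividing both SSM and SSR by SST turns the F ratio into (R² / df_M) / ((1 − R²) / df_R). The sketch below plugs in the reported values (R² = .335, df = 1, 198); the function name is hypothetical.

```python
def f_from_r_squared(r2, df_m, df_r):
    """F ratio expressed through R^2 instead of the sums of squares."""
    return (r2 / df_m) / ((1.0 - r2) / df_r)


# Values reported for the record sales model.
f_approx = f_from_r_squared(r2=0.335, df_m=1, df_r=198)
```

The result is about 99.7, close to the reported 99.587; the small gap is what one would expect from R² having been rounded to three decimals.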
Critical Values for F
Interpreting a Simple Regression:
Model Parameters
b0, the Y intercept
b1, the slope, or the change in the outcome
associated with a unit change in the predictor
The ANOVA tells us whether the overall model results in a significantly good prediction of
the outcome variable … not about the individual contribution of variables in the model.
b0 = 134.14. Tells us that when no money is spent on ads, the model predicts 134,140
records will be sold (record sales are expressed in thousands).
b1 = .096. The amount of change in the outcome associated with a unit change in the
predictor. With advertising expressed in units of £1000, we can predict 96 extra record
sales for every £1000 in advertising.
Regression coefficients should be significantly different from 0 and large relative to
their standard errors.
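"Large relative to its standard error" is what the t statistic for a coefficient measures. A minimal sketch for the slope in simple regression, using the usual formula SE(b1) = sqrt(MSR / Σ(X − mean of X)²); the toy data are hypothetical and no lecture values are used.

```python
import math

# Hypothetical toy data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b1 = sxy / sxx                  # least-squares slope
b0 = mean_y - b1 * mean_x       # least-squares intercept

# Residual mean square: SSR divided by n - 2 (two parameters estimated).
ss_r = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ms_r = ss_r / (n - 2)

se_b1 = math.sqrt(ms_r / sxx)   # standard error of the slope
t_b1 = b1 / se_b1               # slope relative to its standard error
```

With a single predictor, the squared t for the slope equals the model's F ratio, so the two tests agree.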
Interpreting a Simple Regression:
Model Parameters
• Unstandardized Regression Weights: $Y_{pred} = b_0 + b_1 X$
– Intercept and slope are in the original units of X and Y and so
aren’t directly comparable.
• Standardized Regression Weights: $Z_{Y(pred)} = \beta Z_X$
– Standardized regression weights tell us the number of standard
deviations that the outcome will change as a result of a one
standard deviation change in the predictor.
Richards. (1982). Standardized versus Unstandardized Regression Weights. Applied Psychological Measurement, 6, 201-212.
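A sketch of how the two kinds of weight relate: the standardized slope is the unstandardized slope rescaled by the ratio of the standard deviations, and with a single predictor it equals Pearson's r. Function names and toy data are hypothetical.

```python
import math


def sums(x, y):
    """Return (sxx, syy, sxy): sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxx, syy, sxy


def standardized_slope(x, y):
    """Unstandardized slope multiplied by sd(X) / sd(Y)."""
    sxx, syy, sxy = sums(x, y)
    n = len(x)
    b1 = sxy / sxx
    sd_x = math.sqrt(sxx / (n - 1))
    sd_y = math.sqrt(syy / (n - 1))
    return b1 * sd_x / sd_y


def pearson_r(x, y):
    sxx, syy, sxy = sums(x, y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
beta = standardized_slope(x, y)
r = pearson_r(x, y)
```

Because `beta` carries no units, it can be compared across predictors measured on different scales.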
Interpreting a Simple Regression:
Using the Model
Since we’ve demonstrated the model significantly improves
our ability to predict the outcome variable (record sales), we
can plug in different values of the predictor variable(s).
$\text{record sales}_i = b_0 + b_1 \times \text{advertising budget}_i$
$= 134.14 + (0.096 \times \text{advertising budget}_i)$
What could the record executive expect if she spent
£500,000 in advertising? How about £1,000,000?
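The question can be worked through by plugging budgets into the fitted equation. This sketch assumes, as the interpretation of b0 and b1 implies, that advertising is entered in thousands of pounds and that sales come out in thousands of records; the function name is hypothetical.

```python
B0 = 134.14   # intercept reported for the record sales model
B1 = 0.096    # slope reported for the record sales model


def predict_records(budget_pounds):
    """Predicted number of records sold for a given advertising budget.

    Assumes the model works in thousands: the budget is converted to
    thousands of pounds, and the prediction back to single records.
    """
    budget_thousands = budget_pounds / 1000
    sales_thousands = B0 + B1 * budget_thousands
    return sales_thousands * 1000


for budget in (100_000, 500_000, 1_000_000):
    print(f"£{budget:,}: about {predict_records(budget):,.0f} records")
```

Under these assumptions the model predicts roughly 182,140 records for £500,000 and roughly 230,140 records for £1,000,000.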
Simple Regression:
Supermodel.sav
A fashion student interested in the factors that predict
salaries of catwalk models collects data from 231
models. She asks each model for their salary
per day on days they work (salary), their age (age),
and the number of years they have worked as a model (years),
and then gets a panel of experts from modeling
agencies to rate the attractiveness of each model as a
percentage, with 100% being perfectly attractive
(beauty).
Use simple regression to assess how well each of the
potential predictor variables (i.e., age, years, beauty)
predicts a model’s salary.
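The exercise amounts to running three separate simple regressions, one per predictor. A minimal sketch of that loop follows; the six rows of data are invented placeholders, not values from Supermodel.sav, and the function name is hypothetical.

```python
def simple_regression(x, y):
    """Fit y = b0 + b1 * x by least squares; return b0, b1 and R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    r2 = (sxy ** 2) / (sxx * syy)   # squared correlation = SSM / SST
    return {"b0": b0, "b1": b1, "r2": r2}


# Invented placeholder rows; the real file has 231 models.
data = {
    "salary": [10, 12, 15, 11, 20, 18],
    "age":    [18, 19, 21, 18, 24, 23],
    "years":  [1, 2, 4, 1, 6, 5],
    "beauty": [70, 85, 75, 80, 78, 82],
}

# One simple regression per predictor, each with salary as the outcome.
results = {
    name: simple_regression(data[name], data["salary"])
    for name in ("age", "years", "beauty")
}
```

With the actual file, the columns would first need to be loaded from the .sav format (for example with pandas.read_spss, which depends on the pyreadstat package); the loop itself would stay the same.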
[SPSS output for the three simple regressions: Attractiveness, Age, Years]
Attractiveness
Age
Years