
LECTURE 12
Multiple regression analysis
EPSY 640
Texas A&M University
Multiple regression analysis
• The test of the overall hypothesis that y is unrelated to all predictors, equivalent to
• H0: ρ²y·123… = 0
• H1: ρ²y·123… > 0
• is tested by
• F = [ R²y·123… / p ] / [ (1 − R²y·123…) / (n − p − 1) ]
• F = [ SSreg / p ] / [ SSe / (n − p − 1) ]
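Either form of the F ratio plugs in directly; below is a minimal Python sketch (mine, not from the lecture), with scipy assumed available for the tail probability:

```python
from scipy import stats

def overall_f_test(r2, p, n):
    """Overall F test of H0: rho^2(y.123...p) = 0, per the formula above."""
    df1, df2 = p, n - p - 1
    f = (r2 / df1) / ((1.0 - r2) / df2)
    sig = stats.f.sf(f, df1, df2)  # upper-tail p-value
    return f, sig

# Depression example shown later: R^2 = .600, p = 3, n = 393
# overall_f_test(0.600, 3, 393) -> F about 194, Sig. < .001
```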
Multiple regression analysis
SOURCE         df          Sum of Squares   Mean Square         F
x1, x2, …      p           SSreg            SSreg / p           [SSreg / p] / [SSe / (n − p − 1)]
e (residual)   n − p − 1   SSe              SSe / (n − p − 1)
total          n − 1       SSy              SSy / (n − 1)

• Table 8.2: Multiple regression table for Sums of Squares
Multiple regression analysis
predicting Depression
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .774a   .600       .596                6.120

a. Predictors: (Constant), t11, t9, t10

ANOVAb

Model 1      Sum of Squares   df    Mean Square   F         Sig.
Regression   21819.235        3     7273.078      194.162   .000a
Residual     14571.498        389   37.459
Total        36390.733        392

a. Predictors: (Constant), t11, t9, t10
b. Dependent Variable: t6
(Predictors: LOCUS OF CONTROL, SELF-ESTEEM, SELF-RELIANCE)
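The printed sums of squares reproduce both R² and F; a quick check in plain Python:

```python
# Verify the SPSS output above from its sums of squares alone
ss_reg, ss_e, ss_y = 21819.235, 14571.498, 36390.733
df_reg, df_e = 3, 389

r2 = ss_reg / ss_y                      # ~ .600 (Model Summary)
f = (ss_reg / df_reg) / (ss_e / df_e)   # 7273.078 / 37.459 ~ 194.162
print(round(r2, 3), round(f, 3))
```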
[Fig. 8.4: Venn diagram for multiple regression with two predictors and one outcome measure, showing SSy partitioned into SSreg (the overlap of SSx1 and SSx2 with SSy) and SSe]
[Fig. 8.5: Venn diagram of Type I (sequential) contributions: ssx1 entered first, ssx2 entered second (its Type III unique portion)]
[Fig. 8.6: Venn diagram of Type III unique contributions of ssx1 and ssx2 to SSy]
Multiple Regression ANOVA table
SOURCE   df      Sum of Squares (Type I)   Mean Square       F
Model    2       SSreg                     SSreg / 2         [SSreg / 2] / [SSe / (n − 3)]
x1       1       SSx1                      SSx1 / 1          [SSx1 / 1] / [SSe / (n − 3)]
x2       1       SSx2|x1                   SSx2|x1 / 1       [SSx2|x1 / 1] / [SSe / (n − 3)]
e        n − 3   SSe                       SSe / (n − 3)
total    n − 1   SSy                       SSy / (n − 1)

Table 8.3: Multiple regression table for Sums of Squares of each predictor
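For readers working outside SPSS, here is a hedged sketch of how Type I and Type III sums of squares can be compared in Python with statsmodels; the data are simulated stand-ins, not the lecture data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Simulated correlated predictors standing in for x1 and x2
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.4 * x1 + rng.normal(size=200)            # r(x1, x2) > 0, as in Fig. 8.4
y = 0.5 * x1 + 0.6 * x2 + rng.normal(size=200)
data = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

model = smf.ols("y ~ x1 + x2", data=data).fit()
print(anova_lm(model, typ=1))  # Type I: SSx1, then SSx2|x1 (entry order matters)
print(anova_lm(model, typ=3))  # Type III: each predictor's unique contribution
```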
PATH DIAGRAM FOR REGRESSION
[Path diagram: X1 → Y with β = .5, X2 → Y with β = .6, r(X1, X2) = .4, and e → Y with path .387]

R² = [r²y1 + r²y2 − 2 ry1 ry2 r12] / (1 − r²12)
   = [.74² + .8² − 2(.74)(.8)(.4)] / (1 − .4²)
   = .85
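All of the diagram's numbers fall out of the standard two-predictor formulas; a quick check (my code, using the correlations shown above):

```python
# ry1, ry2 = validities; r12 = predictor intercorrelation (from the diagram)
ry1, ry2, r12 = 0.74, 0.80, 0.40

beta1 = (ry1 - ry2 * r12) / (1 - r12**2)   # 0.5, the path X1 -> Y
beta2 = (ry2 - ry1 * r12) / (1 - r12**2)   # 0.6, the path X2 -> Y
r2 = (ry1**2 + ry2**2 - 2*ry1*ry2*r12) / (1 - r12**2)   # 0.85
e_path = (1 - r2) ** 0.5                   # 0.387, the residual path to Y
print(beta1, beta2, round(r2, 2), round(e_path, 3))
```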
Depression
Coefficientsa

Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   51.939             3.305                            15.715   .000
t9           .440               .034         .471                12.842   .000
t10          -.302              .036         -.317               -8.462   .000
t11          -.181              .035         -.186               -5.186   .000

a. Dependent Variable: t6
[Path diagram for the depression model: LOC. CON. (.471), SELF-EST (−.317), and SELF-REL (−.186) predicting DEPRESSION, with residual path e and a .4 correlation shown between predictors; R² = .60]
Shrinkage of R²
• Different definitions: ask which is being used:
– What is the population value for a sample R²?
• R²s = 1 − (1 − R²)(n − 1)/(n − k − 1)
– What is the cross-validation from sample to sample?
• R²sc = 1 − (1 − R²)(n + k)/(n − k)
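Both shrinkage formulas transcribe directly to code; a small sketch (k = number of predictors):

```python
def r2_population(r2, n, k):
    """Estimate of the population R^2 from a sample R^2 (formula above)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def r2_cross_validation(r2, n, k):
    """Estimated sample-to-sample cross-validated R^2 (formula above)."""
    return 1 - (1 - r2) * (n + k) / (n - k)

# Depression example (R^2 = .600, n = 393, k = 3):
# r2_population(.600, 393, 3)       -> ~ .597 (cf. SPSS Adjusted R Square .596)
# r2_cross_validation(.600, 393, 3) -> ~ .594
```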
Estimation Methods
• Types of Estimation:
– Ordinary Least Squares (OLS)
• Minimizes the sum of squared errors around the prediction line (see the sketch after this list)
– Generalized Least Squares (GLS)
• A regression technique used when the error terms from an ordinary least squares regression display non-random patterns such as autocorrelation or heteroskedasticity
– Maximum Likelihood
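As a concrete illustration of OLS (a sketch on simulated data, not lecture code), numpy's least-squares solver minimizes exactly this sum of squared errors:

```python
import numpy as np

# OLS: choose b to minimize sum((y - X @ b)**2)
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=100)

X = np.column_stack([np.ones_like(x), x])        # intercept column plus x
b, ss_e, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)     # close to [2.0, 0.5]
print(ss_e)  # sum of squared errors around the prediction line
```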
Maximum Likelihood Estimation
• There is nothing visual about the maximum likelihood method, but it is a powerful method and, at least for large samples, very precise. Maximum likelihood estimation begins with writing a mathematical expression known as the Likelihood Function of the sample data. Loosely speaking, the likelihood of a set of data is the probability of obtaining that particular set of data, given the chosen probability distribution model. This expression contains the unknown model parameters. The values of these parameters that maximize the sample likelihood are known as the Maximum Likelihood Estimates, or MLEs. Maximum likelihood estimation is a totally analytic maximization procedure.
• MLEs and Likelihood Functions generally have very desirable large-sample properties:
– they become unbiased minimum-variance estimators as the sample size increases
– they have approximate normal distributions and approximate sample variances that can be calculated and used to generate confidence bounds
– likelihood functions can be used to test hypotheses about models and parameters
• With small samples, MLEs may not be very precise and may even generate a line that lies above or below the data points. There are only two drawbacks to MLEs, but they are important ones:
– with small numbers of failures (fewer than 5, and sometimes fewer than 10, counts as small), MLEs can be heavily biased and the large-sample optimality properties do not apply
– calculating MLEs often requires specialized software for solving complex non-linear equations; this is less of a problem as time goes by, as more statistical packages add MLE analysis capability every year
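To make the likelihood idea concrete, here is a minimal sketch (names and data my own) that fits simple linear regression under normal errors by numerically maximizing the log-likelihood; for this model the slope and intercept MLEs coincide with OLS:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=200)

def neg_log_likelihood(params):
    """Negative log of the likelihood function for (b0, b1, sigma)."""
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)                   # keeps sigma positive
    return -np.sum(norm.logpdf(y, loc=b0 + b1 * x, scale=sigma))

fit = minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0])
b0, b1, log_sigma = fit.x
print(b0, b1, np.exp(log_sigma))  # MLEs of intercept, slope, sigma
```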
Outliers
• Leverage (for a single predictor):
• Li = 1/n + (Xi − Mx)² / Σx²   (min = 1/n, max = 1)
• Values larger than 1/n by a large amount should be of concern
• Cook's Di = (Ŷ − Ŷ(i))² / [(k + 1) MSres]
– the difference between predicted Y with and without case i
Outliers
• In SPSS Regression, under the SAVE option,
both leverage and Cook’s D will be computed
and saved as new variables with values for
each case
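Outside SPSS, the same two diagnostics can be computed by hand; a numpy sketch (the hat-matrix diagonal generalizes the single-predictor leverage formula above):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=50)
y = 1.0 + 0.5 * x + rng.normal(scale=0.4, size=50)

X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix
leverage = np.diag(H)                     # L_i for each case; mean = (k+1)/n

b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
k = 1                                     # number of predictors
ms_res = resid @ resid / (len(y) - k - 1)

# Cook's D: squared residual scaled by a leverage factor, one value per case
cooks_d = resid**2 / ((k + 1) * ms_res) * leverage / (1 - leverage) ** 2
print(leverage.max(), cooks_d.max())
```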