Multiple and complex regression

Extensions of simple linear regression
• Multiple regression models: predictor variables are continuous
• Analysis of variance: predictor variables are categorical (grouping variables)
• But… general linear models can include both continuous and categorical predictors (see the sketch below)
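As a concrete illustration of that last point, here is a minimal sketch of fitting a general linear model that mixes a continuous and a categorical predictor with statsmodels. The data and variable names (mat, biome, abundance) are made up for the example, not taken from the study.

```python
# Minimal sketch: a general linear model mixing a continuous predictor (mean
# annual temperature) and a categorical one (biome), on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 73
df = pd.DataFrame({
    "mat": rng.normal(10, 4, n),                         # continuous predictor
    "biome": rng.choice(["grassland", "shrubland"], n),  # categorical predictor
})
df["abundance"] = 0.3 + 0.02 * df["mat"] \
    + 0.1 * (df["biome"] == "shrubland") + rng.normal(0, 0.05, n)

# C() tells statsmodels to treat biome as a factor (dummy-coded)
model = smf.ols("abundance ~ mat + C(biome)", data=df).fit()
print(model.summary())
```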
Relative abundance of C3 and C4 plants
• Paruelo & Lauenroth (1996)
• Geographic distribution and the effects of climate variables on the relative abundance of a number of plant functional types (PFTs): shrubs, forbs, succulents, C3 grasses and C4 grasses.
Data
• 73 sites across temperate central North America
Response variable
• Relative abundance of PFTs (based on cover, biomass, and primary production) for each site
Predictor variables
• Longitude
• Latitude
• Mean annual temperature
• Mean annual precipitation
• Winter (%) precipitation
• Summer (%) precipitation
• Biomes (grassland, shrubland)
(Box 6.1)
[Figure: histograms of the relative abundance of C3 and C4 plants across the 73 sites (means .27 and .29, standard deviations .26 and .31); both distributions are positively skewed]
Relative abundance was transformed as ln(dat + 1) because the distributions are positively skewed.
Comparing log10 vs ln
[Figure: histograms of the transformed C3 abundance, LC3 (log10 scale, mean = -.55) and LNC3 (natural log scale, mean = .22); N = 73]
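A small sketch of how such a comparison of transformations might be run. The data here are simulated skewed proportions, not the Paruelo & Lauenroth abundances, and the small constant added before taking log10 is an assumption for illustration.

```python
# Sketch with simulated positively skewed relative-abundance data: compare a
# log10 transform (with a small added constant) and a natural-log transform.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(7)
abund = rng.beta(1.2, 4.0, 73)        # skewed values in [0, 1], like proportions

log10_t = np.log10(abund + 0.1)       # log10(x + 0.1); the constant handles zeros
ln_t = np.log1p(abund)                # ln(x + 1)

for name, x in [("raw", abund), ("log10(x+0.1)", log10_t), ("ln(x+1)", ln_t)]:
    print(f"{name:>13}: mean={x.mean():6.3f}  sd={x.std(ddof=1):5.3f}  skew={skew(x):6.3f}")
```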
Collinearity
• Causes computational problems because it makes the determinant of the matrix of X-variables close to zero, and matrix inversion basically involves dividing by the determinant (very sensitive to small differences in the numbers)
• Standard errors of the estimated regression slopes are inflated
Detecting collinearity
• Check tolerance values
• Plot the variables
• Examine a matrix of correlation coefficients between predictor variables
(see the sketch below)
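A sketch of these checks (plus the determinant mentioned above) in Python. The column names follow the slides, but the values are simulated stand-ins, not the Paruelo & Lauenroth data.

```python
# Sketch: collinearity checks for a set of predictors (simulated data).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 73
lat = rng.uniform(30, 50, n)
long = rng.uniform(95, 120, n)
climate = pd.DataFrame({
    "LAT": lat,
    "LONG": long,
    "MAT": 25 - 0.5 * lat + rng.normal(0, 1.5, n),   # cooler at high latitude
    "MAP": 2000 - 12 * long + rng.normal(0, 80, n),  # drier to the west
    "JJAMAP": rng.uniform(0.2, 0.6, n),
    "DJFMAP": rng.uniform(0.1, 0.4, n),
})
predictors = ["LAT", "LONG", "MAP", "MAT", "JJAMAP", "DJFMAP"]

# 1. Matrix of pairwise correlations between predictors
print(climate[predictors].corr().round(3))

# 2. Determinant of the predictor correlation matrix: values near 0 flag collinearity
print("det(R) =", np.linalg.det(climate[predictors].corr().values))

# 3. Tolerance and VIF (tolerance = 1/VIF); VIF > 10 is a common warning level
X = sm.add_constant(climate[predictors])
for i, name in enumerate(predictors, start=1):       # column 0 is the constant
    vif = variance_inflation_factor(X.values, i)
    print(f"{name:8s} VIF = {vif:8.2f}  tolerance = {1/vif:.3f}")
```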
Dealing with collinearity
• Omit predictor variables if they are highly correlated with other predictor variables that remain in the model
Correlations (Pearson r, with two-tailed p-values in parentheses; N = 73 for all pairs)

          LAT             LONG            MAP             MAT             JJAMAP          DJFMAP
LAT       1               .097 (.416)     -.247* (.036)   -.839** (.000)  .074 (.533)     -.065 (.584)
LONG      .097 (.416)     1               -.734** (.000)  -.213 (.070)    -.492** (.000)  .771** (.000)
MAP       -.247* (.036)   -.734** (.000)  1               .355** (.002)   .112 (.344)     -.405** (.000)
MAT       -.839** (.000)  -.213 (.070)    .355** (.002)   1               -.081 (.497)    .001 (.990)
JJAMAP    .074 (.533)     -.492** (.000)  .112 (.344)     -.081 (.497)    1               -.792** (.000)
DJFMAP    -.065 (.584)    .771** (.000)   -.405** (.000)  .001 (.990)     -.792** (.000)  1

*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
[Figure: scatterplots of the latitude × longitude interaction term against longitude. Uncentered (LOXLA vs LONG): Rsq = 0.2396; after centering (RELALO vs LONRE): Rsq = 0.0181, so the collinearity problems disappear.]
If we omit the interaction and refit the model, the partial regression slope for latitude changes.
ln(C3) = β0 + β1(lat) + β2(long) + β3(lat × long)
Coefficients (dependent variable: LC3)

Model 1       B        Std. Error   Beta     t         Sig.    Tolerance   VIF
(Constant)    7.391    3.625                 2.039     .045
LAT           -.191    .091         -3.095   -2.101    .039    .003        307.745
LONG          -.093    .035         -1.824   -2.659    .010    .015        66.784
LOXLA         .002     .001         4.323    2.572     .012    .002        400.939
After centering both lat and long
Coefficients (dependent variable: LC3)

Model 1       B        Std. Error   Beta    t          Sig.    Tolerance   VIF
(Constant)    -.553    .027                 -20.131    .000
LONRE         -.003    .004         -.051   -.597      .552    .980        1.020
LATRE         .048     .006         .783    8.484      .000    .827        1.209
RELALO        .002     .001         .238    2.572      .012    .820        1.220
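The effect of centering can be seen directly by correlating the interaction term with its components before and after centering. A minimal sketch with simulated coordinates; the variable names (loxla, latre, lonre, relalo) follow the slides, but the values are not the real data.

```python
# Sketch: centering predictors before forming an interaction term. The raw
# product lat*long is correlated with lat and long themselves; the product of
# the centered variables is not. Simulated coordinates.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 73
d = pd.DataFrame({"lat": rng.uniform(30, 50, n), "long": rng.uniform(95, 120, n)})

d["loxla"] = d["lat"] * d["long"]             # raw interaction term
d["latre"] = d["lat"] - d["lat"].mean()       # centered latitude
d["lonre"] = d["long"] - d["long"].mean()     # centered longitude
d["relalo"] = d["latre"] * d["lonre"]         # centered interaction term

print("corr(long, lat x long), raw:        ", round(d["long"].corr(d["loxla"]), 3))
print("corr(lonre, latre x lonre), centered:", round(d["lonre"].corr(d["relalo"]), 3))
```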
[Figure: Normal P-P plot of the regression standardized residuals (dependent variable: LC3); expected vs observed cumulative probability]
R² = 0.514
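A sketch of producing a comparable normality check of the residuals in Python. The model here is fitted to simulated data purely so there are residuals to plot.

```python
# Sketch: a normal P-P plot of standardized residuals, in the spirit of the
# SPSS plot above. A small OLS model is fitted to simulated data.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.gofplots import ProbPlot

rng = np.random.default_rng(5)
x = rng.normal(size=(73, 2))
y = 0.5 + x @ np.array([0.8, -0.3]) + rng.normal(0, 0.2, 73)
model = sm.OLS(y, sm.add_constant(x)).fit()

resid_std = model.get_influence().resid_studentized_internal  # standardized residuals
fig = ProbPlot(resid_std).ppplot(line="45")   # observed vs expected cumulative prob.
fig.suptitle("Normal P-P plot of standardized residuals")
plt.show()
```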
Analysis of variance

Source of variation   SS               df          MS
Regression            Σ(ŷ - ȳ)²        p           Σ(ŷ - ȳ)² / p
Residual              Σ(yobs - ŷ)²     n - p - 1   Σ(yobs - ŷ)² / (n - p - 1)
Total                 Σ(yobs - ȳ)²     n - 1
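A sketch of computing this partition (and the usual F ratio of MS Regression to MS Residual) directly from an OLS fit on simulated data with p predictors and n observations.

```python
# Sketch: the sums-of-squares partition from the ANOVA table above, computed
# directly for an OLS fit on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p = 73, 3
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([0.5, -0.4, 0.2]) + rng.normal(0, 0.3, n)

Xd = np.column_stack([np.ones(n), X])           # design matrix with intercept
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)      # OLS estimates
yhat = Xd @ b

ss_reg = np.sum((yhat - y.mean()) ** 2)         # regression SS, df = p
ss_res = np.sum((y - yhat) ** 2)                # residual SS, df = n - p - 1
ss_tot = np.sum((y - y.mean()) ** 2)            # total SS, df = n - 1

F = (ss_reg / p) / (ss_res / (n - p - 1))
p_value = stats.f.sf(F, p, n - p - 1)
print(f"SS_reg={ss_reg:.3f}  SS_res={ss_res:.3f}  SS_tot={ss_tot:.3f}")
print(f"F({p}, {n - p - 1}) = {F:.2f},  p = {p_value:.4g}")
```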
Matrix algebra approach to OLS estimation of multiple regression models
• Y = Xβ + ε
• X'Xb = X'Y
• b = (X'X)⁻¹X'Y
(see the sketch below)
Forward selection
Coefficients (dependent variable: LC3)

Model   Term         B        Std. Error   Beta    t          Sig.
1       (Constant)   -2.230   .218                 -10.246    .000
        LAT          .042     .005         .680    7.805      .000
2       (Constant)   -2.448   .245                 -10.005    .000
        LAT          .044     .005         .720    8.144      .000
        MAP          .000     .000         .163    1.840      .070
Backward selection

Coefficients (dependent variable: LC3)

Model   Term         B        Std. Error   Beta    t         Sig.
1       (Constant)   -2.689   1.239                -2.170    .034
        MAP          .000     .000         .181    1.261     .212
        MAT          -.001    .012         -.012   -.073     .942
        JJAMAP       -.834    .475         -.268   -1.755    .084
        DJFMAP       -.962    .716         -.275   -1.343    .184
        LONG         .007     .010         .136    .690      .493
        LAT          .043     .010         .703    4.375     .000
2       (Constant)   -2.730   1.093                -2.498    .015
        MAP          .000     .000         .180    1.269     .209
        JJAMAP       -.831    .470         -.267   -1.769    .082
        DJFMAP       -.963    .711         -.276   -1.354    .180
        LONG         .007     .010         .138    .708      .481
        LAT          .044     .006         .713    7.932     .000
3       (Constant)   -2.011   .406                 -4.959    .000
        MAP          .000     .000         .113    1.074     .287
        JJAMAP       -.812    .467         -.261   -1.738    .087
        DJFMAP       -.670    .577         -.192   -1.163    .249
        LAT          .044     .006         .714    7.983     .000
4       (Constant)   -1.725   .306                 -5.640    .000
        JJAMAP       -1.002   .433         -.322   -2.314    .024
        DJFMAP       -1.005   .486         -.288   -2.070    .042
        LAT          .042     .005         .685    8.033     .000
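Statistical packages automate this differently; statsmodels has no built-in stepwise routine, so the sketch below hand-rolls a backward elimination by p-value (drop the least significant term, refit, repeat), which mirrors the sequence of models above. Forward selection works analogously by adding terms instead. The data and effect sizes are simulated; only the column names follow the slides.

```python
# Sketch: hand-rolled backward elimination by p-value. Terms with the largest
# p-value are dropped one at a time until all remaining ones fall below alpha.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 73
d = pd.DataFrame(rng.normal(size=(n, 6)),
                 columns=["LAT", "LONG", "MAP", "MAT", "JJAMAP", "DJFMAP"])
d["LC3"] = -0.5 + 0.04 * d["LAT"] - 0.3 * d["JJAMAP"] + rng.normal(0, 0.2, n)

def backward_select(data, response, predictors, alpha=0.05):
    remaining = list(predictors)
    while remaining:
        fit = smf.ols(f"{response} ~ {' + '.join(remaining)}", data=data).fit()
        pvals = fit.pvalues.drop("Intercept")   # p-values of the predictors only
        worst = pvals.idxmax()
        if pvals[worst] <= alpha:               # everything significant: stop
            return fit
        remaining.remove(worst)                 # drop the least significant term
    return smf.ols(f"{response} ~ 1", data=data).fit()

final = backward_select(d, "LC3", ["LAT", "LONG", "MAP", "MAT", "JJAMAP", "DJFMAP"])
print(final.params.round(3))
```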
Criteria for “best” fitting in multiple regression with p predictors

Criterion                              Formula
r²                                     r² = SS_Regression / SS_Total = 1 - SS_Residual / SS_Total
Adjusted r²                            1 - (1 - r²)(n - 1) / (n - p)
Akaike Information Criterion (AIC)     n[ln(2π · SS_Residual / n) + 1] + 2[pn / (n - p - 1)]
Akaike Information Criterion (AIC)     n[ln(SS_Residual / n)] + 2[pn / (n - p - 1)]
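A sketch of computing these criteria from the residual sum of squares of a fit (simulated data). The two AIC forms differ only by an additive constant for a given n, so they rank a set of models identically.

```python
# Sketch: the "best model" criteria above, computed directly from the residual
# sum of squares of an OLS fit (p = number of predictors).
import numpy as np

rng = np.random.default_rng(8)
n, p = 73, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([0.2, 0.7, -0.4]) + rng.normal(0, 0.3, n)

b = np.linalg.solve(X.T @ X, X.T @ y)
ss_res = np.sum((y - X @ b) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)

r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)               # as on the slide
penalty = 2 * p * n / (n - p - 1)                       # small-sample penalty term
aic_full = n * (np.log(2 * np.pi * ss_res / n) + 1) + penalty
aic_simple = n * np.log(ss_res / n) + penalty           # same ranking, fewer constants
print(f"r2={r2:.3f}  adj r2={adj_r2:.3f}  AIC={aic_full:.2f}  AIC(simple)={aic_simple:.2f}")
```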
Hierarchical partitioning and model selection

No. predictors   Model                     r²        Adj r²    AIC (R)   AIC
1                Long                      0.00005   -0.014    49.179    -165.10
1                Lat                       0.4619    0.454     3.942     -204.44
2                Long + Lat                0.4671    0.4519    5.220     -201.20
3                Long + Lat + Long × Lat   0.5137    0.4926    0.437     -209.69
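A sketch of how such a comparison table could be reproduced with statsmodels on one's own data (simulated coordinates here, not the real values). Note that AIC definitions differ between packages by additive constants, so only the differences within a single column are meaningful.

```python
# Sketch: comparing candidate models by r2, adjusted r2 and AIC with
# statsmodels on simulated lat/long data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 73
d = pd.DataFrame({"lat": rng.uniform(30, 50, n), "long": rng.uniform(95, 120, n)})
d["lc3"] = -2.2 + 0.04 * d["lat"] + rng.normal(0, 0.2, n)

formulas = {
    "Long":                  "lc3 ~ long",
    "Lat":                   "lc3 ~ lat",
    "Long + Lat":            "lc3 ~ long + lat",
    "Long + Lat + LongxLat": "lc3 ~ long * lat",  # '*' expands to main effects + interaction
}
for name, f in formulas.items():
    fit = smf.ols(f, data=d).fit()
    print(f"{name:22s} r2={fit.rsquared:.4f}  adj r2={fit.rsquared_adj:.4f}  AIC={fit.aic:.2f}")
```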