Multiple and complex regression

Extensions of simple linear regression
• Multiple regression models: predictor variables are continuous
• Analysis of variance: predictor variables are categorical (grouping variables)
• But… general linear models can include both continuous and categorical predictors
Relative abundance of C3 and C4 plants
• Paruelo & Lauenroth (1996)
• Geographic distribution and the effects of climate variables on the relative abundance of a number of plant functional types (PFTs): shrubs, forbs, succulents, C3 grasses and C4 grasses.
Data
• 73 sites across temperate central North America
Response variable
• Relative abundance of PFTs (based on cover, biomass, and primary production) for each site
Predictor variables
• Longitude
• Latitude
• Mean annual temperature
• Mean annual precipitation
• Winter (%) precipitation
• Summer (%) precipitation
• Biomes (grassland, shrubland)
(Box 6.1)
[Histograms of relative abundance at the 73 sites: C3 plants (Mean = .27, Std. Dev = .26, N = 73.00) and C4 plants (Mean = .29, Std. Dev = .31, N = 73.00); both distributions are positively skewed]
Relative abundance was transformed as ln(abundance + 1) because it is positively skewed. Comparing log10 vs ln:
[Histograms of transformed C3 abundance: LC3 (log10 scale; Mean = -.55, Std. Dev = .33, N = 73.00) and LNC3 (ln scale; Mean = .22, Std. Dev = .20, N = 73.00)]
Collinearity
• Causes computational problems: it makes the determinant of the matrix of X-variables close to zero, and matrix inversion essentially involves dividing by that determinant, so the solution is very sensitive to small differences in the numbers
• Inflates the standard errors of the estimated regression slopes
Detecting collinearity
• Check tolerance values
• Plot the variables
• Examine a matrix of correlation coefficients between predictor variables
Dealing with collinearity
• Omit predictor variables if they are highly correlated with other predictor variables that remain in the model
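Tolerance and VIF can be computed directly from the predictor matrix: the tolerance of predictor j is 1 − R² from regressing column j on the remaining predictors, and VIF = 1/tolerance. A minimal numpy sketch, using simulated latitude/longitude values (made-up numbers, not the Paruelo & Lauenroth data):

```python
import numpy as np

def vif_tolerance(X):
    """Tolerance and VIF for each column of predictor matrix X.

    Tolerance of column j = 1 - R^2 from regressing column j on the
    remaining columns (plus an intercept); VIF = 1 / tolerance.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        b, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ b
        tol = (resid @ resid) / ((y - y.mean()) ** 2).sum()  # = 1 - R^2
        out.append((tol, 1.0 / tol))
    return out

# Illustration with 73 simulated sites (hypothetical coordinates):
rng = np.random.default_rng(1)
lat = rng.uniform(30, 50, 73)
long = rng.uniform(-110, -90, 73)
X = np.column_stack([lat, long, lat * long])  # raw product term -> severe collinearity
for tol, vif in vif_tolerance(X):
    print(f"tolerance={tol:.4f}  VIF={vif:.1f}")
```

The raw lat × long product term produces very small tolerances, mirroring the tolerance/VIF columns in the SPSS output below.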
Correlations among predictors (Pearson r, with 2-tailed p in parentheses; N = 73)

         LAT             LONG            MAP             MAT             JJAMAP          DJFMAP
LAT      1               .097 (.416)     -.247* (.036)   -.839** (.000)  .074 (.533)     -.065 (.584)
LONG                     1               -.734** (.000)  -.213 (.070)    -.492** (.000)  .771** (.000)
MAP                                      1               .355** (.002)   .112 (.344)     -.405** (.000)
MAT                                                      1               -.081 (.497)    .001 (.990)
JJAMAP                                                                   1               -.792** (.000)
DJFMAP                                                                                   1

*  Correlation is significant at the 0.05 level (2-tailed).
** Correlation is significant at the 0.01 level (2-tailed).
ln(C3) = β0 + β1(lat) + β2(long) + β3(lat × long)
Coefficients (dependent variable: LC3)

Model 1       B        Std. Error   Beta     t         Sig.    Tolerance   VIF
(Constant)    7.391    3.625                 2.039     .045
LAT           -.191    .091         -3.095   -2.101    .039    .003        307.745
LONG          -.093    .035         -1.824   -2.659    .010    .015        66.784
LOXLA         .002     .001         4.323    2.572     .012    .002        400.939
After centering both lat and long
Coefficients (dependent variable: LC3)

Model 1       B        Std. Error   Beta     t         Sig.    Tolerance   VIF
(Constant)    -.553    .027                  -20.131   .000
LONRE         -.003    .004         -.051    -.597     .552    .980        1.020
LATRE         .048     .006         .783     8.484     .000    .827        1.209
RELALO        .002     .001         .238     2.572     .012    .820        1.220

R² = 0.514
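The effect of centering can be sketched with simulated coordinates (hypothetical values, not the study's data): the raw product term lat × long is nearly collinear with its components, while the product of the centered variables is not.

```python
import numpy as np

rng = np.random.default_rng(0)
lat = rng.uniform(29, 53, 73)     # hypothetical coordinates for 73 sites
long = rng.uniform(-115, -88, 73)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# Raw interaction term is almost perfectly collinear with its components:
r_raw = corr(lat, lat * long)

# Centering both predictors removes most of this "nonessential" collinearity:
latc, longc = lat - lat.mean(), long - long.mean()
r_centered = corr(latc, latc * longc)

print(r_raw, r_centered)
```

This is why the tolerances jump from ~.002 in the uncentered fit to ~.8–.98 after centering, while the interaction slope (B = .002, t = 2.572) is unchanged.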
Analysis of variance

Source of variation   SS               df          MS
Regression            Σ(ŷ − ȳ)²        p           Σ(ŷ − ȳ)² / p
Residual              Σ(yobs − ŷ)²     n − p − 1   Σ(yobs − ŷ)² / (n − p − 1)
Total                 Σ(yobs − ȳ)²     n − 1
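The partition above can be checked numerically on simulated data: with an intercept in the model, the regression and residual sums of squares add exactly to the total.

```python
import numpy as np

# Numerical check of the ANOVA partition: SS_Total = SS_Regression + SS_Residual.
rng = np.random.default_rng(7)
n, p = 73, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b
ss_reg = ((yhat - y.mean()) ** 2).sum()
ss_res = ((y - yhat) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()

ms_reg = ss_reg / p            # MS_Regression = SS / df, df = p
ms_res = ss_res / (n - p - 1)  # MS_Residual,   df = n - p - 1
print(ss_reg + ss_res, ss_tot)
```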
Matrix algebra approach to OLS estimation of multiple regression models
• Y = Xβ + ε
• Normal equations: X′Xb = X′Y
• b = (X′X)⁻¹X′Y
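The matrix solution can be sketched in a few lines of numpy on toy data (simulated, not the study's):

```python
import numpy as np

# OLS via the normal equations: solve X'X b = X'Y for b.
rng = np.random.default_rng(42)
n = 73
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.normal(scale=0.1, size=n)

# Solving the linear system is numerically safer than forming (X'X)^{-1} explicitly,
# which matters precisely when collinearity drives the determinant toward zero.
b = np.linalg.solve(X.T @ X, X.T @ Y)
print(b)
```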
Forward selection:
Coefficients (dependent variable: LC3)

Model           B        Std. Error   Beta    t          Sig.
1  (Constant)   -2.230   .218                 -10.246    .000
   LAT          .042     .005         .680    7.805      .000
2  (Constant)   -2.448   .245                 -10.005    .000
   LAT          .044     .005         .720    8.144      .000
   MAP          .000     .000         .163    1.840      .070
Backward selection:

Coefficients (dependent variable: LC3)

Model           B        Std. Error   Beta    t         Sig.
1  (Constant)   -2.689   1.239                -2.170    .034
   MAP          .000     .000         .181    1.261     .212
   MAT          -.001    .012         -.012   -.073     .942
   JJAMAP       -.834    .475         -.268   -1.755    .084
   DJFMAP       -.962    .716         -.275   -1.343    .184
   LONG         .007     .010         .136    .690      .493
   LAT          .043     .010         .703    4.375     .000
2  (Constant)   -2.730   1.093                -2.498    .015
   MAP          .000     .000         .180    1.269     .209
   JJAMAP       -.831    .470         -.267   -1.769    .082
   DJFMAP       -.963    .711         -.276   -1.354    .180
   LONG         .007     .010         .138    .708      .481
   LAT          .044     .006         .713    7.932     .000
3  (Constant)   -2.011   .406                 -4.959    .000
   MAP          .000     .000         .113    1.074     .287
   JJAMAP       -.812    .467         -.261   -1.738    .087
   DJFMAP       -.670    .577         -.192   -1.163    .249
   LAT          .044     .006         .714    7.983     .000
4  (Constant)   -1.725   .306                 -5.640    .000
   JJAMAP       -1.002   .433         -.322   -2.314    .024
   DJFMAP       -1.005   .486         -.288   -2.070    .042
   LAT          .042     .005         .685    8.033     .000
Criteria for “best” fitting in multiple regression with p predictors.

Criterion                                        Formula
r²                                               r² = SS_Regression / SS_Total = 1 − SS_Residual / SS_Total
Adjusted r²                                      1 − [(n − 1) / (n − p − 1)] (1 − r²)
Akaike Information Criterion AIC (as in R)       n[ln(2π · SS_Residual / n) + 1] + 2(p + 2)
Akaike Information Criterion AIC (simplified)    n[ln(SS_Residual / n)] + 2(p + 1)
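A small sketch of the simplified least-squares AIC (illustrative SS values only, not from the study): smaller AIC is better, and a predictor that barely reduces SS_Residual costs more in penalty than it gains in fit.

```python
import numpy as np

def aic_ls(ss_residual, n, p):
    """Simplified least-squares AIC: n*ln(SS_Residual/n) + 2*(p + 1),
    where p is the number of predictors (the +1 counts the intercept)."""
    return n * np.log(ss_residual / n) + 2 * (p + 1)

n = 73
print(aic_ls(5.0, n, p=1))   # one-predictor model
print(aic_ls(4.99, n, p=2))  # tiny SS gain, higher (worse) AIC
```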
Hierarchical partitioning and model selection

No. pred   Model                   r²        Adj r²    AIC (R)   AIC
1          Lon                     0.00005   -0.014    49.179    -165.10
1          Lat                     0.4619    0.454     3.942     -204.44
2          Lon + Lat               0.4671    0.4519    5.220     -201.20
3          Lon + Lat + Lon × Lat   0.5137    0.4926    0.437     -209.69
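As a check, the Adj r² column can be reproduced from the tabulated r² values with the adjusted-r² formula, using n = 73 sites and p predictors:

```python
# adj r^2 = 1 - [(n - 1) / (n - p - 1)] * (1 - r^2), with n = 73 sites.
n = 73

def adj_r2(r2, p):
    return 1 - (n - 1) / (n - p - 1) * (1 - r2)

for r2, p in [(0.00005, 1), (0.4619, 1), (0.4671, 2), (0.5137, 3)]:
    print(round(adj_r2(r2, p), 4))
# matches the tabulated -0.014, 0.454, 0.4519, 0.4926
```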