Multiple linear regression


Multiple regression models
Experimental design and data analysis for biologists (Quinn & Keough 2002)
Environmental sampling and analysis
Multiple regression models
• One response (dependent) variable:
– Y
• More than one predictor (independent) variable:
– X1, X2, X3, …, Xj
– number of predictors = p (j = 1 to p)
• Number of observations = n (i = 1 to n)
Forest fragmentation
• 56 forest patches in SE Victoria (Loyn
1987)
• Response variable:
– bird abundance
• Predictor variables:
– patch area (ha)
– years isolated (years)
– distance to nearest patch (km)
– distance to nearest larger patch (km)
– stock grazing intensity (1 to 5 scale)
– altitude (m)
Biomonitoring with Vallisneria
• Indicators of sublethal effects of organochlorine contamination
• Response variable:
– leaf-to-shoot surface area ratio of Vallisneria americana
• Predictors:
– sediment contamination, plant density, PAR, rivermile, water depth
• 225 sites in the Great Lakes
• Potter & Lovett-Doust (2001)
Regression models
Linear model:
yi = β0 + β1xi1 + β2xi2 + … + βpxip + εi
Sample (fitted) equation:
ŷi = b0 + b1xi1 + b2xi2 + … + bpxip
Example
• Regression model:
(bird abundance)i = β0 + β1(patch area)i + β2(years isolated)i + β3(nearest patch distance)i + β4(nearest larger patch distance)i + β5(stock grazing)i + β6(altitude)i + εi
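As a hedged sketch (not part of the original slides), this is how such a model could be fit in Python with statsmodels; the file name loyn.csv and the column names (abund, area, yearisol, dist, ldist, graze, alt) are assumptions for illustration only.

```python
# Sketch only: fit the six-predictor model with ordinary least squares.
# File name and column names are hypothetical, not from the source.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

loyn = pd.read_csv("loyn.csv")
loyn["logarea"] = np.log10(loyn["area"])   # patch area is log10-transformed

model = smf.ols("abund ~ logarea + yearisol + dist + ldist + graze + alt",
                data=loyn).fit()
print(model.summary())   # partial slopes, SEs, t-tests, r2 and overall F
```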
Multiple regression plane
[figure: fitted plane for bird abundance against log10 area and altitude]
Partial regression coefficients
• H0: β1 = 0
• Partial population regression coefficient (slope) for Y on X1, holding all other Xs constant, equals zero
• Example:
– slope of regression of bird abundance
against patch area, holding years isolated,
distance to nearest patch, distance to
nearest larger patch, stock grazing
intensity and altitude constant, equals 0.
Testing H0: βj = 0
• Use partial t-tests:
t = bj / SE(bj)
• Compare with t-distribution with n − p − 1 (residual) df
• Separate t-test for each partial regression coefficient in model
• Usual logic of t-tests:
– reject H0 if P < 0.05
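A small sketch of these partial t-tests, reusing the hypothetical `model` fitted in the earlier sketch; statsmodels' summary reports the same t and P columns.

```python
# Partial t-tests: t = b_j / SE(b_j), compared with t on n - p - 1 df.
# Reuses the hypothetical `model` from the earlier sketch.
import pandas as pd
import scipy.stats as st

t_vals = model.params / model.bse
p_vals = 2 * st.t.sf(abs(t_vals), df=model.df_resid)   # two-tailed P
print(pd.DataFrame({"t": t_vals, "P": p_vals}))
```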
Model comparison
• Test H0: β1 = 0
• Fit full model:
– y = β0 + β1x1 + β2x2 + β3x3 + …
• Fit reduced model:
– y = β0 + β2x2 + β3x3 + …
• Calculate SSExtra:
– SSRegression(full) − SSRegression(reduced)
• F = MSExtra / MSResidual(full)
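A sketch of this extra sum of squares F-test as a nested-model comparison in statsmodels; the particular full and reduced formulas, and the column names, are illustrative assumptions.

```python
# Model comparison: F = MS_extra / MS_residual(full), via anova_lm on
# nested models. Formulas and column names are hypothetical.
import statsmodels.api as sm
import statsmodels.formula.api as smf

full    = smf.ols("abund ~ logarea + yearisol + graze", data=loyn).fit()
reduced = smf.ols("abund ~ yearisol + graze", data=loyn).fit()

print(sm.stats.anova_lm(reduced, full))   # SS_extra, F and P for logarea
```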
Overall regression model
• H0: 1 = 2 = ... = 0 (all population
slopes equal zero)
• Test of whether overall regression
equation is significant
• Use ANOVA F-test:
– variation explained by regression
– unexplained (residual) variation
Explained variance
• r2 = SSRegression / SSTotal
• proportion of variation in Y explained by linear relationship with X1, X2, etc.
Forest fragmentation

Parameter         Coefficient   SE      Stand coeff   P
Intercept         20.789        8.285   0             0.015
Log10 area        7.470         1.465   0.565         <0.001
Log10 distance    -0.907        2.676   -0.035        0.736
Log10 ldistance   -0.648        2.123   -0.035        0.761
Grazing           -1.668        0.930   -0.229        0.079
Altitude          0.020         0.024   0.079         0.419
Years             -0.074        0.045   -0.176        0.109

r2 = 0.685, F6,49 = 17.754, P < 0.001
Biomonitoring with Vallisneria

Parameter                Coefficient   SE            P
Intercept                1.054         0.565         0.063
Sediment contamination   1.352         0.482         0.006
Plant density            0.028         0.007         <0.001
PAR                      -0.087        0.017         <0.001
Rivermile                1.00 × 10-4   9.17 × 10-5   0.277
Water depth              0.246         0.486         0.613
Assumptions
• Normality and homogeneity of variance
for response variable
• Independence of observations
• Linearity
• No collinearity
Scatterplots
• Scatterplot matrix (SPLOM)
– pairwise plots for all variables
• Partial regression (added variable) plots
– relationship between Y and Xj, holding
other Xs constant
– residuals from Y against all Xs except Xj vs
residuals from Xj against all other Xs
– graphs partial regression slope for Xj
Partial regression plot (log10 area)
[figure: residual bird abundance (y-axis) against residual log10 area (x-axis)]
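A sketch of building this added variable plot by hand from the two sets of residuals described above; variable names follow the hypothetical ones used earlier.

```python
# Added-variable plot for log10 area, constructed from residuals:
# Y regressed on all other Xs vs. log10 area regressed on all other Xs.
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

others = "yearisol + dist + ldist + graze + alt"
res_y = smf.ols("abund ~ " + others, data=loyn).fit().resid
res_x = smf.ols("logarea ~ " + others, data=loyn).fit().resid

plt.scatter(res_x, res_y)
plt.xlabel("Residual log10 area")
plt.ylabel("Residual bird abundance")
plt.show()   # slope of this cloud is the partial regression slope for log10 area
```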
Regression diagnostics
• Residual:
– observed yi − predicted yi, i.e. (yi − ŷi)
• Residual plots:
– residual against predicted yi
– residual against each X
• Influence:
– Cook's D statistics
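A sketch of these diagnostics, again reusing the hypothetical fitted model: residuals against predicted values, and Cook's D per observation.

```python
# Residual plot and Cook's D, from the hypothetical `model` fitted earlier.
import matplotlib.pyplot as plt

fitted = model.fittedvalues
resid = model.resid                                   # observed - predicted
cooks_d = model.get_influence().cooks_distance[0]     # one value per observation

plt.scatter(fitted, resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted y")
plt.ylabel("Residual (y - y-hat)")
plt.show()

print(cooks_d.round(3))   # unusually large values flag influential observations
```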
Collinearity
• Collinearity:
– predictors correlated
• Assumption of no collinearity:
– predictor variables uncorrelated with (i.e. independent of) each other
• Effect of collinearity:
– estimates of βj's and significance tests unreliable
Collinearity
Response (Y) and 2 predictors (X1 and X2)
1. X1 and X2 uncorrelated (r = -0.24)

Parameter   coeff   se     tol    t       P
intercept   -0.17   1.03   –      -0.16   0.873
X1          1.13    0.14   0.95   7.86    <0.001
X2          0.12    0.14   0.95   0.86    0.404

r2 = 0.787, F = 31.38, P < 0.001
Collinearity
2. Rearrange X2 so X1 and X2 highly correlated (r = 0.99)

Parameter   coeff   se     tol    t       P
intercept   0.49    0.72   –      0.69    0.503
X1          1.55    1.21   0.01   1.28    0.219
X2          -0.45   1.21   0.01   -0.37   0.714

r2 = 0.780, F = 30.05, P < 0.001
Checks for collinearity
• Correlation matrix and/or SPLOM
between predictors
• Tolerance for each predictor:
– 1 − r2 for regression of that predictor on all others
– if tolerance is low (near 0.1) then collinearity is a problem
• VIF (variance inflation factor):
– 1 / tolerance; large VIF (e.g. > 10) signals collinearity (see sketch below)
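A sketch of the tolerance/VIF check with statsmodels, under the same hypothetical variable names.

```python
# Tolerance and VIF for each predictor (VIF = 1 / tolerance).
# Column names are the hypothetical ones used earlier.
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(loyn[["logarea", "yearisol", "dist", "ldist", "graze", "alt"]])
for j, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, j)
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")
```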
Forest fragmentation
[figure: scatterplot matrix (SPLOM) of predictors YRS, ALT, GRAZE, L10AREA, L10DIST, L10LDIST]
Tolerances: 0.396 – 0.681
Solutions to collinearity
• Drop redundant (correlated) predictors
• Principal components regression
– potentially useful
– replace predictors by independent
components from PCA on predictor
variables
• Ridge regression
– controversial and complex
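A sketch of principal components regression under the same assumptions: the correlated predictors are replaced by uncorrelated PCA scores before the regression (scikit-learn used for the PCA step; variable names hypothetical).

```python
# Principal components regression: regress Y on PCA scores of the
# standardised predictors instead of the (possibly collinear) originals.
import statsmodels.api as sm
from sklearn.decomposition import PCA

X = loyn[["logarea", "yearisol", "dist", "ldist", "graze", "alt"]]
scores = PCA().fit_transform((X - X.mean()) / X.std())   # uncorrelated components

pcr = sm.OLS(loyn["abund"], sm.add_constant(scores)).fit()
print(pcr.summary())
```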
Predictor importance
• Tests on partial regression slopes
• Standardised partial regression slopes:
b*j = bj (sXj / sY)
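As a sketch, the standardised slopes can be computed directly from the fitted coefficients and the sample standard deviations (hypothetical names as before):

```python
# Standardised partial regression slopes: b*_j = b_j * s_Xj / s_Y
predictors = ["logarea", "yearisol", "dist", "ldist", "graze", "alt"]
std_slopes = model.params[predictors] * loyn[predictors].std() / loyn["abund"].std()
print(std_slopes.round(3))
```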
Predictor importance
• Change in explained variation:
– compare fit of full model to reduced model omitting Xj
– r2Xj = SSExtra / SSResidual(reduced)
• Hierarchical partitioning
– splits total r2 for each predictor into
• independent contribution of each predictor
• joint contribution of each predictor with other
predictors
Forest fragmentation

Predictor         Independent r2   Joint r2   Total r2   Stand coeff
Log10 area        0.315            0.232      0.548      0.565
Log10 distance    0.007            0.009      0.016      -0.035
Log10 ldistance   0.014            <0.001     0.014      -0.035
Altitude          0.057            0.092      0.149      0.079
Grazing           0.190            0.275      0.466      -0.229
Years             0.101            0.152      0.253      -0.176
Interactions
• Interactive effect of X1 and X2 on Y
• Dependence of partial regression slope
of Y against X1 on the value of X2
• Dependence of partial regression slope
of Y against X2 on the value of X1
• yi = 0 + 1xi1 + 2xi2 + 3xi1xi2 + i
Forest fragmentation
• Does effect of grazing on bird
abundance depend on area?
– log10 area x grazing interaction
• Does effect of grazing depend on years
since isolation?
– grazing x years interaction
• Etc.
Interpreting interactions
• Interactions highly correlated with individual
predictors:
– collinearity problem
– centring variables (subtracting mean) removes
collinearity
• Simple regression slopes:
– slope of Y on X1 for different values of X2
– slope of Y on X2 for different values of X1
– use if interaction is significant
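A sketch of centring and fitting one such interaction (log10 area x grazing), with the hypothetical names used earlier.

```python
# Centre the predictors, then fit main effects plus their interaction.
import statsmodels.formula.api as smf

loyn["logarea_c"] = loyn["logarea"] - loyn["logarea"].mean()
loyn["graze_c"] = loyn["graze"] - loyn["graze"].mean()

inter = smf.ols("abund ~ logarea_c * graze_c", data=loyn).fit()
print(inter.summary())   # the logarea_c:graze_c row tests the interaction
```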
Polynomial regression
• Modeling some curvilinear relationships
• Include quadratic (X2) or cubic (X3) etc.
• Quadratic model:
yi = β0 + β1xi1 + β2xi1² + εi
• Compare fit with linear model:
yi = β0 + β1xi1 + εi
• Does quadratic fit better than linear?
Local and regional species richness
• Relationship between local and regional
species richness in North America
– Caley & Schluter (1997)
• Two models compared:
local spp = 0 + 1(regional spp) +
2(regional spp)2 + 
local spp = 0 + 1(regional spp) + 
[figure: local species richness against regional species richness, with linear and quadratic fits]
Model comparison
Full model (quadratic): SSResidual = 376.620, df = 5
Reduced model (linear): SSResidual = 1299.257, df = 6
Difference due to (regional spp)²:
SSExtra = 922.7, df = 1, MSExtra = 922.7
F = 12.249, P = 0.018
See Quinn & Keough Box 6.6
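A sketch of the same quadratic-vs-linear comparison in statsmodels; the file name and the column names regional and local are assumptions.

```python
# Compare linear and quadratic fits with an extra sum of squares F-test.
# File and column names are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

richness = pd.read_csv("caley_schluter.csv")

linear    = smf.ols("local ~ regional", data=richness).fit()
quadratic = smf.ols("local ~ regional + I(regional ** 2)", data=richness).fit()

print(sm.stats.anova_lm(linear, quadratic))   # F-test for the quadratic term
```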
Categorical predictors
• Convert categorical predictors into
multiple continuous predictors
– dummy (indicator) variables
• Each dummy variable coded as 0 or 1
• Usually number of dummy variables = number of groups − 1
Forest fragmentation

Grazing intensity   Grazing1   Grazing2   Grazing3   Grazing4
Zero (1)            0          0          0          0
Low (2)             1          0          0          0
Medium (3)          0          1          0          0
High (4)            0          0          1          0
Intense (5)         0          0          0          1

Each dummy variable measures the effect of one of the low to intense categories compared to the "reference" category (zero grazing).
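A sketch of dummy coding in Python (names hypothetical): pd.get_dummies builds the indicator columns explicitly, while C() in a statsmodels formula applies the same groups-minus-one coding automatically.

```python
# Dummy (indicator) variables for grazing intensity, reference = level 1.
import pandas as pd
import statsmodels.formula.api as smf

dummies = pd.get_dummies(loyn["graze"], prefix="grazing", drop_first=True)
print(dummies.head())   # grazing_2 ... grazing_5

# Equivalent treatment coding inside a formula, plus log10 area:
factor_model = smf.ols("abund ~ C(graze) + logarea", data=loyn).fit()
print(factor_model.summary())
```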
Forest fragmentation

Grazing fitted as a single (continuous) predictor:
Coefficient   Est       SE      t        P
Intercept     21.603    3.092   6.987    <0.001
Grazing       -2.854    0.713   -4.005   <0.001
Log10 area    6.890     1.290   5.341    <0.001

Grazing fitted as dummy variables (reference = zero grazing):
Coefficient   Est       SE      t        P
Intercept     15.716    2.767   5.679    <0.001
Grazing1      0.383     2.912   0.131    0.896
Grazing2      -0.189    2.549   -0.074   0.941
Grazing3      -1.592    2.976   -0.535   0.595
Grazing4      -11.894   2.931   -4.058   <0.001
Log10 area    7.247     1.255   5.774    <0.001
Categorical predictors
• All linear models fit categorical predictors
using dummy variables
• ANOVA models combine dummy variables
into single factor effect
– partition SS into factor and residual
– dummy variable effects often provided by software
• Models with both categorical (factor) and
continuous (covariate) predictors
– adjust factor effects based on covariate
– reduce residual based on strength of relationship
between Y and covariate – more powerful test of
factor