Dealing with Spatial Autocorrelation
Spatial Analysis Seminar
Spring 2009
Spatial Autocorrelation Defined
• “…the property of random variables taking
values, at pairs of locations a certain
distance apart, that are more similar
(positive autocorrelation) or less similar
(negative autocorrelation) than expected
for randomly associated pairs of
observations.”
– Legendre (1993)
Types of Spatial Autocorrelation
• Inherent autocorrelation: caused by
“contagious biotic processes”
vs.
• Induced spatial dependence: biological
variables of interest are functionally
dependent on one or more autocorrelated
exogenous variable(s)
Why Should We Care?
• “natural systems almost always have
autocorrelation in the form of patchiness or
gradients…over a wide range of spatial
and temporal scales.”
– Fortin & Dale (2005)
→ Autocorrelation is a “fact of life” for
ecologists!
2 Views of Spatial Autocorrelation:
1. It’s a nuisance that complicates statistical
hypothesis testing
2. It’s functionally important in many ecosystems,
so we must revise our theories and models to
incorporate spatial structure
• Either way, the first step involves describing
the autocorrelation (i.e., the “spatial structure”)
Describing Spatial Autocorrelation
• Compute Moran’s I or Geary’s c
coefficients over multiple distances
• Correlogram: plot distance on X-axis
against correlation coefficient on Y-axis
• Mantel correlogram: multivariate response
• Semi-variogram/variogram
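The first two bullets can be sketched in base R. This is a minimal illustration, assuming binary neighbor weights within each distance class; the function names (morans_i, gearys_c, correlogram) and the toy transect are hypothetical and do not come from the slides or from any package.

```r
# Moran's I for values x and a spatial weight matrix w (zero diagonal)
morans_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Geary's c for the same inputs
gearys_c <- function(x, w) {
  z <- x - mean(x)
  ((length(x) - 1) / (2 * sum(w))) * sum(w * outer(x, x, "-")^2) / sum(z^2)
}

# Correlogram: one coefficient per distance class (breaks[k], breaks[k + 1]]
correlogram <- function(x, coords, breaks, stat = morans_i) {
  d <- as.matrix(dist(coords))
  sapply(seq_len(length(breaks) - 1), function(k) {
    # Binary weights: 1 if the pair falls in this distance class, else 0
    w <- (d > breaks[k] & d <= breaks[k + 1]) * 1
    stat(x, w)
  })
}

# Toy transect (hypothetical data): a linear gradient at 10 evenly spaced points
x <- 1:10
coords <- cbind(1:10, 0)
breaks <- 0:3
I_by_lag <- correlogram(x, coords, breaks)
c_by_lag <- correlogram(x, coords, breaks, stat = gearys_c)
```

Plotting breaks[-1] against I_by_lag gives the correlogram described above. For real data, packaged implementations such as those in spdep are likely a better choice than this sketch.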
Example Data
• Wetland hardwood forest
(5 x 5 m cells)
• Response variable: log of
non-ground lidar points in
0-1 m vertical height bin
• n1 = 217 (recently burned), n2 = 68 (long unburned)
• Welch’s t-test (unequal
variance, unequal sample
sizes) results: t = 2.33, df
= 181, p-value ≈ 0.021
Moran’s I correlograms
Now what do I do???
• Adjusting the effective sample size
• Spatial statistical modeling methods
• Restricted randomization
• Other methods: canonical ordination,
partial Mantel tests, etc.
Adjusting the Effective Sample Size
• Estimate of effective sample size (Fortin & Dale
2005, p. 223, Equation 5.15):
$$n' = \frac{n^2}{\sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{Cor}(x_i, x_j)}$$
• For first-order autocorrelation ρ and large n:
$$n' \approx n \, \frac{1 - \rho}{1 + \rho}$$
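Both estimates are short enough to write directly in R. The sketch below uses hypothetical function names (n_eff_general, n_eff_ar1) and checks the large-n approximation against the general formula on an assumed first-order correlation structure, Cor(x_i, x_j) = ρ^|i − j|.

```r
# General estimate: n' = n^2 / (sum of all pairwise correlations)
n_eff_general <- function(R) nrow(R)^2 / sum(R)

# Large-n approximation under first-order autocorrelation rho
n_eff_ar1 <- function(n, rho) n * (1 - rho) / (1 + rho)

# Assumed AR(1)-type correlation matrix for a transect of n points
n <- 1000
rho <- 0.33
R <- rho^abs(outer(seq_len(n), seq_len(n), "-"))

exact_n  <- n_eff_general(R)   # from the full correlation matrix
approx_n <- n_eff_ar1(n, rho)  # from the closed-form approximation
```

With independent observations (R equal to the identity matrix) the general formula returns n, so no information is lost; positive autocorrelation pushes n' below n.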
Adjusting the Effective Sample Size
• For the “Recently Burned” example data:
$$n' \approx n \, \frac{1 - \rho}{1 + \rho} = 217 \times \frac{1 - 0.33}{1 + 0.33} \approx 110$$
• For the “Long Unburned” example data:
$$n' \approx n \, \frac{1 - \rho}{1 + \rho} = 68 \times \frac{1 - 0.22}{1 + 0.22} \approx 43$$
• Welch’s t-test results with the adjusted sample
sizes: t = 1.76, df = 123, p ≈ 0.080
• BUT, this is a very simplistic model!
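To see why shrinking n weakens the test, Welch's statistic can be recomputed from summary statistics with the adjusted sample sizes. The sketch below is illustrative only: welch_from_summary is a hypothetical helper, and the group means and standard deviations are made-up values, not the lidar data behind the slides. Only the sample sizes (217 and 68, adjusted to 110 and 43) come from the slides.

```r
# Welch's t-test from group means, standard deviations, and sample sizes
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  t_stat <- (m1 - m2) / sqrt(v1 + v2)
  # Welch-Satterthwaite degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p <- 2 * pt(-abs(t_stat), df)
  c(t = t_stat, df = df, p = p)
}

# HYPOTHETICAL summary statistics (assumed for illustration)
m1 <- 1.30; s1 <- 0.80
m2 <- 1.05; s2 <- 0.75

raw <- welch_from_summary(m1, s1, 217, m2, s2, 68)  # nominal sample sizes
adj <- welch_from_summary(m1, s1, 110, m2, s2, 43)  # effective sample sizes
```

The means and variances are unchanged; only the standard error and the degrees of freedom grow less favorable, so t drops and p rises.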
Detour: Autocorrelation Models
• Model 1 (“spatial independence”):
$z_i = \eta_i$
$x_i = \beta z_i + \varepsilon_i$
• Model 2 (“first-order autoregressive”):
$x_i = \rho x_{i-1} + \varepsilon_i, \quad -1 < \rho < 1$
• Model 3 (“induced autoregressive”):
$z_i = \rho z_{i-1} + \eta_i$
$x_i = \beta z_i + \varepsilon_i$
• Model 4 (“doubly autoregressive”):
$z_i = \rho_z z_{i-1} + \eta_i$
$x_i = \beta z_i + \rho_x x_{i-1} + \varepsilon_i$
SOURCE: Fortin & Dale (2005), pp. 213-216
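The four one-dimensional models can be simulated in a few lines of base R to compare the autocorrelation each one produces. This is a rough sketch: the parameter names and values (beta, rho_x, rho_z), the Gaussian innovations, and the helper names ar1 and lag1 are all assumptions made for illustration.

```r
set.seed(1)
n <- 5000
beta <- 1; rho_x <- 0.6; rho_z <- 0.6   # assumed illustrative values

# First-order autoregressive recursion: out[i] = innov[i] + rho * out[i - 1]
ar1 <- function(innov, rho) {
  as.numeric(stats::filter(innov, rho, method = "recursive"))
}

eta <- rnorm(n)   # innovations for the exogenous variable z
eps <- rnorm(n)   # innovations for the response x

# Model 1: spatial independence
z1 <- eta
x1 <- beta * z1 + eps

# Model 2: first-order autoregressive response
x2 <- ar1(eps, rho_x)

# Model 3: induced autocorrelation (x inherits structure from z)
z3 <- ar1(eta, rho_z)
x3 <- beta * z3 + eps

# Model 4: doubly autoregressive (autocorrelated z plus autoregression in x)
z4 <- z3
x4 <- ar1(beta * z4 + eps, rho_x)

# Lag-1 sample autocorrelation of each simulated response
lag1 <- function(v) acf(v, lag.max = 1, plot = FALSE)$acf[2]
lag1_by_model <- sapply(list(m1 = x1, m2 = x2, m3 = x3, m4 = x4), lag1)
```

Under these settings Model 1 should show essentially no lag-1 autocorrelation, Model 3 an intermediate amount inherited from z, and Model 4 the most.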
Detour: Autocorrelation Models
• The models on the previous slide were one-dimensional, but most spatial data is two-dimensional (Lat-Long, XY-coordinates, etc.)
• The two-dimensional spatial autocorrelation
model incorporates W, a “proximity matrix” of
neighbor weights, which in turn affects the
variance-covariance matrix (C):
$$x = Z\beta + \rho W (x - Z\beta) + \varepsilon$$
$$C = \sigma^2 [(I - \rho W)^T (I - \rho W)]^{-1}$$
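A small worked example may help make W and C concrete. The sketch below assumes a 3 x 3 grid with rook (edge-sharing) neighbors and row-standardized weights; the function name sar_cov and the values ρ = 0.5 and σ² = 1 are illustrative choices, not taken from the slides.

```r
# 3 x 3 grid of cell centers (hypothetical layout)
grid <- expand.grid(x = 1:3, y = 1:3)
d <- as.matrix(dist(grid))

# Proximity matrix W: rook neighbors (distance exactly 1), row-standardized
W <- (d == 1) * 1
W <- W / rowSums(W)

# Variance-covariance matrix: sigma^2 * [(I - rho W)^T (I - rho W)]^{-1}
sar_cov <- function(W, rho, sigma2 = 1) {
  A <- diag(nrow(W)) - rho * W
  sigma2 * solve(t(A) %*% A)
}

C <- sar_cov(W, rho = 0.5)

# With rho = 0 the model collapses to independent errors: C = sigma^2 * I
C0 <- sar_cov(W, rho = 0, sigma2 = 2)
```

Off-diagonal entries of C are nonzero even for cells that are not direct neighbors, because dependence propagates through chains of neighbors.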
Generalized Least Squares (GLS)
• Relatively easy way to introduce spatial
autocorrelation structure to linear models
• Fits a parametric correlation function
(exponential, Gaussian, spherical, etc.)
directly to the variance-covariance matrix
• Assumes normally distributed errors, but
errors are allowed to be correlated and/or
have unequal variances
• Built-in R package: nlme
GLS Model – No Spatial Structure
library(nlme)
…
## Model A: spatial independence
ModelA <- gls(LN_COUNT~BURNED,data=SAC_data)
plot(Variogram(ModelA, form=~x+y))
GLS Models with Spatial Structure
> ModelB <- gls(LN_COUNT~BURNED,data=SAC_data,corr=corAR1())
> ModelC <- gls(LN_COUNT~BURNED,data=SAC_data,corr=corExp(form=~x+y))
> ModelD <- gls(LN_COUNT~BURNED,data=SAC_data,corr=corGaus(form=~x+y))
> ModelE <- gls(LN_COUNT~BURNED,data=SAC_data,corr=corSpher(form=~x+y))
> AIC(ModelA,ModelB,ModelC,ModelD,ModelE)
       df      AIC
ModelA  3 702.1288
ModelB  4 677.3121
ModelC  4 591.7996
ModelD  4 607.3873
ModelE  4 604.7950
> anova(ModelA,ModelC)
       Model df      AIC      BIC    logLik   Test  L.Ratio p-value
ModelA     1  3 702.1288 713.0652 -348.0644
ModelC     2  4 591.7996 606.3814 -291.8998 1 vs 2 112.3293  <.0001
→ Exponential GLS model seems to fit best
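The same comparison can be tried on simulated data. The sketch below assumes the nlme package is installed. The data frame sim, its column names, and every parameter value are hypothetical and stand in for the seminar's data set.

```r
library(nlme)
set.seed(42)

# SIMULATED data on a 10 x 10 grid (hypothetical; not the seminar data)
sim <- expand.grid(x = 1:10, y = 1:10)
n <- nrow(sim)
sim$group <- factor(rep(c("a", "b"), length.out = n))

# Errors with exponential spatial correlation, exp(-distance / 3)
d <- as.matrix(dist(sim[, c("x", "y")]))
R <- exp(-d / 3)
e <- as.numeric(t(chol(R)) %*% rnorm(n))   # correlated Gaussian errors
sim$resp <- 1 + 0.5 * (sim$group == "b") + e

# Independent-errors model vs. exponential spatial correlation model
fit_iid <- gls(resp ~ group, data = sim)
fit_exp <- gls(resp ~ group, data = sim,
               correlation = corExp(form = ~ x + y))

aic_table <- AIC(fit_iid, fit_exp)
# Estimated range parameter of the exponential correlation function
range_hat <- coef(fit_exp$modelStruct$corStruct, unconstrained = FALSE)
```

Because the errors here were generated with an exponential structure, the corExp fit is expected to have the lower AIC. With real data the ranking among candidate structures is an empirical question.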
Other Autocorrelation Models
• Conditional autoregressive (CAR), simultaneous
autoregressive (SAR), and moving average (MA)
models
– See pp. 229-233 of Fortin & Dale (2005)
– Implemented in R package spdep, as well as SAM
(Spatial Analysis for Macroecology) software
• Generalized linear mixed models (GLMMs): R
built-in packages MASS, nlme
• But wait, there’s more: see Dormann et al.
(2007) review paper in Ecography 30: 609-628.
Models and Reality
• “Much of the treatment of spatial autocorrelation
in the statistical literature is predicated on the
simplest AR model, which produces an
exponential decline in autocorrelation as a
function of distance (Figure 5.16).”
– Fortin & Dale (2005, pp. 247-248)
• BUT, simple corrections based on first-order AR
don’t account for effects of potentially negative
autocorrelation at greater distances
Restricted Randomization
• PROBLEM: randomization tests based
on complete spatial randomness will
destroy autocorrelation structure
• POTENTIAL SOLUTIONS:
1. “Toroidal shift” randomization (Figure 5.12)
2. Contiguity-constrained permutations (see
Legendre et al. 1990 for algorithms)
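A toroidal shift can be sketched for two variables measured on the same rectangular grid: one grid is slid relative to the other with wrap-around at the edges, so each variable keeps its internal spatial structure while the cell-to-cell pairing is broken. The function names (toroidal_shift, toroidal_test), the use of a correlation as the test statistic, and the toy data are assumptions made for illustration; this is not the algorithm from the cited references.

```r
# Shift a matrix by dr rows and dc columns, wrapping around the edges
toroidal_shift <- function(m, dr, dc) {
  nr <- nrow(m); nc <- ncol(m)
  rows <- ((seq_len(nr) - 1 - dr) %% nr) + 1
  cols <- ((seq_len(nc) - 1 - dc) %% nc) + 1
  m[rows, cols, drop = FALSE]
}

# Restricted randomization test of the correlation between grids a and b,
# using every non-identity toroidal shift of b as the reference set
toroidal_test <- function(a, b) {
  obs <- cor(as.vector(a), as.vector(b))
  shifts <- expand.grid(dr = 0:(nrow(b) - 1), dc = 0:(ncol(b) - 1))
  shifts <- shifts[-1, ]   # drop the identity shift (0, 0)
  ref <- mapply(function(dr, dc) {
    cor(as.vector(a), as.vector(toroidal_shift(b, dr, dc)))
  }, shifts$dr, shifts$dc)
  # Two-sided p-value; the observed arrangement counts as one realization
  p <- (1 + sum(abs(ref) >= abs(obs))) / (1 + length(ref))
  list(observed = obs, p_value = p)
}

# Toy grids (hypothetical): b is a noisy copy of a
set.seed(7)
a <- matrix(rnorm(64), 8, 8)
b <- a + 0.5 * matrix(rnorm(64), 8, 8)
result <- toroidal_test(a, b)
```

The number of distinct shifts bounds the smallest attainable p-value (here 1/64), which is a practical limitation on small grids.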
Conclusion
• Incorporating spatial structure into ecological
models was identified by Legendre as a “new
paradigm” in 1993, BUT…
• …ecologists are still refining their methods for
dealing with spatial autocorrelation
• OUR LAST HOPE?: Dale, M.R.T. and M.-J.
Fortin. (in press). Spatial Autocorrelation and
Statistical Tests: Some Solutions. Journal of
Agricultural, Biological, and Environmental
Statistics.
Spatial autocorrelation, don’t make
me open this…