Transcript Slide 1
Dealing with Spatial Autocorrelation
Spatial Analysis Seminar, Spring 2009

Spatial Autocorrelation Defined
• “…the property of random variables taking values, at pairs of locations a certain distance apart, that are more similar (positive autocorrelation) or less similar (negative autocorrelation) than expected for randomly associated pairs of observations.” – Legendre (1993)

Types of Spatial Autocorrelation
• Inherent autocorrelation: caused by “contagious biotic processes”
vs.
• Induced spatial dependence: biological variables of interest are functionally dependent on one or more autocorrelated exogenous variable(s)

Why Should We Care?
• “natural systems almost always have autocorrelation in the form of patchiness or gradients…over a wide range of spatial and temporal scales.” – Fortin & Dale (2005)
→ Autocorrelation is a “fact of life” for ecologists!

2 Views of Spatial Autocorrelation:
1. It’s a nuisance that complicates statistical hypothesis testing
2. It’s functionally important in many ecosystems, so we must revise our theories and models to incorporate spatial structure
• Either way, the first step involves describing the autocorrelation (i.e., the “spatial structure”)

Describing Spatial Autocorrelation
• Compute Moran’s I or Geary’s c coefficients over multiple distances
• Correlogram: plot distance on X-axis against correlation coefficient on Y-axis
• Mantel correlogram: multivariate response
• Semi-variogram/variogram

Example Data
• Wetland hardwood forest (5 x 5 m cells)
• Response variable: log of non-ground lidar points in 0-1 m vertical height bin
• n1 = 217, n2 = 68
• Welch’s t-test (unequal variance, unequal sample sizes) results: t = 2.33, df = 181, p-value ≈ 0.021

Moran’s I correlograms

Now what do I do???
• Adjusting the effective sample size
• Spatial statistical modeling methods
• Restricted randomization
• Other methods: canonical ordination, partial Mantel tests, etc.
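The correlogram step described earlier (Moran’s I computed over multiple distances) can be sketched in base R as follows. This is a minimal illustration, not the seminar’s own code: the function names morans_i and correlogram, the binary distance-class weights, and the simulated grid are all assumptions made for the example.

```r
# Moran's I for values x at points (cx, cy), with binary weights:
# w_ij = 1 when the distance between i and j lies in (d_lo, d_hi], else 0.
# (Hypothetical helper, not part of any package.)
morans_i <- function(x, cx, cy, d_lo, d_hi) {
  d <- as.matrix(dist(cbind(cx, cy)))  # pairwise Euclidean distances
  w <- (d > d_lo & d <= d_hi) * 1      # d_ii = 0, so self-pairs get weight 0
  z <- x - mean(x)                     # deviations from the mean
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Correlogram: Moran's I for successive distance classes defined by `breaks`.
correlogram <- function(x, cx, cy, breaks) {
  k <- seq_len(length(breaks) - 1)
  data.frame(
    lag = breaks[-1],  # upper bound of each distance class
    I   = sapply(k, function(i) morans_i(x, cx, cy, breaks[i], breaks[i + 1]))
  )
}

# Hypothetical data: a 10 x 10 grid with a west-east gradient plus noise.
set.seed(1)
cells <- expand.grid(cx = 1:10, cy = 1:10)
x <- cells$cx + rnorm(nrow(cells))
cg <- correlogram(x, cells$cx, cells$cy, breaks = 0:6)
print(cg)
```

Plotting cg$I against cg$lag gives the correlogram. Under no autocorrelation the expected value of Moran’s I is -1/(n - 1), which is a useful reference line. A distance class that contains no pairs would make sum(w) zero, so the breaks should be chosen so that every class is populated.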
Adjusting the Effective Sample Size
• Estimate of effective sample size (Fortin & Dale 2005, p. 223, Equation 5.15):
$n' = \dfrac{n^2}{\sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{Cor}(x_i, x_j)}$
• For first-order autocorrelation ρ and large n:
$n' \approx n \, \dfrac{1 - \rho}{1 + \rho}$

Adjusting the Effective Sample Size
• For the “Recently Burned” example data:
$n' \approx n \, \dfrac{1 - \rho}{1 + \rho} = 217 \cdot \dfrac{1 - 0.33}{1 + 0.33} \approx 110$
• For the “Long Unburned” example data:
$n' \approx n \, \dfrac{1 - \rho}{1 + \rho} = 68 \cdot \dfrac{1 - 0.22}{1 + 0.22} \approx 43$
• Welch’s t-test results: t = 1.76, df = 123, p ≈ 0.080
• BUT, this is a very simplistic model!

Detour: Autocorrelation Models
• Model 1 (“spatial independence”): $x_i = \beta z_i + \varepsilon_i$, $z_i = \eta_i$
• Model 2 (“first-order autoregressive”): $x_i = \rho x_{i-1} + \varepsilon_i$, $-1 < \rho < 1$
• Model 3 (“induced autoregressive”): $x_i = \beta z_i + \varepsilon_i$, $z_i = \rho z_{i-1} + \eta_i$
• Model 4 (“doubly autoregressive”): $x_i = \beta z_i + \rho_x x_{i-1} + \varepsilon_i$, $z_i = \rho_z z_{i-1} + \eta_i$
SOURCE: Fortin & Dale (2005), pp. 213-216

Detour: Autocorrelation Models
• The models on the previous slide were one-dimensional, but most spatial data is two-dimensional (Lat-Long, XY-coordinates, etc.)
• The two-dimensional spatial autocorrelation model incorporates W, a “proximity matrix” of neighbor weights, which in turn affects the variance-covariance matrix (C):
$x = Z\beta + \rho W (x - Z\beta) + \varepsilon$
$C = \sigma^2 \left[ (I - \rho W)^T (I - \rho W) \right]^{-1}$

Generalized Least Squares (GLS)
• Relatively easy way to introduce spatial autocorrelation structure to linear models
• Fits a parametric correlation function (exponential, Gaussian, spherical, etc.)
directly to the variance-covariance matrix
• Assumes normally distributed errors, but errors are allowed to be correlated and/or have unequal variances
• Built-in R package: nlme

GLS Model – No Spatial Structure
library(nlme)
…
## Model A: spatial independence
ModelA <- gls(LN_COUNT ~ BURNED, data = SAC_data)
plot(Variogram(ModelA, form = ~x + y))

GLS Models with Spatial Structure
> ## Model B: AR(1); with no form argument, corAR1() uses the row order of SAC_data
> ModelB <- gls(LN_COUNT ~ BURNED, data = SAC_data, corr = corAR1())
> ## Models C-E: exponential, Gaussian, and spherical correlation in x-y distance
> ModelC <- gls(LN_COUNT ~ BURNED, data = SAC_data, corr = corExp(form = ~x + y))
> ModelD <- gls(LN_COUNT ~ BURNED, data = SAC_data, corr = corGaus(form = ~x + y))
> ModelE <- gls(LN_COUNT ~ BURNED, data = SAC_data, corr = corSpher(form = ~x + y))
> AIC(ModelA, ModelB, ModelC, ModelD, ModelE)
       df      AIC
ModelA  3 702.1288
ModelB  4 677.3121
ModelC  4 591.7996
ModelD  4 607.3873
ModelE  4 604.7950
> anova(ModelA, ModelC)
       Model df      AIC      BIC    logLik   Test  L.Ratio p-value
ModelA     1  3 702.1288 713.0652 -348.0644
ModelC     2  4 591.7996 606.3814 -291.8998 1 vs 2 112.3293  <.0001
→ Exponential GLS model seems to fit best (lowest AIC)

Other Autocorrelation Models
• Conditional autoregressive (CAR), simultaneous autoregressive (SAR), and moving average (MA) models
– See pp. 229-233 of Fortin & Dale (2005)
– Implemented in R package spdep, as well as SAM (Spatial Analysis for Macroecology) software
• Generalized linear mixed models (GLMMs): R built-in packages MASS, nlme
• But wait, there’s more: see the Dormann et al. (2007) review paper in Ecography 30: 609-628.

Models and Reality
• “Much of the treatment of spatial autocorrelation in the statistical literature is predicated on the simplest AR model, which produces an exponential decline in autocorrelation as a function of distance (Figure 5.16).” – Fortin & Dale (2005, pp.
247-248)
• BUT, simple corrections based on first-order AR don’t account for effects of potentially negative autocorrelation at greater distances

Restricted Randomization
• PROBLEM: randomization tests based on complete spatial randomness will destroy autocorrelation structure
• POTENTIAL SOLUTIONS:
1. “Toroidal shift” randomization (Figure 5.12)
2. Contiguity-constrained permutations (see Legendre et al. 1990 for algorithms)

Conclusion
• Incorporating spatial structure into ecological models was identified by Legendre as a “new paradigm” in 1993, BUT…
• …ecologists are still refining their methods for dealing with spatial autocorrelation
• OUR LAST HOPE?: Dale, M.R.T. and M.-J. Fortin. (in press). Spatial Autocorrelation and Statistical Tests: Some Solutions. Journal of Agricultural, Biological, and Environmental Statistics.

Spatial autocorrelation, don’t make me open this…
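To make the “toroidal shift” idea from the Restricted Randomization slide concrete, here is a rough sketch in base R. It assumes two variables measured on the same regular grid and tests their correlation, which is not the burned/unburned comparison used in the example data. The function names toroidal_shift and toroidal_test and the simulated matrices are hypothetical. Each permutation slides one grid over the other with wrap-around at the edges, so the spatial structure within each variable is kept intact while the association between them is broken.

```r
# Shift a matrix by (dr, dc) cells, wrapping around the edges as on a torus.
# (Hypothetical helper, not part of any package.)
toroidal_shift <- function(m, dr, dc) {
  nr <- nrow(m); nc <- ncol(m)
  rows <- ((seq_len(nr) - 1 - dr) %% nr) + 1  # source row for each new row
  cols <- ((seq_len(nc) - 1 - dc) %% nc) + 1  # source column for each new column
  m[rows, cols, drop = FALSE]
}

# Restricted randomization test for the correlation between grids a and b.
toroidal_test <- function(a, b, n_perm = 999) {
  obs <- cor(as.vector(a), as.vector(b))
  perm <- replicate(n_perm, {
    dr <- sample.int(nrow(a), 1) - 1  # random row offset in 0..nrow-1
    dc <- sample.int(ncol(a), 1) - 1  # random column offset in 0..ncol-1
    cor(as.vector(toroidal_shift(a, dr, dc)), as.vector(b))
  })
  # Two-sided p-value; the observed value counts as one of the permutations.
  p <- (sum(abs(perm) >= abs(obs)) + 1) / (n_perm + 1)
  list(statistic = obs, p_value = p)
}

# Hypothetical data: two 10 x 10 grids sharing a row-wise gradient.
set.seed(42)
a <- outer(1:10, rep(1, 10)) + matrix(rnorm(100), 10)
b <- outer(1:10, rep(1, 10)) + matrix(rnorm(100), 10)
res <- toroidal_test(a, b, n_perm = 199)
print(res)
```

A grid with nr rows and nc columns has only nr × nc distinct shifts, so on small grids the offsets repeat and the smallest attainable p-value is limited accordingly.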