Transcript Chapter 2

Chapter 5
Part B: Spatial Autocorrelation and
regression modelling
www.spatialanalysisonline.com
Autocorrelation
Time series correlation model
$\{x_{t,1}\}$, $t = 1, 2, 3, \dots, n-1$ and $\{x_{t,2}\}$, $t = 2, 3, 4, \dots, n$
3rd edition
Spatial Autocorrelation
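Correlation coefficient for $\{x_i\}$, $i = 1, 2, 3, \dots, n$ and $\{y_i\}$, $i = 1, 2, 3, \dots, n$:
$r = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
Time series correlation model
$\{x_{t,1}\}$, $t = 1, 2, 3, \dots, n-1$ and $\{x_{t,2}\}$, $t = 2, 3, 4, \dots, n$
Mean values:
$\bar{x}_{.1} = \dfrac{1}{n-1} \sum_{t=1}^{n-1} x_t$, $\bar{x}_{.2} = \dfrac{1}{n-1} \sum_{t=2}^{n} x_t$
Lag 1 autocorrelation (large n):
$r_1 = \dfrac{\sum_{t=1}^{n-1} (x_t - \bar{x})(x_{t+1} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2}$, where $\bar{x} = \dfrac{1}{n} \sum_{t=1}^{n} x_t$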
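As a minimal sketch of the lag 1 autocorrelation in its large-n form (deviations taken from the overall mean), the following Python computes r1 for two illustrative series. The function name and both sample series are assumptions made for illustration; they do not come from the slides.

```python
import numpy as np

def lag1_autocorrelation(x):
    """Lag 1 autocorrelation, large-n form: deviations from the overall mean."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    # Numerator pairs each value with its successor; denominator uses all n values
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d ** 2))

# Illustrative series (assumed): a steady trend and a strictly alternating series
trend = np.arange(1.0, 11.0)
alternating = np.array([1.0, -1.0] * 5)

r_trend = lag1_autocorrelation(trend)        # positive: neighbours in time are alike
r_alt = lag1_autocorrelation(alternating)    # negative: neighbours in time differ
print(r_trend, r_alt)
```

A smoothly trending series gives a positive coefficient and an alternating series a negative one, which is the same contrast the spatial measures later in the deck draw between clustered and evenly spaced patterns.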
Spatial Autocorrelation
Classical statistical model assumptions
Independence vs dependence in time and
space
Tobler’s first law:
“All things are related, but nearby things are more
related than distant things”
Spatial dependence and autocorrelation
Correlation and Correlograms
Spatial Autocorrelation
Covariance and autocovariance
Lags – fixed or variable interval
Correlograms and range
Stationary and non-stationary patterns
Outliers
Extending concept to spatial domain
Transects
Neighbourhoods and distance-based models
Spatial Autocorrelation
Global spatial autocorrelation
Dataset issues: regular grids; irregular lattice
(zonal) datasets; point samples
Simple binary coded regular grids – use of Joins
counts
Irregular grids and lattices – extension to x,y,z data
representation
Use of x,y,z model for point datasets
Local spatial autocorrelation
Disaggregating global models
Spatial Autocorrelation
Joins counts (50% 1’s)
A. Completely separated pattern (+ve)
B. Evenly spaced pattern (-ve)
C. Random pattern
Spatial Autocorrelation
Joins count
Binary coding
Edge effects
Double counting
Free vs non-free sampling
Expected values (free sampling)
1-1 = 15/60, 0-0 = 15/60, 0-1 or 1-0 = 30/60
Spatial Autocorrelation
Joins counts
A. Completely separated (+ve)
B. Evenly spaced (-ve)
C. Random
Spatial Autocorrelation
Joins count – some issues
Multiple z-scores
Binary or k-class data
Rook’s move vs other moves
First order lag vs higher orders
Equal vs unequal weights
Regular grids vs other datasets
Global vs local statistics
Sensitivity to model components
Spatial Autocorrelation
 Irregular lattice – (x,y,z) and adjacency tables
Cell data, with cell coordinates (row,col) in brackets; no data at (1,1) and (4,1):

         col 1          col 2          col 3
row 1                   +4.55 (1,2)    +5.54 (1,3)
row 2    +2.24 (2,1)    -5.15 (2,2)    +9.02 (2,3)
row 3    +3.10 (3,1)    -4.39 (3,2)    -2.09 (3,3)
row 4                   +0.46 (4,2)    -3.06 (4,3)

x,y,z view:

x    y    z
1    2    4.55
1    3    5.54
2    1    2.24
2    2    -5.15
2    3    9.02
3    1    3.1
3    2    -4.39
3    3    -2.09
4    2    0.46
4    3    -3.06

Cell numbering:

         col 1    col 2    col 3
row 1             3        7
row 2    1        4        8
row 3    2        5        9
row 4             6        10

Adjacency matrix, total 1’s=26
Spatial Autocorrelation
“Spatial” (auto)correlation coefficient
Coordinate (x,y,z) data representation for cells
Spatial weights matrix (binary or other), W={wij}
From last slide: Σ wij=26
Coefficient formulation – desirable properties
Reflects co-variation patterns
Reflects adjacency patterns via weights matrix
Normalised for absolute cell values
Normalised for data variation
Adjusts for number of included cells in totals
Spatial Autocorrelation
 Moran’s I
$I = \dfrac{1}{p} \, \dfrac{\sum_i \sum_j w_{ij} (z_i - \bar{z})(z_j - \bar{z})}{\sum_i (z_i - \bar{z})^2}$, where $p = \sum_i \sum_j w_{ij} / n$
hence p = 26/10 for our 10 cell example
TSA model
$r_{.1} = \dfrac{\sum_t (x_t - \bar{x})(x_{t+1} - \bar{x})}{\sum_t (x_t - \bar{x})^2}$
Spatial Autocorrelation
Moran I = 10*16.19/(26*196.68) = 0.0317 ≈ 0
A. Computation of variance/covariance-like quantities, matrix C
B. C*W: Adjustment by multiplication of the weighting matrix, W
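Steps A and B can be reproduced with a short sketch. It assumes binary rook's-move adjacency on the 10-cell lattice from the earlier slide, which yields the stated total of 26 ones; the variable names are illustrative only.

```python
import numpy as np

# (row, col, z) for the 10 cells, listed in cell-number order
cells = [(2, 1, 2.24), (3, 1, 3.10), (1, 2, 4.55), (2, 2, -5.15), (3, 2, -4.39),
         (4, 2, 0.46), (1, 3, 5.54), (2, 3, 9.02), (3, 3, -2.09), (4, 3, -3.06)]
n = len(cells)
z = np.array([c[2] for c in cells])

# Binary weights matrix W: 1 where two cells share an edge (rook's move, assumed)
W = np.zeros((n, n))
for i, (ri, ci, _) in enumerate(cells):
    for j, (rj, cj, _) in enumerate(cells):
        if abs(ri - rj) + abs(ci - cj) == 1:
            W[i, j] = 1.0

d = z - z.mean()                 # deviations from the mean
C = np.outer(d, d)               # step A: covariance-like matrix, C[i, j] = d_i * d_j
numerator = np.sum(C * W)        # step B: element-wise product with W, then summed
denominator = np.sum(d ** 2)
p = W.sum() / n                  # p = sum of weights / n = 26/10
moran_i = numerator / (p * denominator)

print(W.sum(), round(numerator, 2), round(denominator, 2), round(moran_i, 4))
```

The element-wise product in step B keeps only the cross-products of adjacent cells, so positive and negative neighbour pairs largely cancel here and the index is close to zero.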
Spatial Autocorrelation
Moran’s I: $I = \dfrac{1}{p} \, \dfrac{\sum_i \sum_j w_{ij} (z_i - \bar{z})(z_j - \bar{z})}{\sum_i (z_i - \bar{z})^2}$, where $p = \sum_i \sum_j w_{ij} / n$
Modification for point data
Replace weights matrix with distance bands, width h
Pre-normalise z values by subtracting means
Count number of other points in each band, N(h)
$I(h) = N(h) \, \dfrac{\sum_i \sum_j z_i z_j}{\sum_i z_i^2}$
Spatial Autocorrelation
 Moran I Correlogram
[Figure panels: Source data points; Lag distance bands, h; Correlogram]
Spatial Autocorrelation
Geary C
Co-variation model uses squared differences
rather than products

$C = \dfrac{1}{p} \, \dfrac{\sum_i \sum_j w_{ij} (z_i - z_j)^2}{\sum_i (z_i - \bar{z})^2}$, where $p = \dfrac{2 \sum_i \sum_j w_{ij}}{n - 1}$
Similar approach is used in geostatistics
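As a hedged illustration, Geary C can be computed for the same 10-cell lattice used for Moran's I. The sketch assumes binary rook's-move weights and the standard Geary normalisation, p = 2 Σ wij / (n - 1); the variable names are illustrative.

```python
import numpy as np

# (row, col, z) for the 10-cell lattice, in cell-number order
cells = [(2, 1, 2.24), (3, 1, 3.10), (1, 2, 4.55), (2, 2, -5.15), (3, 2, -4.39),
         (4, 2, 0.46), (1, 3, 5.54), (2, 3, 9.02), (3, 3, -2.09), (4, 3, -3.06)]
n = len(cells)
z = np.array([c[2] for c in cells])

# Binary rook's-move weights (assumed), as in the Moran's I example
W = np.zeros((n, n))
for i, (ri, ci, _) in enumerate(cells):
    for j, (rj, cj, _) in enumerate(cells):
        if abs(ri - rj) + abs(ci - cj) == 1:
            W[i, j] = 1.0

sq_diff = (z[:, None] - z[None, :]) ** 2       # (z_i - z_j)^2 for every pair
p = 2.0 * W.sum() / (n - 1)                    # normalising constant
geary_c = np.sum(W * sq_diff) / (p * np.sum((z - z.mean()) ** 2))

print(round(geary_c, 3))
```

Values near 1 indicate no spatial autocorrelation, values below 1 positive autocorrelation and values above 1 negative autocorrelation, so this result agrees with the near-zero Moran I for the same data.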
Spatial Autocorrelation
Extending SA concepts
Distance formula weights vs bands
Lattice models with more complex
neighbourhoods and lag models (see GeoDa)
Disaggregation of SA index computations (row-wise) with/without row standardisation (LISA)
Significance testing
Normal model
Randomisation models
Bonferroni/other corrections
Regression modelling
Simple regression – a statistical
perspective
One (or more) dependent (response) variables
One or more independent (predictor) variables
Linear regression is linear in coefficients:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$, or
$\mathbf{y} = \mathbf{X}\boldsymbol{\beta}$
Vector/matrix form often used
Over-determined equations & least squares
Regression modelling
Ordinary Least Squares (OLS) model
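$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \dots + \varepsilon_i$, or
$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$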
Minimise sum of squared errors (or residuals)
Solved for coefficients by matrix expression:

$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$, $\mathrm{var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}^T \mathbf{X})^{-1}$
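A minimal sketch of the OLS matrix solution, using NumPy on synthetic data. The sample size, true coefficients and noise level are assumptions chosen for illustration; σ² is estimated from the residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Design matrix X: a column of ones (intercept) plus two synthetic predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])           # assumed coefficients
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# beta_hat = (X^T X)^{-1} X^T y, solved without forming the inverse explicitly
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)

# var(beta_hat) = sigma^2 (X^T X)^{-1}, with sigma^2 estimated from the residuals
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])
var_beta = sigma2_hat * np.linalg.inv(XtX)

print(beta_hat, np.sqrt(np.diag(var_beta)))
```

Solving the normal equations with `np.linalg.solve` rather than inverting X^T X is a numerical choice only; the result is the same estimator as the matrix expression.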
Regression modelling
OLS – models and assumptions
Model – simplicity and parsimony
Model – over-determination, multi-collinearity
and variance inflation
Typical assumptions
Data are independent random samples from an
underlying population
Model is valid and meaningful (in form and statistically)
Errors are iid
• Independent; No heteroskedasticity; common distribution
Errors are distributed $N(0, \sigma^2)$
Regression modelling
Spatial modelling and OLS
Positive spatial autocorrelation is the norm,
hence dependence between samples exists
Datasets often non-Normal >> transformations
may be required (Log, Box-Cox, Logistic)
Samples are often clustered >> spatial
declustering may be required
Heteroskedasticity is common
Spatial coordinates (x,y) may form part of the
modelling process
Regression modelling
 OLS vs GLS
OLS assumes no co-variation
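Solution: $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$
GLS models co-variation:
$\mathbf{y} \sim N(\boldsymbol{\mu}, \mathbf{C})$ where C is a positive definite covariance matrix
$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}$ where u is a vector of random variables (errors) with mean 0 and variance-covariance matrix C
Solution: $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{C}^{-1} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{C}^{-1} \mathbf{y}$, $\mathrm{var}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}^T \mathbf{C}^{-1} \mathbf{X})^{-1}$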
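The GLS solution can be sketched in the same way as OLS. The example below assumes a one-dimensional transect with an exponential-decay covariance matrix C; the transect, the covariance form and the coefficients are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
s = np.linspace(0.0, 10.0, n)                       # sample locations along a transect
X = np.column_stack([np.ones(n), s])

# Assumed covariance model: correlation decays exponentially with separation
C = np.exp(-np.abs(s[:, None] - s[None, :]) / 2.0)
L = np.linalg.cholesky(C)                           # C = L L^T, used to draw correlated errors
y = X @ np.array([3.0, 0.5]) + L @ rng.normal(size=n)

# beta_hat = (X^T C^{-1} X)^{-1} X^T C^{-1} y
Cinv_X = np.linalg.solve(C, X)                      # C^{-1} X
A = X.T @ Cinv_X                                    # X^T C^{-1} X
beta_gls = np.linalg.solve(A, Cinv_X.T @ y)         # C symmetric, so (C^{-1} X)^T = X^T C^{-1}
var_beta = np.linalg.inv(A)                         # var(beta_hat) = (X^T C^{-1} X)^{-1}

# Cross-check: GLS equals OLS applied to data "whitened" by L^{-1}
Xw, yw = np.linalg.solve(L, X), np.linalg.solve(L, y)
beta_whitened = np.linalg.lstsq(Xw, yw, rcond=None)[0]

print(beta_gls, beta_whitened)
```

The whitening check shows why C must be positive definite: the Cholesky factor exists only in that case, and it transforms the correlated problem into one that satisfies the OLS independence assumption.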
Regression modelling
 GLS and spatial modelling
$\mathbf{y} \sim N(\boldsymbol{\mu}, \mathbf{C})$ where C is a positive definite covariance
matrix (C must be invertible)
C may be modelled by inverse distance weighting,
contiguity (zone) based weighting, explicit covariance
modelling…
 Other models
Binary data – Logistic models
Count data – Poisson models
Regression modelling
Choosing between models
Information content perspective and AIC
$\mathrm{AIC} = -2\ln(L) + 2k$
$\mathrm{AICc} = -2\ln(L) + 2k \left( \dfrac{n}{n - k - 1} \right)$
where n is the sample size, k is the number of
parameters used in the model, and L is the likelihood
function
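Both criteria are simple to compute once the maximised log-likelihood is known. In this sketch the function names and the example values (ln L = -100, k = 3, n = 50) are assumptions for illustration.

```python
def aic(log_likelihood, k):
    """AIC = -2 ln(L) + 2k."""
    return -2.0 * log_likelihood + 2.0 * k

def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC: -2 ln(L) + 2k * n / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return -2.0 * log_likelihood + 2.0 * k * n / (n - k - 1)

# Assumed example values: ln(L) = -100, k = 3 parameters, n = 50 observations
print(aic(-100.0, 3), aicc(-100.0, 3, 50))
```

The correction factor n / (n - k - 1) tends to 1 as n grows, so AICc converges to AIC for large samples and penalises extra parameters more heavily for small ones. Lower values indicate the preferred model.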
Regression modelling
 Some ‘regression’ terminology
 Simple linear
 Multiple
 Multivariate
 SAR
 CAR
 Logistic
 Poisson
 Ecological
 Hedonic
 Analysis of variance
 Analysis of covariance
Regression modelling
Spatial regression – trend surfaces and
residuals (a form of ESDA)
General model:
$y = f(x_1, x_2, \mathbf{w})$
y - observations, f( , , ) - some function, (x1,x2) - plane
coordinates, w - attribute vector
Linear trend surface plot
Residuals plot
2nd and 3rd order polynomial regression
Goodness of fit measures – coefficient of
determination
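A trend surface is an ordinary least squares fit in which the predictors are powers of the plane coordinates. The sketch below fits 1st and 2nd order surfaces to synthetic data and reports residuals and the coefficient of determination. The data, the noise level and the function names are assumptions for illustration.

```python
import numpy as np

def design(x1, x2, order):
    """Polynomial terms x1^i * x2^j with i + j <= order (order 1 = linear surface)."""
    cols = [x1 ** i * x2 ** j
            for i in range(order + 1) for j in range(order + 1 - i)]
    return np.column_stack(cols)

def fit_trend(x1, x2, y, order):
    """Least squares trend surface: returns coefficients, residuals and R^2."""
    A = design(x1, x2, order)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, resid, r2

# Synthetic observations on a tilted plane with noise (assumed values)
rng = np.random.default_rng(2)
n = 100
x1, x2 = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
y = 5.0 + 0.8 * x1 - 0.3 * x2 + rng.normal(scale=0.5, size=n)

_, resid1, r2_linear = fit_trend(x1, x2, y, order=1)
_, resid2, r2_quadratic = fit_trend(x1, x2, y, order=2)
print(r2_linear, r2_quadratic)
```

Because the lower order terms are nested within the higher order model, R² can only rise as the order increases, so mapping the residuals is usually more informative than R² alone when judging whether a higher order surface is warranted.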
Regression modelling
Regression & spatial autocorrelation (SA)
Analyse the data for SA
If SA ‘significant’ then
Proceed and ignore SA, or
Permit the coefficients, β, to vary spatially (GWR), or
Modify the regression model to incorporate the SA
Regression modelling
 Geographically Weighted Regression (GWR)
Coefficients, β, allowed to vary spatially, β(t)
Model: $\mathbf{y} = \mathbf{X}\boldsymbol{\beta}(t) + \boldsymbol{\varepsilon}$
Coefficients determined by examining neighbourhoods
of points, t, using distance decay functions (fixed or
adaptive bandwidths)
Weighting matrix, W(t), defined for each point
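Solution: $\hat{\boldsymbol{\beta}}(t) = (\mathbf{X}^T \mathbf{W}(t) \mathbf{X})^{-1} \mathbf{X}^T \mathbf{W}(t) \mathbf{y}$
GLS: $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{C}^{-1} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{C}^{-1} \mathbf{y}$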
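The local solution has the same form as weighted least squares, re-estimated at each point t. The sketch below assumes a Gaussian distance decay function with a fixed bandwidth and synthetic data whose slope increases from west to east; the kernel choice, the data and the function name are all illustrative assumptions.

```python
import numpy as np

def gwr_coefficients(t, coords, X, y, bandwidth):
    """Local estimate beta_hat(t) = (X^T W(t) X)^{-1} X^T W(t) y."""
    d = np.linalg.norm(coords - t, axis=1)         # distance from t to every sample
    w = np.exp(-0.5 * (d / bandwidth) ** 2)        # Gaussian decay, fixed bandwidth (assumed)
    XtW = X.T * w                                  # X^T W(t), with W(t) = diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)

# Synthetic data: the slope on x rises with the easting coordinate
rng = np.random.default_rng(3)
n = 400
coords = rng.uniform(0, 1, size=(n, 2))
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (1.0 + 2.0 * coords[:, 0]) * x + rng.normal(scale=0.2, size=n)

b_west = gwr_coefficients(np.array([0.1, 0.5]), coords, X, y, bandwidth=0.15)
b_east = gwr_coefficients(np.array([0.9, 0.5]), coords, X, y, bandwidth=0.15)

# With a very large bandwidth all weights approach 1 and GWR reduces to global OLS
b_wide = gwr_coefficients(np.array([0.5, 0.5]), coords, X, y, bandwidth=1e6)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(b_west, b_east, b_wide, b_ols)
```

The bandwidth controls the trade-off: a small value tracks local variation but uses few effective observations, while a large value converges on the global model.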
Regression modelling
Geographically Weighted Regression
Sensitivity – model, decay function, bandwidth,
point/centroid selection
ESDA – mapping of surface, residuals,
parameters and SEs
Significance testing
Increased apparent explanation of variance
Effective number of parameters
AICc computations
Regression modelling
Geographically Weighted Regression
Count data – GWPR
use of offsets
Fitting by iteratively reweighted least squares (IRLS) methods
Presence/Absence data – GWLR
True binary data
Computed binary data - use of re-coding, e.g.
thresholding
Fitting by iteratively reweighted least squares (IRLS) methods
Regression modelling
Regression & spatial autocorrelation (SA)
Analyse the data for SA
If SA ‘significant’ then
Proceed and ignore SA, or
Permit the coefficients, β, to vary spatially (GWR)
or
Modify the regression model to incorporate the
SA
Regression modelling
Regression & spatial autocorrelation (SA)
Modify the regression model to incorporate the
SA, i.e. produce a Spatial Autoregressive model
(SAR)
Many approaches – including:
SAR – e.g. pure spatial lag model, mixed model,
spatial error model etc.
CAR – a range of models that assume the expected
value of the dependent variable is conditional on the
(distance weighted) values of neighbouring points
Spatial filtering – e.g. OLS on spatially filtered data
Regression modelling
SAR models
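Pure spatial lag: $\mathbf{y} = \rho \mathbf{W} \mathbf{y} + \boldsymbol{\varepsilon}$
(W: spatial weights matrix; ρ: autoregression parameter)
Re-arranging: $\mathbf{y} = (\mathbf{I} - \rho \mathbf{W})^{-1} \boldsymbol{\varepsilon}$
MRSA (mixed regressive spatial autoregressive) model: $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \rho \mathbf{W} \mathbf{y} + \boldsymbol{\varepsilon}$ (linear regression added)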
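The re-arranged form gives a direct way to simulate data from these models. The sketch below assumes a ring of 20 zones with a row-standardised weights matrix and ρ = 0.6; these choices and the coefficient values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20

# Row-standardised weights on a ring (assumed): each zone has two neighbours
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

rho = 0.6                                   # autoregression parameter (assumed)
I_n = np.eye(n)
eps = rng.normal(size=n)                    # iid errors

# Pure spatial lag: y = (I - rho W)^{-1} eps
y_lag = np.linalg.solve(I_n - rho * W, eps)

# Mixed model: y = X beta + rho W y + eps  =>  y = (I - rho W)^{-1} (X beta + eps)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
y_mixed = np.linalg.solve(I_n - rho * W, X @ beta + eps)

# Both simulated vectors satisfy their defining equations
print(np.allclose(y_lag, rho * W @ y_lag + eps),
      np.allclose(y_mixed, X @ beta + rho * W @ y_mixed + eps))
```

With row-standardised weights and |ρ| < 1 the matrix I - ρW is invertible, which is what makes the re-arrangement valid.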
Regression modelling
SAR models
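Linear regression + spatial error
Spatial error model:
$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\boldsymbol{\varepsilon} = \lambda \mathbf{W} \boldsymbol{\varepsilon} + \mathbf{u}$
(λWε: spatial weighted error vector; u: iid error vector)
Substituting and re-arranging:
$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \lambda \mathbf{W} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) + \mathbf{u}$, or
$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \lambda \mathbf{W} \mathbf{y} - \lambda \mathbf{W} \mathbf{X}\boldsymbol{\beta} + \mathbf{u}$
(Xβ: linear regression (global); λWy: SAR lag; λWXβ: local trend; u: iid error vector)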
Regression modelling
CAR models
Standard CAR model:


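$E(y_i \mid \text{all } y_{j \neq i}) = \mu_i + \rho \sum_{j \neq i} w_{ij} (y_j - \mu_j)$
(μi: expected value at i; ρ: autoregression parameter; the sum is a weighted mean for the neighbourhood of i)
Local weights matrix – distance or contiguity
Variance: $\mathrm{var}(\mathbf{y}) = (\mathbf{I} - \rho \mathbf{W})^{-1} \mathbf{M}$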
Different models for W and M provide a range of
CAR models
Regression modelling
Spatial filtering
Apply a spatial filter to the data to remove SA
effects
Model the filtered data
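Example: OLS model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$; with the spatial filter $(\mathbf{I} - \rho \mathbf{W})$ applied to y and X:
$\mathbf{y} - \rho \mathbf{W} \mathbf{y} = \mathbf{X}\boldsymbol{\beta} - \rho \mathbf{W} \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, or
$(\mathbf{I} - \rho \mathbf{W}) \mathbf{y} = (\mathbf{I} - \rho \mathbf{W}) \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, hence
$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + (\mathbf{I} - \rho \mathbf{W})^{-1} \boldsymbol{\varepsilon}$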