Transcript Slide 1

Spatial Dependency Modeling Using Spatial Auto-Regression
Mete Celik 1,3, Baris M. Kazar 4, Shashi Shekhar 1,3, Daniel Boley 1, David J. Lilja 1,2
1 CSE Department @ University of Minnesota, Twin Cities
2 ECE Department @ University of Minnesota, Twin Cities
3 Army High Performance Computing Research Center
4 Oracle USA
Outline of Today’s Talk
• Motivation & Background
• Problem Definition
• Related Work & Contributions
• Proposed Approach
• Experimental Evaluation
• Conclusion & Future Work
07/08/2006
Spatial Dependency Modeling Using SAR
2
Motivation
• Widespread use of spatial databases
– Mining spatial patterns
– The 1855 Asiatic Cholera in London [Griffith]
• Fair Lending [NYT, R. Nader]
– Correlation of bank locations with loan activity in poor neighborhoods
• Retail Outlets [NYT, Walmart, McDonald's, etc.]
– Determining locations of stores by relating neighborhood maps with customer databases
• Crime Hot Spot Analysis [NYT, NIJ CML]
– Explaining clusters of sexual assaults by locating addresses of sex offenders
• Ecology [Uygar]
– Explaining location of bird nests based on structural environmental variables
Spatial Auto-correlation (SA)
• Randomly Distributed Data (no SA): Spatial distribution satisfying assumptions of classical data
[Figure: pixel property with independent identical distribution; random nest locations]
• Cluster Distributed Data: Spatial distribution NOT satisfying assumptions of classical data
[Figure: pixel property with spatial autocorrelation; cluster nest locations]
Execution Trace
Given:
• Spatial framework
• Attributes
 1  2  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16
Space + 4-neighborhood (4-by-4 grid, cells numbered row by row)
The 6th row of W (cell 6) is marked on the slide.
0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 
1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 
0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 
0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 
1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 
0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 
0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 
0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 
0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 
0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 
0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 
0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 
0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 
0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 
0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 
0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
Binary W

0 1/2 0 0 1/2 0 0 0 0 0 0 0 0 0 0 0
1/3 0 1/3 0 0 1/3 0 0 0 0 0 0 0 0 0 0
0 1/3 0 1/3 0 0 1/3 0 0 0 0 0 0 0 0 0
0 0 1/2 0 0 0 0 1/2 0 0 0 0 0 0 0 0
1/3 0 0 0 0 1/3 0 0 1/3 0 0 0 0 0 0 0
0 1/4 0 0 1/4 0 1/4 0 0 1/4 0 0 0 0 0 0
0 0 1/4 0 0 1/4 0 1/4 0 0 1/4 0 0 0 0 0
0 0 0 1/3 0 0 1/3 0 0 0 0 1/3 0 0 0 0
0 0 0 0 1/3 0 0 0 0 1/3 0 0 1/3 0 0 0
0 0 0 0 0 1/4 0 0 1/4 0 1/4 0 0 1/4 0 0
0 0 0 0 0 0 1/4 0 0 1/4 0 1/4 0 0 1/4 0
0 0 0 0 0 0 0 1/3 0 0 1/3 0 0 0 0 1/3
0 0 0 0 0 0 0 0 1/2 0 0 0 0 1/2 0 0
0 0 0 0 0 0 0 0 0 1/3 0 0 1/3 0 1/3 0
0 0 0 0 0 0 0 0 0 0 1/3 0 0 1/3 0 1/3
0 0 0 0 0 0 0 0 0 0 0 1/2 0 0 1/2 0
Row-normalized W (each row of binary W divided by its row sum)

\[
\text{neighbors}(i, j) = \left\{
\begin{array}{lll}
(i-1,\ j) & 2 \le i \le p,\ 1 \le j \le q & \text{NORTH} \\
(i,\ j+1) & 1 \le i \le p,\ 1 \le j \le q-1 & \text{EAST} \\
(i+1,\ j) & 1 \le i \le p-1,\ 1 \le j \le q & \text{SOUTH} \\
(i,\ j-1) & 1 \le i \le p,\ 2 \le j \le q & \text{WEST}
\end{array}
\right.
\]
W allows other neighborhood definitions
• distance based
• 8-neighbors
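A minimal Python sketch of how the binary and row-normalized W for a p-by-q grid could be built from the 4-neighbor rule. The function names `grid_neighbors` and `row_normalize` are illustrative assumptions and do not come from the slides (the experiments in this talk were run in Matlab).

```python
import numpy as np

def grid_neighbors(p, q):
    """Binary 4-neighbor matrix for a p-by-q grid, cells numbered row by row."""
    n = p * q
    W = np.zeros((n, n))
    for i in range(p):
        for j in range(q):
            cell = i * q + j
            # NORTH, EAST, SOUTH, WEST offsets; keep only those inside the grid
            for di, dj in ((-1, 0), (0, 1), (1, 0), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < p and 0 <= nj < q:
                    W[cell, ni * q + nj] = 1.0
    return W

def row_normalize(W):
    """Divide each row by its row sum so that every row adds up to 1."""
    return W / W.sum(axis=1, keepdims=True)

W_binary = grid_neighbors(4, 4)
W_rownorm = row_normalize(W_binary)

# 6th row of each matrix (cell 6 has zero-based index 5)
print(W_binary[5].astype(int))
print(W_rownorm[5])
```

Swapping the offset tuple for the eight surrounding offsets would give the 8-neighbor variant mentioned above.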
SDM Provides Better Model!
• Linear Regression $y = x\beta + \varepsilon$ → SAR $y = \rho W y + x\beta + \varepsilon$
• The spatial auto-regression (SAR) model has higher accuracy and removes the IID assumption of linear regression
Data Structures in SAR Model
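\[
\underbrace{y}_{n\text{-by-}1} = \underbrace{\rho}_{1\text{-by-}1}\,\underbrace{W}_{n\text{-by-}n}\,\underbrace{y}_{n\text{-by-}1} + \underbrace{x}_{n\text{-by-}k}\,\underbrace{\beta}_{k\text{-by-}1} + \underbrace{\varepsilon}_{n\text{-by-}1}
\]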
• Vectors: y, β, ε
• Matrices: W, x
• W is a large matrix
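To make the shapes concrete, here is a small Python sketch that draws synthetic data from the SAR equation by solving (I - ρW) y = xβ + ε. The ring-shaped `ring_weights` matrix, the sizes n and k, and the coefficient values are assumptions chosen only to keep the example short; they are not the datasets used in the talk.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical stand-in for W: each site has two neighbors on a ring; rows sum to 1."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def simulate_sar(W, x, beta, rho, rng):
    """Draw y from y = rho*W*y + x*beta + eps by solving (I - rho*W) y = x*beta + eps."""
    n = W.shape[0]
    eps = rng.standard_normal(n)
    y = np.linalg.solve(np.eye(n) - rho * W, x @ beta + eps)
    return y, eps

rng = np.random.default_rng(0)
n, k, rho = 100, 2, 0.5
W = ring_weights(n)                                         # n-by-n
x = np.column_stack([np.ones(n), rng.standard_normal(n)])   # n-by-k
beta = np.array([1.0, 2.0])                                 # k-by-1
y, eps = simulate_sar(W, x, beta, rho, rng)                 # both n-by-1

print(W.shape, y.shape, x.shape, beta.shape, eps.shape)
print("fraction of nonzeros in W:", np.count_nonzero(W) / W.size)
```

Only 2n of the n² entries of this W are nonzero, so for large n a sparse storage format would be the natural choice.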
Computational Challenge
• Maximum-Likelihood Estimation = minimizing the negative log-likelihood function (Theorem 1)
\[
A = I - \rho W, \qquad B = \left[I - x(x^T x)^{-1} x^T\right](I - \rho W)\,y
\]
\[
\min_{|\rho| < 1} \; \underbrace{-\ln|A|}_{\text{log-det term}} + \underbrace{\frac{n}{2}\ln\left(\frac{1}{n} B^T B\right)}_{\text{SSE term}}
\]
$\rho$: the spatial auto-regression (auto-correlation) parameter
$W$: n-by-n neighborhood matrix over the spatial framework
• Solving SAR Model
– $\rho = 0$ → Least Squares Problem
– $\beta = 0$, $\varepsilon = 0$ → Eigen-value Problem
– General case → Computationally expensive due to the log-det term in the ML function
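The following Python sketch evaluates this ML function with an exact, eigenvalue-based log-det and minimizes it over ρ with a golden section search. It is a sketch under stated assumptions: the ring-shaped W, the synthetic data, the search interval [0, 0.99] and all function names are hypothetical, and additive constants of the likelihood are dropped.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical symmetric stand-in for W: two neighbors per site on a ring."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def neg_log_likelihood(rho, eigs, W, x, y):
    """-ln|A| + (n/2) ln(B'B / n), with ln|A| computed from all eigenvalues of W."""
    n = y.size
    log_det = np.sum(np.log(1.0 - rho * eigs))        # log-det term
    Ay = y - rho * (W @ y)                            # (I - rho W) y
    coef, *_ = np.linalg.lstsq(x, Ay, rcond=None)     # least squares fit of Ay on x
    B = Ay - x @ coef                                 # [I - x(x'x)^-1 x'] (I - rho W) y
    return -log_det + 0.5 * n * np.log(B @ B / n)     # log-det term + SSE term

def golden_section(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden section search."""
    invphi = (np.sqrt(5.0) - 1.0) / 2.0
    c = hi - invphi * (hi - lo)
    d = lo + invphi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:                  # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - invphi * (hi - lo)
            fc = f(c)
        else:                        # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + invphi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)

# Synthetic data from the SAR equation with a known rho
rng = np.random.default_rng(0)
n, true_rho = 400, 0.5
W = ring_weights(n)
x = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = np.linalg.solve(np.eye(n) - true_rho * W,
                    x @ np.array([1.0, 2.0]) + rng.standard_normal(n))

eigs = np.linalg.eigvalsh(W)   # all n eigenvalues: the costly step for large n
rho_hat = golden_section(lambda r: neg_log_likelihood(r, eigs, W, x, y), 0.0, 0.99)
print("estimated rho:", rho_hat)
```

Computing every eigenvalue of W is what makes the exact approach expensive as n grows, which is the motivation for the approximations discussed in the rest of the talk.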
Outline
• Motivation & Background
• Problem Definition
• Related Work & Contributions
• Proposed Approach
• Experimental Evaluation
• Conclusion & Future Work
Problem Statement
Given:
• A spatial framework $S$ consisting of sites $\{s_1, \ldots, s_q\}$ for an underlying geographic space $G$
• A collection of explanatory functions $f_{x_k}: S \to \mathbb{R}^k$, $k = 1, \ldots, K$, where $\mathbb{R}^k$ is the range of possible values for the explanatory functions
• A dependent function $f_y: \mathbb{R} \to \mathbb{R}^y$
• A family $F$ (SAR equation) of learning model functions mapping $\mathbb{R}^1 \times \cdots \times \mathbb{R}^k \to \mathbb{R}^y$
• A neighborhood relationship (4- and 8-neighbor) on the spatial framework
Find:
• The SAR parameter $\rho$ and the regression coefficient vector $\beta$ with a desired precision to save log-det computations.
Problem Statement – Cont’d
Objective:
– Algebraic error ranking of approximate SAR model solutions.
Constraints:
• $S$ is a multi-dimensional Euclidean space.
• The values of the explanatory variables $x$ and the dependent function (observed variable) $y$ may not be independent with respect to those of nearby spatial sites, i.e., spatial autocorrelation exists.
• The domains of $x$ and $y$ are the real numbers.
• The SAR parameter $\rho$ varies in the range [0,1).
• The error is normally distributed with unit standard deviation and zero mean, i.e., $\varepsilon \sim N(0, \sigma^2 I)$ IID.
• The neighborhood matrix $W$ exhibits sparsity.
Related Work
Maximum Likelihood
– Exact estimate:
• Eigen-value based 1-D Surface Partitioning [Li96, Kazar03-04]
• Direct Sparse Matrix Algorithms [Pace97, Kazar05]
– Approximate estimate:
• Matrix Exponential Specification [Pace00]
• Taylor Series [Martin93, Kazar04, Shekhar04]
• Chebyshev Poly. [Pace02, Kazar04, Shekhar04]
• NORTHSTAR [Kazar05-06]
• Semiparametric Estimates [Pace02]
• Graph Theory [Pace00]
• Characteristic Poly. [Smirnov01]
• Double Bounded Likelihood Estimator [Pace04]
• Upper & Lower Bounds via Div&Conq [Pace03]
• SAR Local Estimation [Pace03]
• Gauss-Lanczos [Bai, Golub98, Kazar05-06]
Bayesian
– Exact estimate: None
– Approximate estimate:
• Matrix Exponential Specification [LeSage00]
• MCMC [Barry99, LeSage00]
Contributions
• A new approximate SAR model solution: Gauss-Lanczos approximation method
– Key Idea: Do not find all of the eigenvalues of W
• Error ranking of approximate SAR model solutions
[Equation not recoverable from the slide text: a relation built on the derivative of $f^{-1}$ with respect to the likelihood function of $\rho$ given $y$]
Outline
• Motivation & Background
• Problem Definition
• Related Work & Contributions
• Proposed Approach
• Experimental Evaluation
• Conclusion & Future Work
Gauss-Lanczos Approximation
• Log-det is approximated by transforming the eigenvalue problem to the quadratic form.
• Finally, Gauss-type quadrature rules are applied using the Lanczos procedure.
\[
\ln|I - \rho \tilde{W}| = \mathrm{tr}\left(\ln(I - \rho \tilde{W})\right) \approx \frac{1}{m} \sum_{i=1}^{m} I_r^{(i)}
\]
How does GL Method Work?
\[
T_r = \begin{bmatrix}
a_1 & \gamma_1 & 0 & \cdots & 0 \\
\gamma_1 & a_2 & \gamma_2 & \ddots & \vdots \\
0 & \gamma_2 & \ddots & \ddots & 0 \\
\vdots & \ddots & \ddots & a_{r-1} & \gamma_{r-1} \\
0 & \cdots & 0 & \gamma_{r-1} & a_r
\end{bmatrix}
\]
• GL (Algorithm 3.2) is repeated m times (m = 400 in our experiments).
• Parameter r varies between 5 and 8 in our experiments.
• For large problem sizes, m and r have little effect on the quality of the solution.
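A rough Python sketch of the Gauss-Lanczos idea: for each of m random starting vectors, r Lanczos steps produce a small tridiagonal matrix whose eigen-decomposition gives a Gauss quadrature estimate of the quadratic form, and the m estimates are averaged. This is not the Algorithm 3.2 from the talk. The random ±1 starting vectors, the scaling by n, the 20-by-20 test grid, and all function names are assumptions made for illustration.

```python
import numpy as np

def symmetrized_grid_weights(p, q):
    """W~ = D^(-1/2) C D^(-1/2), with C the binary 4-neighbor matrix of a p-by-q grid."""
    n = p * q
    C = np.zeros((n, n))
    for i in range(p):
        for j in range(q):
            if j + 1 < q:   # EAST neighbor and its WEST mirror
                C[i * q + j, i * q + j + 1] = C[i * q + j + 1, i * q + j] = 1.0
            if i + 1 < p:   # SOUTH neighbor and its NORTH mirror
                C[i * q + j, (i + 1) * q + j] = C[(i + 1) * q + j, i * q + j] = 1.0
    d = C.sum(axis=1)
    return C / np.sqrt(np.outer(d, d))

def lanczos_quadrature(A, u, r):
    """Gauss quadrature estimate of u' ln(A) u from r Lanczos steps (u has unit norm)."""
    alphas, gammas = [], []
    v_prev, v, gamma = np.zeros_like(u), u, 0.0
    for step in range(r):
        w = A @ v - gamma * v_prev
        alpha = v @ w
        w = w - alpha * v
        alphas.append(alpha)
        gamma = np.linalg.norm(w)
        if step == r - 1 or gamma < 1e-12:
            break
        gammas.append(gamma)
        v_prev, v = v, w / gamma
    # Tridiagonal T_r: diagonal entries alphas, off-diagonal entries gammas
    T = np.diag(alphas) + np.diag(gammas, 1) + np.diag(gammas, -1)
    theta, S = np.linalg.eigh(T)
    # Quadrature nodes are the eigenvalues of T_r; weights are the squared
    # first components of its eigenvectors.
    return np.sum(S[0, :] ** 2 * np.log(theta))

def gl_logdet(W_sym, rho, m=400, r=8, seed=0):
    """Average of m quadrature estimates of tr(ln(I - rho W~))."""
    rng = np.random.default_rng(seed)
    n = W_sym.shape[0]
    A = np.eye(n) - rho * W_sym
    total = 0.0
    for _ in range(m):
        z = rng.choice([-1.0, 1.0], size=n)          # random +/-1 starting vector
        total += n * lanczos_quadrature(A, z / np.sqrt(n), r)
    return total / m

W_sym = symmetrized_grid_weights(20, 20)             # n = 400
rho = 0.5
exact = np.sum(np.log(1.0 - rho * np.linalg.eigvalsh(W_sym)))
approx = gl_logdet(W_sym, rho)
print("exact log-det:", exact, " GL estimate:", approx)
```

Only matrix-vector products with I - ρW̃ and eigen-decompositions of r-by-r matrices are needed, so the full spectrum of W is never computed. The estimate is stochastic and changes with the seed.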
Taylor’s Series Approximation
• Log-det term in terms of Taylor's Series
– Trace is sum of eigen-values & $\tilde{W}$ is the symmetrized neighborhood matrix
\[
\ln|I - \rho W| \approx -\sum_{k=1}^{q} \frac{\rho^k\,\mathrm{trace}(\tilde{W}^k)}{k}
\]
[Flow diagram]
Inputs: $\tilde{W}$, $W$, $\rho$, $x$, $y$
– Taylor's Series Expansion applied to $\ln|I - \rho W|$
– Golden Section search: calculate the ML function and return the ML function value until the best-fit $\hat{\rho}$ is found (similar to Stages A & B)
– SSE stage (Stage C): one dense matrix (n-by-n) and vector (n-by-1) multiplication; 2 dense matrix (n-by-k) and vector (n-by-1) multiplications; 3 vector (n-by-1) dot products; scalar operation
Outputs: $\hat{\rho}$, $\hat{\beta}$, $\hat{\sigma}^2$
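A short Python sketch of the truncated Taylor series for the log-det term, checked against the exact eigenvalue-based value. The ring-shaped W (already symmetric, so it serves as its own symmetrized matrix), the truncation order q = 30 and the function names are assumptions for illustration only.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical symmetric stand-in for W~: two neighbors per site on a ring."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def taylor_logdet(W_sym, rho, q):
    """ln|I - rho W| ~ -sum_{k=1..q} rho^k trace(W~^k) / k."""
    n = W_sym.shape[0]
    power = np.eye(n)
    total = 0.0
    for k in range(1, q + 1):
        power = power @ W_sym                  # W~^k, one dense matrix product per term
        total -= rho ** k * np.trace(power) / k
    return total

W_sym = ring_weights(60)
rho = 0.5
exact = np.sum(np.log(1.0 - rho * np.linalg.eigvalsh(W_sym)))
approx_low = taylor_logdet(W_sym, rho, q=4)
approx_high = taylor_logdet(W_sym, rho, q=30)
print("exact:", exact, " q=4:", approx_low, " q=30:", approx_high)
```

Because the terms decay like ρ^k, more terms are needed as ρ approaches 1, that is, as spatial autocorrelation gets stronger.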
Chebyshev Polynomial Approximation
• Log-det term in terms of Chebyshev Polynomials
– Trace is sum of eigen-values, Ts are matrix polynomials, cs are Chebyshev polynomial coefficients
\[
\ln|I - \rho W| \approx \sum_{k=1}^{q+1} c_k(\rho)\,\mathrm{trace}\left(T_{k-1}(\tilde{W})\right) - \frac{n}{2}\,c_1(\rho)
\]
[Flow diagram]
Inputs: $\tilde{W}$, $W$, $\rho$, $x$, $y$
– Chebyshev Polynomial applied to $\ln|I - \rho W|$: Chebyshev coefficients $c_j(\rho)$; q-1 dense n-by-n matrix-matrix multiplications on $\tilde{W}$; trace of n-by-n dense matrix
– Golden Section search: calculate the ML function and return the ML function value until the best-fit $\hat{\rho}$ is found (similar to Stages A & B)
– SSE stage (Stage C): one dense matrix (n-by-n) and vector (n-by-1) multiplication; 2 dense matrix (n-by-k) and vector (n-by-1) multiplications; 3 vector (n-by-1) dot products; scalar operation
Outputs: $\hat{\rho}$, $\hat{\beta}$, $\hat{\sigma}^2$
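A Python sketch of a Chebyshev approximation of the log-det term, using the standard Chebyshev coefficients of ln(1 - ρt) on [-1, 1] and the three-term recurrence for the matrix polynomials. The ring-shaped W, the order q = 15 and the function names are illustrative assumptions. The sketch applies the correction term as (n/2)·c_1, which is what summing the scalar Chebyshev formula over all n eigenvalues gives.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical symmetric stand-in for W~: two neighbors per site on a ring."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def chebyshev_coeffs(rho, q):
    """Coefficients c_1..c_{q+1} of ln(1 - rho*t) on [-1, 1], returned as c[0..q]."""
    N = q + 1
    k = np.arange(1, N + 1)
    nodes = np.cos(np.pi * (k - 0.5) / N)            # zeros of the degree-N polynomial
    fvals = np.log(1.0 - rho * nodes)
    j = np.arange(N)[:, None]                        # polynomial degrees 0..q
    return (2.0 / N) * (np.cos(np.pi * j * (k - 0.5) / N) @ fvals)

def chebyshev_logdet(W_sym, rho, q):
    """sum_{k=1..q+1} c_k trace(T_{k-1}(W~)) - (n/2) c_1."""
    n = W_sym.shape[0]
    c = chebyshev_coeffs(rho, q)
    T_prev, T_curr = np.eye(n), W_sym.copy()         # T_0 = I, T_1 = W~
    total = c[0] * n - 0.5 * n * c[0]
    if q >= 1:
        total += c[1] * np.trace(T_curr)
    for j in range(2, q + 1):
        # T_j = 2 W~ T_{j-1} - T_{j-2}: q-1 dense matrix-matrix products in total
        T_prev, T_curr = T_curr, 2.0 * W_sym @ T_curr - T_prev
        total += c[j] * np.trace(T_curr)
    return total

W_sym = ring_weights(60)
rho = 0.5
exact = np.sum(np.log(1.0 - rho * np.linalg.eigvalsh(W_sym)))
approx = chebyshev_logdet(W_sym, rho, q=15)
print("exact:", exact, " Chebyshev:", approx)
```

The recurrence relies on the eigenvalues of W̃ lying in [-1, 1], which holds for a symmetrized row-normalized neighborhood matrix.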
Outline
• Motivation & Background
• Problem Definition
• Related Work & Contributions
• Proposed Approach
• Experimental Evaluation
• Conclusion & Future Work
Experiment Design
Factor Name: Parameter Domain
• Problem Size (n): 400, 1600, 2500 observation points
• Neighborhood Structure: 2-D with 4-neighbors
• Candidates:
– Exact Approach (Eigenvalue Based)
– Taylor's Series Approximation
– Chebyshev Polynomial Approximation
– Gauss-Lanczos Approximation
• Dataset: Synthetic dataset for $\rho$ = 0.1, 0.2, …, 0.9
• SAR Parameter $\rho$: [0,1)
• Programming Language: Matlab
Exact and Approximate Values of Log-det
• GL gives a better approximation as spatial autocorrelation increases
Absolute Relative Error of Approximations
• The absolute relative error of the approximation goes down as spatial autocorrelation increases (GL mean error 0.9%, GL max error 1.78%)
Conclusions
• GL is slightly more expensive than Taylor
series and Chebyshev polynomials.
• GL gives better approximations when
spatial autocorrelation is high and the
problem size is large.
• GL quality depends on the number of iterations, the initial Lanczos vector, and the random number generator.
• No need to compute all eigenvalues.
Acknowledgments
• AHPCRC
• Minnesota Supercomputing Institute (MSI)
• Spatial Database Group Members
• ARCTiC Labs Group Members
• Dr. Dan Boley
• Dr. Sanjay Chawla
• Dr. Vipin Kumar
• Dr. James LeSage
• Dr. Kelley Pace
• Dr. Pen-Chung Yew
THANK YOU VERY MUCH
Q/A