CHAPTER V - Middle East Technical University

CHAPTER V
MODELING OF POINT PATTERNS
METU, GGIT 538
OUTLINE (Last Week)
ANALYSIS OF POINT PATTERNS
4.1. Introduction
4.2. Case Studies
4.3. Visualizing Spatial Point Patterns
4.4. Exploring Spatial Point Patterns
4.4.1. Quadrat Methods
4.4.2. Kernel Estimation
4.4.3. Nearest Neighbor Distance
4.4.4. The K Function
OUTLINE
MODELING OF POINT PATTERNS
5.1. Complete Spatial Randomness (CSR)
5.2. Simple Quadrat Tests for CSR
5.3. Nearest Neighbor Tests for CSR
5.3.1. Testing for CSR Based on Various Summary Statistics
5.3.2. Testing for CSR Based on Distribution Function
5.4. The K Function Tests for CSR
5.1. Introduction
Exploratory analyses are often insufficient on their own. It may be necessary to go further and test explicit hypotheses, or to construct specific models that explain the observed point pattern.
The term modeling refers to statistically comparing various summary measures computed from the observed distribution of events with those expected under a hypothesized model, which leads to designing and testing hypotheses. The most common model used is “complete spatial randomness (CSR)”.
Reasons for Testing Against CSR
• Rejection of CSR is a prerequisite for any serious attempt to model an observed pattern
• Tests are used to explore a set of data and assist in the formulation of alternatives to CSR
• CSR operates as a dividing hypothesis between regular and clustered patterns
5.1. Complete Spatial Randomness (CSR)
CSR is the standard baseline model. It states that the events follow a homogeneous Poisson process over the study region. In this model, the point pattern is described by the number of events occurring in arbitrary sub-regions (areas) A of the whole study region R.
The spatial point process is defined by:
$\{Y(A),\ A \subseteq R\}$
where Y(A) is the number of events occurring in the area A.
The hypothesis of complete spatial randomness for a spatial point pattern {Y(A), A ⊆ R} asserts that:
• The number of events in any planar region with area A follows a Poisson distribution with mean λA.
  → implies constant intensity (no first order effects)
• Given n events in A, the events are an independent random sample from a uniform distribution on A.
  → implies no spatial interaction
In other words:
1. Any event has an equal probability of occurring at
any position in R.
2. The position of any event is independent of the
position of any other, i.e. events do not interact with
one another
A CSR process can therefore be simulated by enclosing R in a rectangle and generating events with x coordinates from a uniform distribution on (x1, x2) and y coordinates from a uniform distribution on (y1, y2), keeping those that fall inside R until n events are obtained. The observed pattern of points can then be compared with the simulated ones based on CSR. In other words, CSR represents a baseline hypothesis against which to assess whether observed patterns are regular, clustered or random.
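The simulation step described here can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `simulate_csr`, the optional `inside` predicate (used to keep only points that fall in a non-rectangular R) and the `seed` argument are assumptions introduced for the example.

```python
import random

def simulate_csr(n, x1, x2, y1, y2, inside=None, seed=None):
    """Simulate n events under CSR.

    Candidate events are drawn uniformly on the rectangle (x1, x2) x (y1, y2)
    that encloses R. If `inside` is given (a hypothetical predicate returning
    True for points in R), candidates outside R are discarded.
    """
    rng = random.Random(seed)
    events = []
    while len(events) < n:
        p = (rng.uniform(x1, x2), rng.uniform(y1, y2))
        if inside is None or inside(p):
            events.append(p)
    return events

# Example: 100 events on a 10 x 5 rectangle.
events = simulate_csr(100, 0.0, 10.0, 0.0, 5.0, seed=1)
```

Repeating the call with different seeds gives the set of simulated patterns against which an observed pattern can be compared.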
5.2. Simple Quadrat Tests for CSR
The quadrat counts can be tested for CSR by using the so-called index of dispersion test.
Let (x1,…,xm) be the counts of the number of events in m quadrats, either randomly scattered in R or forming a regular grid covering the whole of R. If these counts follow a Poisson distribution, their mean and variance are expected to be equal, so randomness can be tested through the variance-mean ratio (VMR).
H0: The point pattern is random, i.e. the mean and variance of the counts are equal (λ = s²)
When the test is applied to a particular set of observations, the number of points and the number of grid squares are fixed. Consequently the mean is constant irrespective of whether the points are clustered, random or regular. It is therefore differences in the variance that indicate the nature of the point pattern.
If the VMR is significantly greater than 1.0, clustering of the points is indicated, whereas a value significantly lower than 1.0 denotes regularity.
The variance-mean ratio $s^2/\bar{x}$ is called the index of dispersion (the basis of the test statistic I), and $s^2/\bar{x} - 1$ is called the index of cluster size (ICS):
E(ICS) = 0 → CSR
E(ICS) > 0 → Clustering (extra events)
E(ICS) < 0 → Regularity (insufficient events)
The index of dispersion test is advantageous since it can be applied in
conjunction with the sampling of point patterns. In this case m quadrats will
be randomly scattered in R and events exhaustively counted on each
quadrat. Such a sampling scheme can be applied to estimate the intensity, λ
of the events in R.
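As a rough illustration of how these indices are obtained from data, the sketch below counts events on a regular grid of quadrats and computes the variance-mean ratio and the ICS. The function names `quadrat_counts` and `dispersion_indices`, and the choice of a regular nx × ny grid over a rectangle, are assumptions made for the example.

```python
from statistics import mean, variance

def quadrat_counts(events, x1, x2, y1, y2, nx, ny):
    """Count events in an nx-by-ny regular grid of quadrats covering the rectangle."""
    counts = [0] * (nx * ny)
    for x, y in events:
        # Events on the upper boundary are assigned to the last row/column.
        col = min(int((x - x1) / (x2 - x1) * nx), nx - 1)
        row = min(int((y - y1) / (y2 - y1) * ny), ny - 1)
        counts[row * nx + col] += 1
    return counts

def dispersion_indices(counts):
    """Return (variance-mean ratio, index of cluster size) for quadrat counts."""
    xbar = float(mean(counts))
    s2 = float(variance(counts))  # sample variance with m - 1 in the denominator
    vmr = s2 / xbar
    return vmr, vmr - 1.0

counts = quadrat_counts([(0.1, 0.1), (0.9, 0.9), (1.0, 1.0)], 0, 1, 0, 1, 2, 2)
vmr, ics = dispersion_indices(counts)
```

A VMR near 1 (ICS near 0) is what CSR would produce on average; whether a departure is significant still has to be judged with the chi-square test.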
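The test statistic for I is defined as follows:
$I = \frac{(m-1)\,s^2}{\bar{x}} = \frac{\sum_{i=1}^{m}(x_i-\bar{x})^2}{\bar{x}}$
where
$\bar{x}$ = mean of the observed counts
$s^2$ = observed variance of the counts
m = number of quadrats
Under CSR the theoretical chi-square distribution is:
$I \sim \chi^2_{m-1}$ for m > 6 and $\bar{x} > 1$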
Properties of Quadrat Tests for CSR
• Under CSR the test statistic I is distributed as $\chi^2_{m-1}$
• Compare the test statistic I with percentage points of $\chi^2_{m-1}$
• Significantly large values indicate clustering
• Significantly small values indicate regularity
Example (n = number of events per quadrat, q = number of quadrats with n events, X = total number of events in those quadrats):

n       q       X       X²
0       70      0       0
1       42      42      42
2       26      52      104
3       17      51      153
4       3       12      48
5       1       5       25
6       1       6       36
sum     160     168     408

$\bar{x} = 1.05$
$s^2 = 1.378$
$I = \frac{(m-1)\,s^2}{\bar{x}} = 208.66$
compare with $\chi^2_{m-1} = 214.8$
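The arithmetic of this example can be re-traced from the frequency table with a short script. It is offered as a check under one stated assumption: the sample variance uses the usual m − 1 denominator. Under that assumption the variance comes out near 1.457 and the statistic near 220.6, whereas the quoted s² = 1.378 is what results from dividing the same sum of squares by 168, so it is worth confirming which denominator was intended before relying on either figure. The variable names are illustrative only.

```python
# Frequency table: number of events per quadrat -> number of quadrats.
freq = {0: 70, 1: 42, 2: 26, 3: 17, 4: 3, 5: 1, 6: 1}

m = sum(freq.values())                                # number of quadrats
total = sum(n * q for n, q in freq.items())           # total number of events
total_sq = sum(n * n * q for n, q in freq.items())    # sum of squared counts

xbar = total / m                      # mean count per quadrat
ss = total_sq - total ** 2 / m        # sum of squared deviations from the mean
s2 = ss / (m - 1)                     # sample variance (assumed m - 1 denominator)
i_stat = (m - 1) * s2 / xbar          # index of dispersion test statistic

print(m, total, total_sq)
print(round(xbar, 3), round(s2, 3), round(i_stat, 2))
print(round(ss / total, 3))           # variance if divided by the event total instead
```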
If λ is assumed to be constant and CSR holds, the estimate of λ is given by:
$\hat{\lambda} = \bar{x}/Q$
where Q is the area of each quadrat.
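Then the 95% confidence interval of λ can be estimated, using the normal approximation to the Poisson counts, by:
$\hat{\lambda} \pm 1.96\sqrt{\hat{\lambda}/(mQ)}$
where m is the number of quadrats.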
Problems Encountered
1. Problem of overlapping quadrats: If randomly scattered quadrats are used, they may overlap each other. Frequent overlap is a problem because the counts xi will then not be independent. This can be overcome by using a sampling scheme that guarantees disjoint quadrats.
2. Problem of quadrats overlapping the edge of R: If quadrats overlap the edge of R, introducing a guard area inside the perimeter of R can be a solution. In this case the quadrats are randomly scattered only throughout the part of R that is not in the guard area, and events in the guard area are still counted in any quadrat that overlaps into it.
3. Problem of choosing appropriate quadrat size: An
empirical suggestion is to aim for a mean quadrat
count of about 1.6.
4. Problem of quadrat position: Usually no account is taken of the relative position of quadrats or of the relative position of events within a quadrat. One common method that considers the relative position of quadrats is the Greig-Smith procedure:
a. Calculate the variance of quadrat counts for the original grid
b. Combine adjacent quadrats of the original grid successively into blocks of increasing size
c. Plot the variance estimates against block size, where peaks and troughs indicate evidence of the scale of pattern
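The block-combination idea in steps a to c can be sketched as follows. This is a simplified stand-in, not the full Greig-Smith analysis: it assumes a square grid whose side is a power of two, combines quadrats into square b × b blocks (b = 1, 2, 4, …), and reports the plain sample variance of the block totals rather than the nested mean squares of the original procedure. The function names are hypothetical.

```python
from statistics import variance

def block_totals(grid, b):
    """Sum quadrat counts over non-overlapping b-by-b blocks of a square grid."""
    size = len(grid)
    return [
        sum(grid[r + i][c + j] for i in range(b) for j in range(b))
        for r in range(0, size, b)
        for c in range(0, size, b)
    ]

def block_variances(grid):
    """Variance of block totals for block sides 1, 2, 4, ... (at least 4 blocks each)."""
    size = len(grid)
    result = {}
    b = 1
    while b <= size // 2:
        result[b] = variance(block_totals(grid, b))
        b *= 2
    return result

# A 4 x 4 grid with two dense 2 x 2 patches: the variance jumps at block side 2.
grid = [
    [4, 4, 0, 0],
    [4, 4, 0, 0],
    [0, 0, 4, 4],
    [0, 0, 4, 4],
]
variances = block_variances(grid)
```

Plotting the values in `variances` against block size gives the kind of curve whose peaks and troughs are read as evidence of the scale of pattern.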
Table 5.1. Available Indexes for testing CSR
5.3. Nearest Neighbor Tests for CSR
To test for CSR using nearest neighbor distances, the cumulative distributions G(w) and F(x) must be known for the specific study area. However, it is usually impossible to know G(w) and F(x) exactly because of edge effects, since they depend on the particular shape of R. On the other hand, it is possible to derive theoretical distribution results for W and X if the edge effects are ignored.
There are two ways of testing for CSR with nearest neighbor distances:
• Testing based on various summary statistics
• Testing based on the distribution function
5.3.1. Testing for CSR Based on Various Summary Statistics
Let the mean density of events per unit area be λ. If CSR holds, events are independent and the number of events in any area is Poisson distributed.
The probability that no events fall within a circle of radius x around any randomly chosen point is:
$e^{-\lambda\pi x^2}$
The distribution function F(x) of nearest neighbor point-event distances X for CSR is given by:
$F(x) = \Pr(X \le x) = 1 - e^{-\lambda\pi x^2}, \quad x \ge 0$
This implies that πX² follows an exponential distribution with parameter λ, i.e. 2πλX² is distributed as $\chi^2_2$.
Then it may be deduced that:
$E(X) = \frac{1}{2\sqrt{\lambda}}, \qquad VAR(X) = \frac{4-\pi}{4\pi\lambda}$
If X1,…,Xn are independent nearest neighbor distances then $2\pi\lambda\sum_{i=1}^{n} X_i^2$ is distributed as $\chi^2_{2n}$.
The same arguments apply to the nearest neighbor event-event distances for a CSR process, i.e. under CSR the distribution function G(w) is:
$G(w) = \Pr(W \le w) = 1 - e^{-\lambda\pi w^2}, \quad w \ge 0$
E(W) and VAR(W) are the same as for X.
Now it is possible to derive sampling distributions under CSR of various summary statistics of the observed nearest neighbor distances.
Distribution theory for these tests is based on the assumption that the nearest neighbor measurements randomly sampled from the study region R are independent. This assumption may be violated when the number of events is small and the proportion of them used is large.
Basic Assumptions:
1. The nearest neighbor distances used to compute the summary statistics must be independently sampled from the study region.
• Independence is therefore reasonable to assume only when the number of events is large.
• Rule of thumb: The number m of nearest neighbor measurements sampled should satisfy $m < 0.1n$, where n is the total number of events.
!!!Remark: The general effect of lack of independence is that the test statistics will have a larger variance than their theoretical values under independence. This implies that the standard test may show a significant departure from CSR that would not be significant if the dependence were taken into account.
2. The nearest neighbor distances used to compute
the summary statistics have not been biased by
edge effects.
Various tests have been suggested to detect departures from CSR based on summary statistics of m randomly sampled nearest neighbor event-event distances (w1,…,wm) or point-event distances (x1,…,xm). The most commonly used are:
• Clark-Evans
• Hopkins
• Byth and Ripley
Clark-Evans: It compares
$\bar{w} = \frac{1}{m}\sum_{i=1}^{m} w_i$
with percentage points of the distribution:
$N\left(\frac{1}{2\sqrt{\lambda}},\ \frac{4-\pi}{4\pi\lambda m}\right)$
Basic Properties:
• The test is based on event-event distances.
• It requires an enumerated point pattern to be available, from which events can be randomly sampled and their nearest neighbor distances determined.
• λ is unknown and needs to be replaced by an appropriate estimate, which is $\hat{\lambda} = n/|R|$ (n is the number of events in R and |R| is the area of R).
• If an estimate of λ is used it is desirable to use all n event-event distances, if possible, rather than a sample of m of them.
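For the case m = n, an edge-corrected approximation is:
$E(\bar{w}) \approx 0.5\sqrt{A/n} + 0.0514\,P/n + 0.041\,P/n^{3/2}$
$VAR(\bar{w}) \approx 0.0703\,A/n^2 + 0.037\,P\sqrt{A/n^5}$
where P is the perimeter of the study region, which has area A.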
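A minimal sketch of the Clark-Evans comparison for the case where all n event-event distances are used is given below. It relies on the uncorrected CSR moments, mean 1/(2√λ) and variance (4 − π)/(4πλn), with λ estimated as n divided by the area, so edge effects are ignored. The function names and the brute-force nearest neighbor search are choices made for the example, and duplicate events are assumed not to occur.

```python
import math

def nn_distances(events):
    """Event-event nearest neighbor distance for every event (brute force, O(n^2))."""
    out = []
    for i, (xi, yi) in enumerate(events):
        out.append(min(math.hypot(xi - xj, yi - yj)
                       for j, (xj, yj) in enumerate(events) if j != i))
    return out

def clark_evans_z(events, area):
    """Standardized mean nearest neighbor distance under CSR (no edge correction)."""
    n = len(events)
    lam = n / area                                   # estimated intensity
    wbar = sum(nn_distances(events)) / n             # observed mean distance
    expected = 1.0 / (2.0 * math.sqrt(lam))          # E(W) under CSR
    var = (4.0 - math.pi) / (4.0 * math.pi * lam * n)
    return (wbar - expected) / math.sqrt(var)

# A regular 5 x 5 lattice in a 5 x 5 region gives a clearly positive score.
lattice = [(i + 0.5, j + 0.5) for i in range(5) for j in range(5)]
z_regular = clark_evans_z(lattice, 25.0)
```

Large negative scores (short distances) point towards clustering and large positive scores towards regularity; the score is referred to the standard normal distribution.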
Hopkins: It compares
$\sum_{i=1}^{m} x_i^2 \Big/ \sum_{i=1}^{m} w_i^2$
with percentage points of the $F_{2m,2m}$ distribution.
The physical implication of the test is that in clustered patterns the point-event distances xi will be large relative to the event-event distances wi, and vice versa in a regular pattern.
Basic Properties:
• The test requires complete enumeration of all n events in the study region since it uses wi, so that event-event distances can be randomly sampled.
• The above rule can be relaxed, and the test can be applied in conjunction with sampling of point patterns, if a “semi-systematic” sampling scheme is employed, whereby a regular grid of sample points is used for calculating point-event distances xi.
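The Hopkins ratio can be sketched as below, assuming a rectangular study region and ignoring edge effects. The function names, the use of uniformly random sample points (rather than the semi-systematic grid) and the `seed` argument are assumptions for illustration; the events are assumed to have no exact duplicates, otherwise the denominator can be zero.

```python
import math
import random

def nearest_distance(p, events, skip=None):
    """Distance from point p to the nearest event, optionally skipping one index."""
    return min(math.hypot(p[0] - q[0], p[1] - q[1])
               for k, q in enumerate(events) if k != skip)

def hopkins_statistic(events, x1, x2, y1, y2, m, seed=None):
    """Ratio of summed squared point-event to event-event nearest neighbor distances."""
    rng = random.Random(seed)
    # m randomly sampled events and their event-event distances w_i.
    idx = rng.sample(range(len(events)), m)
    w = [nearest_distance(events[k], events, skip=k) for k in idx]
    # m random points in the rectangle and their point-event distances x_i.
    pts = [(rng.uniform(x1, x2), rng.uniform(y1, y2)) for _ in range(m)]
    x = [nearest_distance(p, events) for p in pts]
    return sum(d * d for d in x) / sum(d * d for d in w)

# A tight cluster in a large region: point-event distances dominate.
cluster = [(5 + 0.001 * i, 5 + 0.001 * j) for i in range(5) for j in range(5)]
h = hopkins_statistic(cluster, 0.0, 10.0, 0.0, 10.0, m=5, seed=0)
```

Values well above 1 are consistent with clustering and values well below 1 with regularity; significance is judged against the F distribution with (2m, 2m) degrees of freedom.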
Byth & Ripley: It compares
$\frac{1}{m}\sum_{i=1}^{m}\frac{x_i^2}{x_i^2 + w_i^2}$
with percentage points of $N\left(\frac{1}{2},\ \frac{1}{12m}\right)$, where the xi values are randomly paired with the wi values.
Table 5.1. Available statistics for testing CSR in nearest neighbor distances
5.3.2. Testing for CSR Based on Distribution Function
Another alternative for testing CSR is to look at the complete estimated distribution function of W or X rather than just a single statistic. The basic question is:
• Can we construct a formal method for comparing the whole of the distribution function with its theoretical form under CSR?
The theoretical distributions for G(w) and F(x) under CSR are:
$G(w) = 1 - e^{-\lambda\pi w^2}$
$F(x) = 1 - e^{-\lambda\pi x^2}$
Then the plots of the theoretical distributions G(w) and F(x) are compared with the estimated $\hat{G}(w)$ and $\hat{F}(x)$.
Here there is still no formal way of assessing the significance of differences in the plots. A more satisfactory approach is to compare the estimated functions with a simulation estimate of their theoretical distributions.
The simulation estimate for G(w) under CSR is calculated as:
$\bar{G}(w) = \frac{1}{m}\sum_{i=1}^{m}\hat{G}_i(w)$
where $\hat{G}_i(w)$ (i = 1, …, m) are empirical distribution functions, each of which is estimated from one of m independent simulations of n events under CSR, i.e. n events independently and uniformly distributed in R.
To assess the significance of departures between the simulated CSR distribution, $\bar{G}(w)$, and the one actually observed, $\hat{G}(w)$, it is also necessary to define upper and lower simulation envelopes:
$U(w) = \max_{i=1,\dots,m}\hat{G}_i(w)$
$L(w) = \min_{i=1,\dots,m}\hat{G}_i(w)$
When $\hat{G}(w)$ is plotted against $\bar{G}(w)$, and U(w) and L(w) are added to the plot:
• If the data are compatible with CSR, the plot of $\hat{G}(w)$ vs $\bar{G}(w)$ should be roughly linear and at 45°.
• If clustering is present, the plot will lie above the line.
• If regularity is present, the plot will lie under the line.
U(w) and L(w) help to assess the significance of departures from the 45° line in the plot, since they have the following property:
$\Pr(\hat{G}(w) > U(w)) = \Pr(\hat{G}(w) < L(w)) = \frac{1}{m+1}$
This also indicates the number of simulations required in order to detect a departure at a specified significance level.
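The simulation estimate and its envelopes can be sketched as follows. To keep the example short it assumes that R is the unit square and applies no edge correction; the function names, the grid of distances `w_values` and the `seed` argument are illustrative assumptions. With m = 19 simulations each one-sided envelope probability is 1/(m + 1) = 0.05.

```python
import math
import random

def g_hat(events, w_values):
    """Empirical distribution of event-event nearest neighbor distances."""
    nn = [min(math.hypot(xi - xj, yi - yj)
              for j, (xj, yj) in enumerate(events) if j != i)
          for i, (xi, yi) in enumerate(events)]
    return [sum(d <= w for d in nn) / len(nn) for w in w_values]

def csr_envelopes(n, m, w_values, seed=None):
    """Mean, upper and lower envelopes of G from m CSR simulations on the unit square."""
    rng = random.Random(seed)
    sims = []
    for _ in range(m):
        events = [(rng.random(), rng.random()) for _ in range(n)]
        sims.append(g_hat(events, w_values))
    columns = list(zip(*sims))  # one tuple of m simulated values per distance w
    g_bar = [sum(col) / m for col in columns]
    upper = [max(col) for col in columns]
    lower = [min(col) for col in columns]
    return g_bar, upper, lower

w_values = [0.02 * k for k in range(1, 11)]
g_bar, upper, lower = csr_envelopes(n=30, m=19, w_values=w_values, seed=3)
```

The observed curve, computed with `g_hat` on the real data over the same `w_values`, is then plotted against `g_bar` with `upper` and `lower` added.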
5.4. The K Function Tests for CSR
Under CSR the expected number of events within a distance h of a randomly chosen event is:
$\lambda\pi h^2$
Hence theoretically under CSR:
$K(h) = \pi h^2$
Hence the K function estimated from the observed data, $\hat{K}(h)$, is compared with the theoretical one. One way of doing this is to plot
$\hat{L}(h) = \sqrt{\hat{K}(h)/\pi} - h$
against h and compare it with its theoretical value of zero under CSR:
Positive peaks → Clustering
Negative troughs → Regularity
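For illustration, a naive estimate of the K function and the corresponding L transform can be written as below. The sketch deliberately omits the edge-correction weights that a practical estimator includes, so it will underestimate K near the boundary of R; the function names `k_hat` and `l_hat` are assumptions for the example.

```python
import math

def k_hat(events, area, h):
    """Naive K estimate: area / n^2 times the number of ordered pairs within distance h."""
    n = len(events)
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.hypot(events[i][0] - events[j][0],
                                 events[i][1] - events[j][1]) <= h
    )
    return area * pairs / (n * n)

def l_hat(events, area, h):
    """L transform of the naive K estimate; its theoretical value under CSR is zero."""
    return math.sqrt(k_hat(events, area, h) / math.pi) - h

# Two events one unit apart in a region of area 4.
two = [(0.0, 0.0), (1.0, 0.0)]
values = [(h, l_hat(two, 4.0, h)) for h in (0.5, 1.5)]
```

Evaluating `l_hat` over a range of h values gives the curve whose peaks and troughs are inspected.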
The formal assessment of the significance of observed peaks and troughs requires knowledge of the sampling distributions of $\hat{K}(h)$ and $\hat{L}(h)$ under CSR. These are unknown and complex because of the edge corrections built into $\hat{K}(h)$. However, it is possible to use an approach analogous to that used for nearest neighbor distances.
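The method involves:
• Obtaining a simulation estimate of the sampling distributions
• Constructing upper and lower simulation envelopes:
$U(h) = \max_{i=1,\dots,m}\hat{L}_i(h), \qquad L(h) = \min_{i=1,\dots,m}\hat{L}_i(h)$
where $\hat{L}_i(h)$ is estimated from the i-th of m independent simulations of n events under CSR
• Plotting $\hat{L}(h)$ vs h together with plots of the envelopes U(h) and L(h)
• Assessing the significance of peaks and troughs on the basis of:
$\Pr(\hat{L}(h) > U(h)) = \Pr(\hat{L}(h) < L(h)) = \frac{1}{m+1}$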
Alternate Models to CSR
• For clustered patterns
  - First order effects only:
    * Heterogeneous Poisson Process
    * Cox Process
  - Second order effects only:
    * Poisson Cluster Process
• For regular patterns
  - Simple Inhibition Process
  - Markov Point Processes
• Either
  - Markov Point Processes