ESRI Technology Update - City University of New York

Exploring Continuous Data
Spatial Interpolation
Sampling
Inverse Distance Weighting (IDW)
Trend
Spline
Kriging
Spatial Analysis Lecture #8 (Exploring Continuous Data)
What is interpolation?
Process of creating a surface based on values at
isolated sample points.
Sample points are locations where we collect data on
some phenomenon and record the spatial coordinates
We use mathematical estimation to “guess at” what
the values are “in between” those points
We can create either a raster or vector interpolated
surface
Interpolation is used because field data are
expensive to collect, and can’t be collected everywhere
Spatial Interpolation in GIS
Estimate z-value for any point location
within the map area
Assume:
The data are continuous
The data are spatially dependent (can estimate
based on surrounding locations)
What does it look like?
Ground water pollution samples example:
[Figure: sample points before interpolation and the surface that interpolation gives after]
How is it used…
This can be displayed as a 3D surface…
How is it used?
We can also use interpolation methods to create contours
What isn’t interpolation?
Interpolation only works where values are spatially
dependent, or spatially autocorrelated, that is, where
nearby locations tend to have similar Z values.
Examples of spatially autocorrelated features: elevation,
property value, crime levels, precipitation
Non-autocorrelated examples: number of TV sets per
city block; cheeseburgers consumed per household.
Where values across a landscape are geographically
independent, interpolation does not work because the value at
(x, y) cannot be used to predict the value at (x+1, y+1).
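One way to check for spatial autocorrelation before interpolating is a summary statistic. The sketch below uses Moran's I, which the slides do not name, so treat it as an illustrative choice. The function name `morans_i`, the binary distance-threshold weights and the toy transect data are all assumptions made for this example. Values well above 0 suggest that nearby locations have similar values; values below 0 suggest that neighbours tend to differ.

```python
from math import hypot


def morans_i(points, values, max_dist):
    """Moran's I with binary weights: w_ij = 1 when points i and j
    are within max_dist of each other, otherwise 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum of w_ij * dev_i * dev_j over neighbouring pairs
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = hypot(points[i][0] - points[j][0],
                         points[i][1] - points[j][1])
            if dist <= max_dist:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    total = sum(e * e for e in dev)
    return (n / w_sum) * (cross / total)


# Hypothetical data: six locations along a line, one unit apart.
pts = [(float(x), 0.0) for x in range(6)]
gradient = [0, 1, 2, 3, 4, 5]       # smooth trend: neighbours are similar
alternating = [0, 1, 0, 1, 0, 1]    # neighbours always differ

i_gradient = morans_i(pts, gradient, 1.0)
i_alternating = morans_i(pts, alternating, 1.0)
print(i_gradient, i_alternating)
```

The smooth gradient gives a positive value and the alternating pattern a negative one, which matches the idea that interpolation is only sensible for the first kind of data.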
Where interpolation
does NOT work
Cannot use interpolation where values are not
spatially autocorrelated
Take household income: in an
income-segregated city, you could sample a small
number of households for income and probably
interpolate
However, in a highly income-integrated city,
where a given block has both rich and poor households, this
would not work
Data sampling
• Method of sampling is critical for
subsequent interpolation...
Regular
Random
Transect
Stratified random
Cluster
Contour
Sampling
Systematic sampling pattern
Easy
Samples spaced uniformly at fixed
X, Y intervals
Parallel lines
Advantages
Easy to understand
Disadvantages
All areas receive the same attention
Difficult to stay on lines
May be biased
Sampling
Random Sampling
Select point based on random
number process
Plot on map
Visit sample
Advantages
Less biased (unlikely to match pattern
in landscape)
Disadvantages
Does nothing to distribute samples
in areas of high variation
Difficult to explain; location of
points may be a problem
Sampling
Cluster Sampling
Cluster centers are established
(random or systematic)
Samples arranged around each center
Plot on map
Visit sample
(e.g. US Forest Service Forest
Inventory and Analysis (FIA):
clusters located at random, then a
systematic pattern of samples at that
location)
Advantages
Reduced travel time
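The three sampling patterns described so far (systematic, random and cluster) can be sketched as simple coordinate generators. This is a minimal illustration only: the function names, the 100 x 100 study area, the grid spacing and the cluster offsets are all hypothetical, and real survey designs would also handle access constraints and points falling outside the study area.

```python
import random


def systematic_sample(xmin, ymin, xmax, ymax, spacing):
    """Points spaced uniformly at fixed X, Y intervals."""
    nx = int((xmax - xmin) // spacing) + 1
    ny = int((ymax - ymin) // spacing) + 1
    return [(xmin + i * spacing, ymin + j * spacing)
            for j in range(ny) for i in range(nx)]


def random_sample(xmin, ymin, xmax, ymax, n, rng):
    """Points selected by a random number process."""
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
            for _ in range(n)]


def cluster_sample(xmin, ymin, xmax, ymax, n_clusters, offsets, rng):
    """Random cluster centres, then a fixed (systematic) pattern of
    samples around each centre. Samples near the edge may fall
    outside the study area; this sketch does not clip them."""
    centres = random_sample(xmin, ymin, xmax, ymax, n_clusters, rng)
    return [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]


rng = random.Random(42)  # fixed seed so the example is repeatable
grid_pts = systematic_sample(0, 0, 100, 100, 25)
rand_pts = random_sample(0, 0, 100, 100, 25, rng)
plot_offsets = [(0, 0), (5, 0), (-5, 0), (0, 5), (0, -5)]
clus_pts = cluster_sample(0, 0, 100, 100, 5, plot_offsets, rng)
print(len(grid_pts), len(rand_pts), len(clus_pts))
```

All three designs here produce 25 sample locations, which makes it easy to compare how differently the same sampling effort is distributed.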
Sampling
Adaptive sampling
Higher density sampling where the feature of
interest is more variable.
Requires some method of estimating feature
variation
Often repeat visits (e.g. two stage sampling)
Advantages
Often efficient as large homogeneous areas have
few samples reserving more for areas with
higher spatial variation.
Disadvantages
If there is no method of identifying where features are
most variable, then you need to make
several sampling visits.
Very Important Points
Interpolation basics
Methods of spatial interpolation:
Many different methods available
Classification according to:
• exact or approximate
• deterministic or stochastic
• local or global
• gradual or abrupt
Examples:
• Thiessen polygons
• spatial moving average
• TINs
• Kriging
Interpolation Techniques
Thiessen Polygons
TINS
Inverse Distance Weighting (IDW)
Trend
Spline
Kriging
Thiessen Polygons
Ease of application (appropriate for discrete
variables)
Accuracy depends largely on sampling
density
Boundaries often odd shaped as transitions
between polygons are often abrupt
Continuous variables often not well
represented
Thiessen Polygons
Thiessen (Voronoi) polygons:
assume values of unsampled locations are
equal to the value of the nearest sampled point
Vector-based method
regularly spaced points produce a regular
mesh
irregularly spaced points produce a network
of irregular polygons
Thiessen Polygons
Assigns interpolated value equal to the
value found at the nearest sample location
Conceptually simplest method
Only one point used (nearest)
Often called nearest sample or nearest
neighbor
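The nearest-sample rule behind Thiessen polygons can be sketched in a few lines. The function name `nearest_neighbor` and the four sample points are hypothetical; a GIS would build the polygon boundaries explicitly, whereas this sketch only answers "what value does location (x, y) receive?".

```python
from math import hypot


def nearest_neighbor(samples, x, y):
    """samples is a list of (x, y, z) tuples. Returns the z value of
    the sample closest to (x, y); only that one point is used."""
    nearest = min(samples, key=lambda s: hypot(s[0] - x, s[1] - y))
    return nearest[2]


# Hypothetical sampled locations and values.
samples = [(0, 0, 10.0), (10, 0, 20.0), (0, 10, 30.0), (10, 10, 40.0)]

z_a = nearest_neighbor(samples, 2, 3)   # nearest sample is (0, 0)
z_b = nearest_neighbor(samples, 8, 9)   # nearest sample is (10, 10)
print(z_a, z_b)
```

Every location inside one polygon gets the same value, so the estimate jumps abruptly at polygon boundaries, which is why continuous variables are often not well represented.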
Thiessen Polygons
Sampled locations and values
Thiessen polygons
Thiessen polygon construction
Example Thiessen polygon
[Figure: source surface with sample points; Thiessen polygons with sample points]
TINs
Another vector-based method, often used to
create digital terrain models (DTMs)
Adjacent data points (vertices) are connected by lines
(edges) to create a network of irregular
triangles
• calculate real 3D distance between data points
along edges using trigonometry
• calculate interpolated value on the facet between
three vertices
Surface Analysis in a Vector GIS
Several ways of building a TIN are possible:
from a set of irregularly-spaced points
from regularly spaced points (a lattice)
from digitized contours as line features
Not usually practical to use polygon features
The TIN Model
Sample points are connected by lines to
form triangles
Each triangle's surface would be defined by
the elevations of the three corner points
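Interpolating inside one TIN triangle amounts to evaluating the plane through its three corner points. The sketch below does this with barycentric weights, which is one standard way to write it; the function name `tin_interpolate` and the corner coordinates are hypothetical, and building the triangulation itself (for example by Delaunay triangulation) is not shown.

```python
def tin_interpolate(a, b, c, x, y):
    """a, b, c are (x, y, z) corner points of one triangle.
    Returns z on the triangle's plane at (x, y), or None when
    (x, y) lies outside the triangle."""
    (xa, ya, za), (xb, yb, zb), (xc, yc, zc) = a, b, c
    det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)
    if det == 0:
        raise ValueError("corner points are collinear")
    # Barycentric weights: each is 1 at its own corner and 0 at the others.
    wa = ((yb - yc) * (x - xc) + (xc - xb) * (y - yc)) / det
    wb = ((yc - ya) * (x - xc) + (xa - xc) * (y - yc)) / det
    wc = 1.0 - wa - wb
    if min(wa, wb, wc) < 0:
        return None
    return wa * za + wb * zb + wc * zc


# Hypothetical triangle with elevations 10, 20 and 30 at its corners.
a = (0.0, 0.0, 10.0)
b = (10.0, 0.0, 20.0)
c = (0.0, 10.0, 30.0)

z_inside = tin_interpolate(a, b, c, 2.0, 3.0)
z_corner = tin_interpolate(a, b, c, 0.0, 0.0)
z_outside = tin_interpolate(a, b, c, 10.0, 10.0)
print(z_inside, z_corner, z_outside)
```

For these corners the plane is z = 10 + x + 2y, so the value at (2, 3) comes out close to 18, and the corner itself returns its own elevation.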
TIN Construction
[Figure: plan view and isometric view of a triangle with corner values a, b and c, and an interpolated value x inside it]
Example TIN
[Figure: source surface with sample points; resulting TIN]
Spatial Interpolation
Continuous Data
Whole Area Methods – interpolation based on
all points in a study area
• Trend Surface Analysis
• Fourier Series
Local Interpolators – interpolation may be
applied to only a portion of the data
• Moving Average
• Splines
• Kriging
Whole Area Interpolators
Trend Surface
When an order higher than 1 is used, the interpolator
may generate a Grid whose minimum and maximum
might exceed the minimum and maximum of the
input points
As the order of the polynomial is increased, the
surface being fitted becomes progressively more
complex
Main Use
• Not an interpolator within a region, but a way of
removing broad features of the data prior to
using some other local interpolator
Whole Area Interpolation
Polynomial regression to fit a least-squares surface to the input points
• least-squares fit of a plane to the set
of input points
• X,Y are independent variables, while
Z is the dependent variable
Advantages
• Trend surface interpolation creates
smooth surfaces
• Surface generated will seldom pass
through the original data points since
it performs a best fit for the entire
surface
© Paul Bolstad, GIS Fundamentals
Disadvantages
• Higher order polynomials may reach
ridiculously large or small values
outside of the area covered by the
data
• Susceptible to outliers in the data
Trend surfaces
Uses a polynomial regression to fit a least-squares
surface to the data points
normally allows user control over the order of the
polynomial used to fit the surface
as the order of the polynomial is increased, the surface
being fitted becomes progressively more complex
• a higher-order polynomial will not always generate
the most accurate surface; it depends on the
data
• the most common polynomial orders are 1 through 3.
Fitting a single polynomial trend
surface
[Figure: fitted trend surface; legend: interpolated point, data point]
Example trend surfaces
[Figure: source surface with sample points, and linear, quadratic and cubic trend surfaces]
Linear: goodness of fit (R2) = 45.42 %
Quadratic: goodness of fit (R2) = 82.11 %
Cubic: goodness of fit (R2) = 92.72 %
Trend surfaces
Fitting a statistical model, a trend surface,
through the measured points. (typically
polynomial)
Where Z is the value at any point x
Where the a_i are coefficients estimated in a
regression model
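The slide's equation did not survive in this transcript, so the sketch below assumes the simplest case, a first-order (planar) trend surface Z = a0 + a1*x + a2*y, and estimates the coefficients by least squares. All names (`solve`, `fit_trend`, `LINEAR`, `predict`, `r_squared`) and the sample data are hypothetical. Higher orders could be tried by adding terms such as x*x, x*y and y*y to the term list.

```python
def solve(a, b):
    """Solve a small linear system a * x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# Terms of a first-order polynomial: 1, x, y.
LINEAR = [lambda x, y: 1.0, lambda x, y: x, lambda x, y: y]


def fit_trend(samples, terms):
    """Least-squares coefficients via the normal equations.
    X and Y are the independent variables, Z the dependent one."""
    rows = [[t(x, y) for t in terms] for x, y, _ in samples]
    zs = [z for _, _, z in samples]
    k = len(terms)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(k)]
    return solve(ata, atz)


def predict(coeffs, terms, x, y):
    return sum(c * t(x, y) for c, t in zip(coeffs, terms))


def r_squared(samples, coeffs, terms):
    """Goodness of fit: 1 - residual sum of squares / total sum of squares."""
    zs = [z for _, _, z in samples]
    mean_z = sum(zs) / len(zs)
    ss_tot = sum((z - mean_z) ** 2 for z in zs)
    ss_res = sum((z - predict(coeffs, terms, x, y)) ** 2
                 for x, y, z in samples)
    return 1.0 - ss_res / ss_tot


# Hypothetical samples lying exactly on the plane z = 5 + 2x - y.
xy = [(0, 0), (1, 0), (0, 1), (2, 3), (4, 1), (3, 3)]
samples = [(x, y, 5 + 2 * x - y) for x, y in xy]

coeffs = fit_trend(samples, LINEAR)
r2 = r_squared(samples, coeffs, LINEAR)
print(coeffs, r2)
```

Because the toy data lie exactly on a plane, the fit recovers the coefficients and R2 is essentially 1; with real data the surface would seldom pass through the sample points.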
Inverse Distance Weighting (IDW)
Estimates the values at unknown points
using the distances to, and values of, nearby known
points (IDW reduces the contribution of a
known point to the interpolated value as its distance increases)
Weight of each sample point is inversely
proportional to its distance.
The farther away the point, the less
weight it has in defining the value at the unsampled
location
Inverse Distance Weighting (IDW)
Zi is the value at known point i
Dij is the distance from known
point i to unknown point j
Zj is the estimated value at the unknown point
n is a user-selected exponent
(often 1, 2 or 3)
Any number of points may
be used up to all points in
the sample; typically 3 or
more
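The equation these symbols belong to is missing from the transcript. The sketch below assumes the usual IDW form, Zj = sum(Zi / Dij^n) / sum(1 / Dij^n), taken over the known points used. The function name `idw`, the optional `k` argument for limiting the estimate to the k nearest points, and the sample data are hypothetical.

```python
from math import hypot


def idw(samples, x, y, n=2, k=None):
    """Inverse distance weighted estimate at (x, y).
    samples: list of (x, y, z); n: distance exponent;
    k: use only the k nearest samples (None means use all)."""
    dists = sorted(((hypot(sx - x, sy - y), sz) for sx, sy, sz in samples),
                   key=lambda pair: pair[0])
    if k is not None:
        dists = dists[:k]
    if dists[0][0] == 0.0:
        # Exactly on a sample point: return its value (exact interpolator).
        return dists[0][1]
    weights = [1.0 / d ** n for d, _ in dists]
    return sum(w * z for w, (_, z) in zip(weights, dists)) / sum(weights)


# Hypothetical known points: value 10 at x = 0 and value 20 at x = 10.
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]

z_mid = idw(samples, 5.0, 0.0)          # equal distances: plain average
z_n1 = idw(samples, 2.0, 0.0, n=1)
z_n2 = idw(samples, 2.0, 0.0, n=2)      # closer point dominates more
z_at_sample = idw(samples, 0.0, 0.0)
print(z_mid, z_n1, z_n2, z_at_sample)
```

At (2, 0) the estimate moves from 12 with n = 1 towards the nearer value of 10 with n = 2, which illustrates how a larger exponent makes closer points more influential.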
Inverse Distance Weighting (IDW)
Factors affecting interpolated surface:
Size of exponent, n affects the shape of
the surface (larger n means the closer points
are more influential)
A larger number of sample points results
in a smoother surface
Spline Method
Another option for interpolation method
This fits a curve through the sample data and assigns values to
other locations based on their location on the curve
Thin plate splines create a surface that passes through
sample points with the least possible change in slope at all
points, that is with a minimum curvature surface
SPLINE has two types: regularized and tension
Tension results in a rougher surface that more closely
adheres to abrupt changes in sample points
Regularized results in a smoother surface that smooths out
abruptly changing values somewhat
Splines
Spline functions are mathematical equivalents of
the flexible ruler
• Piecewise functions (fit to a small number of
data points) using a polynomial
Fits a minimum-curvature surface through the
input points. Conceptually, it is like bending a
sheet of rubber to pass through the points, while
minimizing the total curvature of the surface.
It fits a mathematical function to a specified
number of nearest input points, while passing
through the sample points.
Splines
Name derived from the drafting tool, a flexible ruler, that
helps create smooth curves through several points
Spline functions (also called splines) are used to
interpolate along a smooth curve (similar to the flexible
ruler)
Force a smooth line to pass through a desired set of
points
Constructed from a set of joined polynomial functions
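The "flexible ruler" idea can be illustrated in one dimension with a natural cubic spline: a set of joined cubic polynomials that passes through every data point with continuous slope and curvature. This is a sketch of the 1D case only, not the 2D thin plate spline used for surfaces, and the function name `natural_cubic_spline` and the data are hypothetical.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function S(x) built from cubic pieces joined at the
    data points (xs must be strictly increasing). 'Natural' means
    the curvature is zero at both ends."""
    n = len(xs) - 1                      # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = (3.0 / h[i] * (ys[i + 1] - ys[i])
                    - 3.0 / h[i - 1] * (ys[i] - ys[i - 1]))
    # Forward sweep of the tridiagonal system for the curvature terms.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution gives the coefficients of each cubic piece.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Pick the piece whose interval contains x (clamped at the ends).
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        t = x - xs[j]
        return ys[j] + b[j] * t + c[j] * t ** 2 + d[j] * t ** 3

    return evaluate


# Hypothetical data with abrupt changes; the spline still hits every point.
knots_x = [0.0, 1.0, 2.0, 3.0]
knots_y = [0.0, 1.0, 0.0, 1.0]
spline = natural_cubic_spline(knots_x, knots_y)
at_knots = [spline(x) for x in knots_x]

# Data on a straight line: the spline reproduces the line.
line = natural_cubic_spline([0.0, 1.0, 3.0, 4.0], [1.0, 3.0, 7.0, 9.0])
line_mid = line(2.0)
print(at_knots, line_mid)
```

Between the data points the curve can rise above or fall below the sampled values, which is the 1D analogue of a spline surface exceeding the range of its input points.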
Kriging
Similar to Inverse Distance Weighting
(IDW)
Kriging uses the minimum variance
method to calculate the weights rather
than applying an arbitrary or less precise
weighting scheme
Kriging
A statistically based estimator of spatial variables
Components:
Spatial trend (an increase/decrease in a variable
that depends on direction e.g. temperature may
decrease toward the northwest)
Autocorrelation (the tendency for points near each
other to have similar values)
Random (stochastic) variation
Creates a mathematical model which is used to
estimate values across the surface
Kriging Method
Semivariograms measure the strength of statistical correlation as
a function of distance; they quantify spatial autocorrelation
Because Kriging is based on the semivariogram, it is
probabilistic, while IDW and Spline are deterministic
Kriging associates some probability with each prediction, hence
it provides not just a surface, but some measure of the accuracy of
that surface
Kriging equations are determined by fitting a line through the points
so as to minimize the weighted sum of squares between the points and the line
These equations are weighted based on spatial autocorrelation,
which is determined from the semivariograms
Kriging
Lag distance
Where:
Zi is a variable at a sample point
hij is the distance between sample points i and j
Every possible pair Zi, Zj defines a
distance hij, and the two values differ by the amount
Zi – Zj.
The distance hij is known as the lag distance
between points i and j. For any given lag distance there is a
subset of point pairs in the sample set that are
that distance apart
Kriging
Semi-variance
Where Zi is the measured variable at one point
Zj is another at h distance away
n is the number of pairs that are approximately h distance apart
Semi-variance may be calculated for any h
(When nearby points are similar, (Zi-Zj) is small, so the semi-variance is small. High spatial autocorrelation means points
near each other have similar Z values)
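The semi-variance equation itself is missing from the transcript. The sketch below assumes the standard form, semi-variance(h) = sum((Zi - Zj)^2) / (2n), summed over the n pairs of points that are approximately h apart. The function name `semivariance`, the tolerance argument `tol` used to decide "approximately h", and the transect data are hypothetical.

```python
from itertools import combinations
from math import hypot


def semivariance(samples, h, tol):
    """Empirical semi-variance at lag distance h.
    samples: list of (x, y, z). Pairs whose separation is within
    tol of h are used. Returns None when no pair qualifies."""
    squared_diffs = [
        (zi - zj) ** 2
        for (xi, yi, zi), (xj, yj, zj) in combinations(samples, 2)
        if abs(hypot(xi - xj, yi - yj) - h) <= tol
    ]
    if not squared_diffs:
        return None
    return sum(squared_diffs) / (2 * len(squared_diffs))


# Hypothetical transect: five samples one unit apart.
samples = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 4.0), (3, 0, 7.0), (4, 0, 11.0)]

gamma_1 = semivariance(samples, 1.0, 0.1)   # 4 pairs at lag 1
gamma_2 = semivariance(samples, 2.0, 0.1)   # 3 pairs at lag 2
gamma_far = semivariance(samples, 10.0, 0.1)
print(gamma_1, gamma_2, gamma_far)
```

Repeating the calculation over a range of lag distances gives the points that a variogram is drawn through; here the semi-variance grows with lag, as expected for autocorrelated data.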
Variogram
Semi-variance is usually small at small lag distances and
increases to a constant value as the lag distance h increases
Variogram
The nugget is the initial semi-variance, at the shortest lag distances, where the autocorrelation typically is highest
The sill is the semi-variance value at which the variogram levels off; background noise; where there is
little autocorrelation
The range is the lag distance at which the sill is reached
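Nugget, sill and range are the parameters of whatever curve is fitted to the semi-variance points. The slides do not say which curve is used, so the sketch below assumes the spherical model, one commonly used form. The function name `spherical_variogram` and the parameter values are hypothetical.

```python
def spherical_variogram(h, nugget, sill, rng):
    """Spherical variogram model (an assumed model choice).
    h: lag distance; nugget: semi-variance just above lag 0;
    sill: level at which the curve flattens; rng: lag distance
    (the range) at which the sill is reached."""
    if h == 0:
        return 0.0          # a point compared with itself
    if h >= rng:
        return sill         # beyond the range: little autocorrelation
    ratio = h / rng
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)


# Hypothetical parameters: nugget 1, sill 5, range 10.
gamma_half = spherical_variogram(5.0, 1.0, 5.0, 10.0)
gamma_range = spherical_variogram(10.0, 1.0, 5.0, 10.0)
gamma_beyond = spherical_variogram(20.0, 1.0, 5.0, 10.0)
print(gamma_half, gamma_range, gamma_beyond)
```

The curve starts near the nugget at short lags, rises as lag distance increases, and stays at the sill for every lag at or beyond the range.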
Kriging
A set of sample points is used to estimate the
shape of the variogram
Variogram model is made
(A line is fit through the set of semi-variance
points)
The Variogram model is then used to interpolate
the entire surface
Exact vs. Non-Exact Methods
Exact
Thiessen
IDW
Spline
Kriging
Non Exact
Fixed-Radius (averages several points near
the sample location)
Trend surface (surface typically does not
pass through the measured points)
Example
Here are some sample elevation points from which surfaces were
derived using the three methods
Example: IDW
Done with P = 2 (P is the distance exponent). Notice how it is not as smooth as Spline. This is
because of the weighting function introduced through P
Example: Spline
• Note how smooth the curves of the terrain are; this is because
Spline is fitting a simple polynomial equation through the points
Example: Kriging
This one is kind of in between, because it fits an equation
through the points but weights it based on probabilities
Thiessen
Inverse Distance Weighting
Kriging
© ESRI
Conclusions
Interpolation of environmental point data is
an important skill
Many methods classified by
local/global, approximate/exact, gradual/abrupt and
deterministic/stochastic
choice of method is crucial to success
Error and uncertainty
poor input data
poor choice/implementation of interpolation method