GIS Data Preparation and Integration Digesting the Food 11/6/2015 Ron Briggs, UTDallas GISC 6381 GIS Fundamentals.


GIS
Data Preparation and Integration
Digesting the Food
11/6/2015 Ron Briggs, UTDallas
GISC 6381 GIS Fundamentals
Data Preparation and Integration: the necessary steps
• Geocoding: assigning geographic coordinates to points
– Perhaps the most basic form of spatial data entry
• data media conversion
– scanning
– digitizing
• data format conversion
– raster & vector
• data reduction
• Topology, error detection and topological editing
• rectification and registration (one on top of the other)
– overlaying sheets and referencing to the real world
• edge matching & image adjustment (side by side)
– linking & balancing adjacent sheets
• interpolation
• conflation
Geocoding: assigning spatial coordinates to point data
Address Matching assigns spatial coordinates (explicit location) to
addresses (implicit location)
Address matching requires street network file with street attribute information
(street name and number range) for all street segments (block sides)
– “Zone” variable required if data spans multiple cities (to handle duplicated street names)
– precise matching of street names can be problematic
– completeness (esp. for ‘new’ streets) important
– PO boxes, building names, and apartment complex names cause problems.
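Address matching along a block side is commonly done by linear interpolation within the segment's address range. The sketch below is a minimal illustration under stated assumptions, not ArcGIS's locator logic: a straight segment, hypothetical field names (`from_addr`, `to_addr`, `start`, `end`), exact street-name matching, and no side-of-street offset or zone handling.

```python
from dataclasses import dataclass


@dataclass
class StreetSegment:
    """One block side from a street network file (hypothetical field names)."""
    name: str
    from_addr: int                # address number at the start point
    to_addr: int                  # address number at the end point
    start: tuple[float, float]
    end: tuple[float, float]


def geocode(number, street, segments):
    """Return an interpolated (x, y) for an address, or None if unmatched."""
    for seg in segments:
        low, high = sorted((seg.from_addr, seg.to_addr))
        # Exact name comparison: spelling variants will fail to match.
        if seg.name == street and low <= number <= high:
            span = seg.to_addr - seg.from_addr
            # Fraction of the way along the segment for this house number.
            t = 0.0 if span == 0 else (number - seg.from_addr) / span
            x = seg.start[0] + t * (seg.end[0] - seg.start[0])
            y = seg.start[1] + t * (seg.end[1] - seg.start[1])
            return (x, y)
    return None  # e.g. PO box, street missing from the file, misspelled name
```

The naive name comparison is deliberate: it shows why spelling variants, new streets missing from the file and PO boxes end up unmatched.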
Implementation in ArcGIS is a 3-step process
– In ArcToolbox (9.2), process street network file to create a Geocoding Service
– In ArcMap, load appropriate geocoding service via Tools/Geocoding/Services Manager
– In ArcMap, geocode a table of addresses using Tools/Geocoding/Geocode Addresses
Point Location Files containing lat/long or x,y coordinates (e.g. derived via GPS)
– bring table (e.g. in .csv or .dbf format) into ArcGIS using add data icon
– Right click table name in the Table of Contents (T of C) and select Display X,Y data
– Displays as “event layer.” Export to shapefile or gdb feature class for spatial data set.
Input table must contain 3 variables at minimum: Feature ID, x, y
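The minimum-columns rule for point location files can also be checked in code. A minimal sketch using Python's standard `csv` module; the column names `id`, `x` and `y` are hypothetical stand-ins for whatever the Feature ID and coordinate fields are called, and the output is a plain list of tuples rather than an ArcMap event layer.

```python
import csv
import io

# Hypothetical column names standing in for Feature ID, x and y.
REQUIRED = ("id", "x", "y")


def read_xy_table(text):
    """Parse CSV text into (id, x, y) tuples, checking the minimum columns."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [name for name in REQUIRED if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"table lacks required column(s): {missing}")
    # Extra columns (attributes) are allowed but ignored here.
    return [(row["id"], float(row["x"]), float(row["y"])) for row in reader]
```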
Data Media Conversion--Scanning:
automated recording of map or aerial
• Produces “dumb” raster data
• Great if need only raster representation; otherwise
– vectorize using conversion software
– create “smart” image using digital image processing techniques
• electromechanical
– $100-$50,000 instruments
– drum or flatbed
– scan resolution depends on price!
• down to 20 microns (a micron is a millionth of a m)
• Scanners v. sensors
– Sensors collect data directly in digital form (e.g. digital cameras)
– Sensor resolution now (2005 onward) matches that of photos, so scanning photos is becoming old technology
– Still lots of paper maps around, e.g. property ownership records
• Digital image processing techniques used to create “smart raster”
– Identify feature type within each raster cell
• Automated creation of vector data from scanning very problematic:
– docs must be clean
– complex line work adds error
– lines shouldn’t be broken with text
– text may be interpreted as lines
– automatic feature detection (road versus railroad) difficult
• ESRI’s ArcScan for ArcGIS (included with ArcEditor) provides interactive, semi-automated raster to vector conversion.
– Other vendors offer specialized conversion software
Data Media Conversion--Digitizing:
manually tracing a map or aerial
• Applied to map or aerial photo
• Use hard copy map/photo on table/tablet, or scanned image on screen (heads-up digitizing)
• pen or cursor detects x, y coords
• coordinates are in inches/cms from lower left (0,0)
• control points (tic marks) relate digitized coordinates to real world lat/long coordinates
• coordinates captured in stream or point mode
• accuracy of table (but not user!) usually better than 0.1 mm
• all nodes and polygons should be marked and numbered first
• essentially a vector approach
Problems:
• paper maps unstable
– crease and fold
– stretch with humidity (up to 3%)
– photos more stable (0.2%)
• map errors transferred to GIS
– maps often prepared for display, not accuracy
• human hand very shaky
• often generates undershoots, overshoots, & double lines
– editing and clean-up essential
Data Format Conversion: 4 possibilities (vector to vector, raster to raster, vector to raster, raster to vector)
• Vector to Vector
– e.g. whole polygon (e.g. SAS map data) to point/arc/polygon
– computationally intense
– no accuracy loss provided data are ‘clean’
– perfectly transitive
• raster to raster
– may involve resampling (see under data reduction)
– may involve conversion between different vendors’ raster formats (e.g. GRID to BIL)
• vector to raster: point
– node x,y assigned to closest raster cell
– locational shift almost inevitable; error depends on raster size.
– two points in one cell indistinguishable
– not transitive; cannot retrieve original data without error
• vector to raster: line
– cells assigned if touched by line
– stair step appearance of diagonal lines (called aliasing)
– can be visually improved through anti-aliasing: brightness of cells varied based on fraction of cell covered by the line
• raster to vector
– by far the most difficult
Transitive: the ability to reproduce the original data after conversion.
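The point case can be sketched in a few lines. A minimal sketch, assuming a north-up grid whose upper-left corner is at (`origin_x`, `origin_y`) with square cells; the function names are hypothetical. Converting back to the cell centre shows why the conversion is not transitive.

```python
import math


def point_to_cell(x, y, origin_x, origin_y, cell_size):
    """Return (row, col) of the cell containing (x, y); row 0 is the top row."""
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((origin_y - y) / cell_size)
    return row, col


def cell_center(row, col, origin_x, origin_y, cell_size):
    """The only location a raster-to-vector step can recover for a cell."""
    return (origin_x + (col + 0.5) * cell_size,
            origin_y - (row + 0.5) * cell_size)
```

With 10-unit cells and an origin at (0, 100), the points (12, 87) and (18, 82) both fall in cell (1, 1) and both come back as (15, 85): the shift (up to half the cell diagonal) and the lost distinction between the two points are exactly the non-transitivity described above.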
Vector to Raster Conversion
(Figure: a point, an orthogonal line, and a diagonal line, the last being more problematic, shown as vector input and as raster output. Note the use of anti-aliasing to improve the line’s visual appearance.)
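Assigning every cell that a line touches can be sketched with the Amanatides-Woo grid traversal, a swapped-in technique that the slides do not name. This minimal sketch assumes coordinates already expressed in cell units (cell (i, j) covers x in [i, i+1) and y in [j, j+1)); a diagonal input returns the stair-step set of cells described above as aliasing.

```python
import math


def cells_touched(x0, y0, x1, y1):
    """List the grid cells (col, row) a segment passes through, in order.

    Cells are unit squares: cell (i, j) covers x in [i, i+1), y in [j, j+1).
    At each step move into whichever neighbouring column or row the segment
    reaches first. If the segment passes exactly through a cell corner,
    only one of the two side cells is reported.
    """
    i, j = math.floor(x0), math.floor(y0)
    i_end, j_end = math.floor(x1), math.floor(y1)
    dx, dy = x1 - x0, y1 - y0
    step_i = 1 if dx > 0 else -1
    step_j = 1 if dy > 0 else -1
    # t runs from 0 at (x0, y0) to 1 at (x1, y1); t_max_* is the t at which
    # the segment next crosses a vertical (x) or horizontal (y) grid line.
    if dx != 0:
        t_max_x = ((i + 1 if dx > 0 else i) - x0) / dx
        t_delta_x = abs(1 / dx)
    else:
        t_max_x = t_delta_x = math.inf
    if dy != 0:
        t_max_y = ((j + 1 if dy > 0 else j) - y0) / dy
        t_delta_y = abs(1 / dy)
    else:
        t_max_y = t_delta_y = math.inf
    cells = [(i, j)]
    # Every step changes exactly one index, so this many steps reach the end cell.
    for _ in range(abs(i_end - i) + abs(j_end - j)):
        if t_max_x < t_max_y:
            t_max_x += t_delta_x
            i += step_i
        else:
            t_max_y += t_delta_y
            j += step_j
        cells.append((i, j))
    return cells
```

A horizontal segment yields a single row of cells, while a 45° segment yields a staircase of five cells across a 3 by 3 block: the aliasing that anti-aliasing only disguises visually.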
Raster to Vector Data Conversion: 3-step process
– skeletonizing (or thinning): to reduce rasters to unit width
• peeling approach successively removes outer edges
• medial axis approach determines set of interior pixels farthest from outer edges
– vector extraction: to identify lines
• 4-connected reconstruction
– joins center points of 4-connected neighbors if present
– particularly bad for diagonal line reproduction
• 8-connected reconstruction
– joins center points of 8-connected neighbors if present
– diagonal lines reproduced but adds extra lines
• 8-connected reconstruction with redundancy elimination
– if 4-connected neighbor line exists, don’t draw diagonal
– reduces redundant lines
– topological reconstruction: recreates topological structure
• create nodes at line junctions
• construct arcs
• define polygons (manual designation required)
Available via the ArcScan extension for ArcGIS, as well as via several specialized packages from other vendors
Raster to Vector Conversion
Skeletonizing
For example, go to:
http://www.cosc.canterbury.ac.nz/people/mukundan/covn/Thin.html
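The peeling approach can be illustrated with the Zhang-Suen thinning algorithm (1984), one well-known peeling method; the slides do not say which algorithm any particular package uses. This sketch assumes a small binary raster stored as a list of lists and favours clarity over speed.

```python
def zhang_suen_thin(image):
    """Thin a binary raster (list of lists of 0/1) to unit width by peeling.

    Each iteration makes two sub-passes, each deleting boundary pixels whose
    removal keeps the shape connected, until no pixel changes.
    Pixels outside the grid count as background.
    """
    img = [row[:] for row in image]  # do not modify the caller's raster
    rows, cols = len(img), len(img[0])

    def px(r, c):
        return img[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    def neighbours(r, c):
        # P2..P9: clockwise, starting with the pixel directly above.
        return [px(r - 1, c), px(r - 1, c + 1), px(r, c + 1), px(r + 1, c + 1),
                px(r + 1, c), px(r + 1, c - 1), px(r, c - 1), px(r - 1, c - 1)]

    changed = True
    while changed:
        changed = False
        for sub_pass in (0, 1):
            to_delete = []
            for r in range(rows):
                for c in range(cols):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    p2, _, p4, _, p6, _, p8, _ = p
                    b = sum(p)  # number of foreground neighbours
                    # a = number of 0 -> 1 transitions going once round P2..P9..P2
                    a = sum(1 for k in range(8) if p[k] == 0 and p[(k + 1) % 8] == 1)
                    if sub_pass == 0:
                        side_ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        side_ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and side_ok:
                        to_delete.append((r, c))
            for r, c in to_delete:  # delete together after each sub-pass
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img
```

On a bar three cells thick the result is a line one cell wide; like many peeling methods, it also shortens the line ends somewhat.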
Raster to Vector Conversion:
Vector Extraction
4-connect reconstruction
(Figure: raster cells and the vector lines reconstructed from them)
4-connect reconstruction: search the 4 surrounding cells and join center points if present.
Raster to Vector Conversion:
Vector Extraction
8-connect reconstruction
(Figure: raster cells and the vector lines reconstructed from them)
8-connect reconstruction: search the 8 surrounding cells and join center points if present.
Raster to Vector Conversion:
Vector Extraction
8-connect reconstruction with redundancy elimination
(Figure: raster cells and the vector lines reconstructed from them)
8-connect with redundancy elimination: draw diagonal from 8-cell search only if not already connected by orthogonal from 4-cell search
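The three vector-extraction rules can be compared side by side. A minimal sketch, assuming the raster has already been thinned and is given as a set of (row, col) cells; the function name and the mode strings are hypothetical.

```python
def extract_segments(cells, mode="8r"):
    """Join centre points of neighbouring raster cells into line segments.

    cells: set of (row, col) foreground cells of a thinned raster.
    mode: "4"  - join orthogonal (4-connected) neighbours only
          "8"  - also join diagonal (8-connected) neighbours
          "8r" - 8-connected, but skip a diagonal when the two cells are
                 already linked through a shared orthogonal neighbour
    Returns a set of segments, each a frozenset of two cells.
    """
    segments = set()
    for r, c in cells:
        # Orthogonal links: looking right and down visits each pair once.
        for dr, dc in ((0, 1), (1, 0)):
            if (r + dr, c + dc) in cells:
                segments.add(frozenset({(r, c), (r + dr, c + dc)}))
        if mode == "4":
            continue
        # Diagonal links: looking down-right and down-left visits each pair once.
        for dr, dc in ((1, 1), (1, -1)):
            other = (r + dr, c + dc)
            if other not in cells:
                continue
            # The two cells that share an edge with both ends of the diagonal.
            shared = {(r + dr, c), (r, c + dc)}
            if mode == "8r" and shared & cells:
                continue  # already connected via orthogonal links
            segments.add(frozenset({(r, c), other}))
    return segments
```

For a diagonal run of cells, mode "4" produces no segments at all; for an L-shaped corner, mode "8" adds a redundant diagonal that "8r" suppresses.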
Data Format Conversion Implementation in ArcGIS 9
• From Raster, To Raster: Arctoolbox>Conversion Tools>To Raster>Raster To Other (multiple)
– Converts one or more raster dataset formats supported by ArcGIS to a GRID, IMAGINE, TIFF, or geodatabase raster dataset format
– Can also be accomplished thru ArcCatalog, Export function
• From Vector, To Raster: Arctoolbox>Conversion Tools>To Raster>Feature to Raster
– Converts any shapefile, coverage, or geodatabase feature class containing point, line, or polygon features to a raster dataset
– Can also be accomplished thru ArcCatalog, Export function
• From Raster, To Vector: Arctoolbox>Conversion Tools>From Raster>Raster to Point, Raster to Polygon, or Raster to PolyLine
– Converts raster datasets in GRID, IMAGINE, or TIFF formats to shapefiles or feature classes
– Results may not be what you expect!
– Can also be accomplished thru ArcCatalog, Export function
• From Vector, To Vector: use ArcCatalog, Export function for conversions between shapefiles, gdb feature classes, coverages and CAD
– ArcGIS Data Interoperability Extension for the most comprehensive set of conversions
Data Reduction
• Why?
– conserve space
• Disk in past
• Comm. bandwidth today
– conserve time
• reduce processing time (batch)
• speed response time (interactive)
• Thinning (vector data)
– often applied to data digitized in stream mode
– tolerance elimination: remove nearest-neighbor points which are ‘too close’ (e.g. output device resolution insufficient to distinguish)
– topological elimination*: remove points unnecessary for topo structure
– model-based elimination: fit polynomial by least squares and record fewer points along its path
• Resampling (raster data)
– ‘average’ the 4 values in a 2 by 2 neighborhood
– use this 1 value in a single cell occupying the location of the 4 original cells
– use mean for interval data; rules required for ordinal or nominal data
– not transitive!
(Figure: resampling example in which the 2 by 2 block of values 3, 7, 2, 4 is replaced by one cell holding their mean, 4; size labels 16 bytes, 4 bytes, 1 byte)
*Normally uses the Douglas/Poiker (or Peucker) algorithm: David H. Douglas & Thomas K. Peucker, Algorithms for the reduction of the number of points required to represent a digitized line or its caricature, Canadian Cartographer, 1973
Implement in ArcGIS via Advanced Editing toolbar, Generalize tool
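The Douglas-Peucker algorithm cited above can be sketched in its classic recursive form. This is a minimal illustration assuming planar (x, y) vertices; it is not the Generalize tool's own implementation.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (to a if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping vertices farther than `tolerance`
    from the chord between the current pair of anchor points."""
    if len(points) < 3:
        return list(points)
    # Find the interior vertex farthest from the chord joining the end points.
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]  # every interior vertex is dropped
    # Otherwise keep that vertex and simplify each half separately.
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # avoid duplicating the split vertex
```

A larger tolerance removes more vertices; a tolerance of zero keeps every vertex that is not exactly on a chord.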
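The 2 by 2 resampling step can be sketched as follows. A minimal illustration, assuming a raster stored as a list of rows with even dimensions; the "majority" rule is only one possible rule for nominal data, since the slide leaves those rules unspecified.

```python
from collections import Counter
from statistics import mean


def resample_2x2(grid, rule="mean"):
    """Aggregate each 2 by 2 block of a raster into one cell.

    rule="mean" suits interval data; rule="majority" is one possible rule
    for nominal data (ties go to the value met first in the block).
    Assumes an even number of rows and columns.
    """
    out = []
    for r in range(0, len(grid), 2):
        row = []
        for c in range(0, len(grid[0]), 2):
            block = [grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1]]
            if rule == "mean":
                row.append(mean(block))
            else:
                row.append(Counter(block).most_common(1)[0][0])
        out.append(row)
    return out
```

Resampling [[3, 7], [2, 4]] gives [[4]]; nothing in that single value recovers the original four, which is why the operation is not transitive.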
Topology & Errors
Topology
--knowledge about relative spatial positioning
--spatial relationships between features and rules about these
relationships
--managing data cognizant of shared geometry
Implies knowledge of the three Cs:
– connectivity (linked)
– congruency (coincident/same as/on top of)
– contiguity (adjacent)
It is critical that spatial data be created and managed so that it is topologically clean (free from topological errors)
--editing must always aim to maintain topological structure
In topological editing, changes made to one feature (line, polygon, etc.) are also
reflected in all other features to which it is connected, coincident, or adjacent
In the classic GIS data structure model (as discussed in GIS Data Structures
lecture) this implies that, for example
--all arcs have nodes at end points
--there is a node wherever arcs intersect or connect
--a single arc forms the border between contiguous polygons (e.g. Dallas and Tarrant county)
--a single arc represents a common boundary (e.g. state and county boundary)
(Figure: Tarrant and Dallas counties sharing a single boundary arc)
Errors: detection and removal
• GIS packages commonly use topological structure checking to
detect errors
• Editing based on node snapping used to correct errors:
moving a feature so its coordinates correspond exactly with
another’s
• snapping conducted based on tolerances -- snap if within 1
foot, for example
• Care must always be taken to assure that topological
“cleaning” does not itself introduce errors (e.g. snapping
nodes and lines together which shouldn’t be)
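Tolerance-based node snapping can be sketched as below. A minimal O(n²) illustration with a hypothetical function name, assuming nodes are plain (x, y) tuples; real editing tools use spatial indexes and can snap to edges as well as nodes.

```python
import math


def snap_nodes(nodes, tolerance):
    """Snap each node to the first already-kept node within `tolerance`.

    Returns a new list in which every node closer than the tolerance to a
    kept node takes that node's exact coordinates.
    """
    kept, result = [], []
    for x, y in nodes:
        target = next(
            (k for k in kept if math.hypot(x - k[0], y - k[1]) <= tolerance), None
        )
        if target is None:
            kept.append((x, y))   # becomes a snap target for later nodes
            result.append((x, y))
        else:
            result.append(target)  # coordinates now match exactly
    return result
```

The outcome depends on order and tolerance: with a tolerance of 1.0, (0.8, 0) snaps to (0, 0) but (1.6, 0) does not, and a larger tolerance would join nodes that should stay apart.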
Topological errors or real world occurrences?
Common problems:
• dangling arc (node missing at one end)
• no node at arc intersection (overpass?)
• overshoot (or missing node)?
• undershoot?
• pseudo node (but perhaps road surface changes)
• pseudo arc (connects to itself)
• open polygon
• sliver polygon
• gap
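Several of these problems show up as node degree, the number of arc ends meeting at a node. A minimal sketch, assuming arcs are given as (start, end) coordinate pairs whose shared end points are already snapped; the function name is hypothetical, and as the question marks above suggest, each flagged node still needs a human decision.

```python
from collections import Counter


def classify_nodes(arcs):
    """Flag suspect nodes by degree (number of arc ends meeting there).

    arcs: list of (start, end) pairs of (x, y) tuples.
    Degree 1: dangling node (possible undershoot or overshoot).
    Degree 2: pseudo node (two arcs that could be merged, unless an
    attribute such as road surface changes there). An arc that closes on
    itself also yields a degree-2 node.
    """
    degree = Counter()
    for start, end in arcs:
        degree[start] += 1
        degree[end] += 1
    dangles = sorted(node for node, d in degree.items() if d == 1)
    pseudo = sorted(node for node, d in degree.items() if d == 2)
    return dangles, pseudo
```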
How ArcGIS Handles Topology
• The original Coverage data model, introduced with
ArcInfo in 1981, incorporated topology as a part of the
data
– The CLEAN command checked for, and automatically “fixed”,
topological errors based on a set tolerance
• It could introduce errors into the data
– The BUILD command then rebuilt polygon structures
• ArcGIS 8.3 introduced the concept of topology rules for geodatabases, in which the topological relationships are stored in a topology dataset separate from the feature data itself
– The user can generate an error report, review each error, and then
fix it in the data if desired, or mark it as an “exception”
Georeferencing: Rectification and Registration
providing true earth location/overlaying layers
• rectification: rearrangement of location of objects to correspond to a specific reference system (usually geodetic)
• registration: rearrangement of location of objects of one set so they correspond with those of another, without reference to a specific reference system
Despite the formal difference, the two terms are often used interchangeably
Two methods
• homogeneous transformation
via rotation, translation,
scaling, skewing
– used for map projection and
similar conversions
• differential transformation via
rubber sheeting
– used to correctly position
distorted images or scanned
maps or documents
• Most commonly used to relate images (e.g. scanned photo) to a vector layer, but can also be used to “fix” incorrect positioning of features in a vector layer
• Implemented in ArcMap via the Georeferencing toolbar for images, and via the Spatial Adjustment toolbar for vector layers
Transformation:
(homogeneous conversion)
• translation of origin
– from digitizer origin for sheet to ‘true’ origin of GIS file
• rotation of axis
– e.g. to true north
• scaling of axis
– homogeneous
– differential (ovals to circles)
• skewing of axis
Changing map projections may involve all 4
(Figure: translation, rotation, homogeneous and differential scaling, and skewing)
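The four homogeneous operations can be combined into one 3 by 3 matrix in homogeneous coordinates. A minimal sketch with hypothetical function names; the order used here (scale, then skew, then rotate, then translate) is one convention among several, and a different order gives a different result.

```python
import math


def matmul(p, q):
    """Multiply two 3x3 matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def affine_matrix(tx=0.0, ty=0.0, angle_deg=0.0, sx=1.0, sy=1.0, shear=0.0):
    """Build a homogeneous matrix: scale, then skew, then rotate, then translate.

    sx == sy gives homogeneous scaling; sx != sy gives differential scaling.
    shear is the x-skew factor (x' = x + shear * y).
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    scale = [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]
    skew = [[1, shear, 0], [0, 1, 0], [0, 0, 1]]
    rot = [[cos_a, -sin_a, 0], [sin_a, cos_a, 0], [0, 0, 1]]
    move = [[1, 0, tx], [0, 1, ty], [0, 0, 1]]
    m = scale
    for nxt in (skew, rot, move):
        m = matmul(nxt, m)  # apply nxt after everything accumulated so far
    return m


def apply(m, x, y):
    """Transform one point (x, y) with homogeneous matrix m."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

For example, doubling the scale, rotating 90° and shifting by (100, 200) sends (1, 0) to approximately (100, 202).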
Rubber Sheeting
(differential conversion)
• GIS file is differentially ‘stretched’ so that tic points in file overlay corresponding ground control (tie) points on earth’s surface (or tic points in a second file)
• polynomial fitted by least squares between known ground control coords and tic point coords in GIS
– “Least squares” minimizes the sum of the squared distances between tic/tie pairs
• derived parameters then applied to all coordinates in file
• after conversion, tic points are on average closer to ground control points, but not identical
• can’t do this with a paper map!
Control points, map locations (tic) and ground control (tie):
--the more the better
--well distributed
--known lat/long of ground control tie points (usually obtained from GPS) needed for rectification
--common identifiable points in each file needed for registration
(Figure: tic points in the GIS file linked to ground control (tie) points)
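The least-squares step can be sketched for the simplest polynomial, a first-order (affine) fit. This is an assumption for illustration: rubber-sheeting tools may use higher-order polynomials or piecewise methods, and the function names are hypothetical. The RMS error reports how close the transformed tic points come to their tie points.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system a @ v = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    v = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        v[r] = (m[r][3] - sum(m[r][k] * v[k] for k in range(r + 1, 3))) / m[r][r]
    return v


def fit_first_order(tics, ties):
    """Least-squares fit of X = a0 + a1*x + a2*y and Y = b0 + b1*x + b2*y.

    tics: (x, y) control points in the GIS file; ties: matching ground
    coordinates. Needs at least 3 non-collinear pairs.
    """
    # Normal equations (A^T A) coeffs = A^T target, with rows of A = [1, x, y].
    ata = [[0.0] * 3 for _ in range(3)]
    atx, aty = [0.0] * 3, [0.0] * 3
    for (x, y), (gx, gy) in zip(tics, ties):
        row = (1.0, x, y)
        for i in range(3):
            atx[i] += row[i] * gx
            aty[i] += row[i] * gy
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return solve3(ata, atx), solve3(ata, aty)


def transform(coeffs, x, y):
    """Apply fitted coefficients to any coordinate in the file."""
    (a0, a1, a2), (b0, b1, b2) = coeffs
    return a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y


def rms_error(coeffs, tics, ties):
    """Root-mean-square distance between transformed tics and their ties."""
    sq = [math.dist(transform(coeffs, *t), g) ** 2 for t, g in zip(tics, ties)]
    return math.sqrt(sum(sq) / len(sq))
```

With four pairs related by an exact affine transformation the RMS error is zero; nudging one tie point by 0.2 leaves a small non-zero RMS error, matching the point that tic points end up closer to, but not identical with, the ground control.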
Edge Matching:
Joining map sheets to create a seamless GIS
Process
• required for topo. consistency even if features line up visually
• snapping used to connect features
Issues
• acceptable tolerance before
‘further investigation’ of mismatch
• ‘how far back’ to go on sheet(s) with
adjustments for mismatch
Causes of mismatch
• paper map shrinkage/expansion
• errors from digitizing/scanning
– georeferencing errors
– accuracy of equipment
– extrapolation or round-off errors
• overlapping map coverage
(Figure: corresponding features fail to match on two sheets; edge matching in this example would likely require ‘further research’)
Implement in ArcGIS 9 by:
1. ArcToolbox>Data Management>General>Append
(replaces Geoprocessing Tools>Merge in AG 8)
– combines two (or more) files, but does not link features
2. Spatial Adjustment toolbar, edge match tool
– links features (after links have been manually identified)
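The link-and-snap step of edge matching can be sketched for line end points along a shared sheet edge. A minimal illustration with a hypothetical helper: pairs within tolerance are matched nearest-first and moved to their midpoint, and anything left over is returned for ‘further investigation’. Moving only the end vertices sidesteps the ‘how far back’ question, which a real adjustment must answer.

```python
import math


def edge_match(west_ends, east_ends, tolerance):
    """Link line end points that meet a shared edge between two map sheets.

    west_ends / east_ends: (x, y) end points of features reaching the edge
    on each sheet. Closest pairs within `tolerance` are linked first and
    moved to their common midpoint; the rest are left for review.
    Returns (links, unmatched_west_indexes, unmatched_east_indexes).
    """
    candidates = sorted(
        (math.dist(w, e), i, j)
        for i, w in enumerate(west_ends)
        for j, e in enumerate(east_ends)
    )
    used_w, used_e, links = set(), set(), []
    for d, i, j in candidates:
        if d > tolerance:
            break  # sorted, so every remaining pair is farther apart
        if i in used_w or j in used_e:
            continue
        used_w.add(i)
        used_e.add(j)
        (x1, y1), (x2, y2) = west_ends[i], east_ends[j]
        links.append((i, j, ((x1 + x2) / 2, (y1 + y2) / 2)))
    unmatched_w = [i for i in range(len(west_ends)) if i not in used_w]
    unmatched_e = [j for j in range(len(east_ends)) if j not in used_e]
    return links, unmatched_w, unmatched_e
```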
Image Adjustments
raster/image data issues
Raster data is made from separate images (photos) or tiles which are mosaicked to produce a “seamless image”
Collars: must be removed for seamless image
– Overlap between adjacent images
– Borders of scanned maps
Image Balancing and Feathering: adjusting radiometry for consistent and/or desired
image color, brightness, contrast
– Checker board appearance
– Abrupt line between adjacent images
– Brightness levels wash out detail in highly reflective areas, but enhance detail in low
reflectance areas
– Inconsistent signature for same features, especially water as function of wind or sun
relative to camera (and is it blue?)
Digital Ortho adjustments:
– Ground control (usually with GPS for visible points) to obtain ‘real world’ location
– Ground control for camera’s angle relative to ground
– Camera calibration data to remove lens distortion
– Digital terrain model (DTM) to remove elevation “distance”
(e.g. 5 mi. on the map to a mountain top, but farther walking or on the photo if the mountain is 5,280 feet (1 mi) high; along a uniform slope the ground distance is √(5² + 1²) ≈ 5.1 mi)
(Figure: 2005 NCTCOG Digital Orthos, tiles before and after processing; collar removal required; image balancing/feathering required)
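Feathering can be sketched as a linear cross-fade across the overlap between two adjacent images. A minimal illustration, assuming one row of co-registered pixel values from each image over the same overlap strip; balancing overall brightness between whole images is a separate step not shown here.

```python
def feather_overlap(left, right):
    """Blend two equal-length rows of pixel values covering the same overlap.

    Weights ramp linearly from all-left at the first overlap column to
    all-right at the last, hiding the abrupt seam between adjacent images.
    """
    n = len(left)
    if n != len(right):
        raise ValueError("overlap strips must be the same width")
    if n == 1:
        return [(left[0] + right[0]) / 2]
    out = []
    for i, (a, b) in enumerate(zip(left, right)):
        w = i / (n - 1)  # 0.0 at the left edge, 1.0 at the right edge
        out.append((1 - w) * a + w * b)
    return out
```

Blending a strip of 100s with a strip of 140s over five columns gives 100, 110, 120, 130, 140 instead of a sudden jump.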
Interpolation:
to create regular spacings from irregular data
(e.g. creating raster elevation surface from set of point height measurements)
• estimating values for locations with no data based on:
– known values, and
– understanding of spatial behavior of phenomena
• generally, should assign more importance to closer known values than those further away
(Figure: estimated values between known points)
• weighting functions
– average closest n (2?) points
• ignores distance
– fit line between closest 2
– fit surface between closest 3
• trend surface approaches
– one high order polynomial
• oscillation a problem
– finite element approach: fit separate polynomials for each local area
– kriging: uses correlations of values with distance
Implemented in ArcGIS 9 via ArcToolbox>Spatial Analyst Tools>Interpolation
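Inverse distance weighting (IDW) is one common weighting function that gives closer known values more importance; the slide does not name it, so treat it as an illustrative choice. A minimal sketch, assuming samples are (x, y, value) tuples and a default power of 2; the function name is hypothetical.

```python
import math


def idw(x, y, samples, power=2.0, n_nearest=None):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (xs, ys, value). Weights are 1 / d**power, so closer
    known values count more. If n_nearest is given, only that many
    closest samples are used.
    """
    ranked = sorted(samples, key=lambda s: math.dist((x, y), s[:2]))
    if n_nearest is not None:
        ranked = ranked[:n_nearest]
    num = den = 0.0
    for xs, ys, value in ranked:
        d = math.dist((x, y), (xs, ys))
        if d == 0:
            return value  # exactly on a sample point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den
```

Setting `power=0` makes every weight equal, which reproduces the plain average that ignores distance.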
Conflation
• create new master coverage from the best spatial and attribute qualities
of two or more source coverages
– combine multiple coverages into one to simplify support
– updated data obtained (e.g. new TIGER file) but need to preserve
enhancements made to earlier version
– two groups modify a single file, then need to recreate single version which
preserves mods
• create new master coverage from quality spatial data in one source and
quality attribute data in another
– somewhat narrower definition
• Depending on the situation, can require application of a variety of processing tools and can be labor intensive
• Approaches available within ArcGIS 9 include
– Spatial Adjustment toolbar, specifically attribute transfer tool
– ArcToolbox>Analysis Tools>Overlay>Update
• other add-ins available such as
• MapMerge from ESEA, Mountain View CA for ArcGIS
• GIS/T-Conflate for transportation applications
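The case of quality spatial data in one source and quality attribute data in another can be sketched as a nearest-feature attribute transfer. A minimal illustration for point features only, with hypothetical dictionary keys (`xy` plus attribute names); matching lines or polygons needs the more elaborate tools listed above.

```python
import math


def transfer_attributes(targets, sources, fields, max_distance):
    """Copy selected attributes from the nearest source feature to each target.

    targets/sources: lists of dicts with an 'xy' key plus attributes.
    Targets keep their (better) geometry; only `fields` are copied, and only
    when the nearest source point lies within max_distance.
    """
    result = []
    for t in targets:
        best = min(sources, key=lambda s: math.dist(t["xy"], s["xy"]), default=None)
        merged = dict(t)  # never alter the target's geometry
        if best is not None and math.dist(t["xy"], best["xy"]) <= max_distance:
            for f in fields:
                merged[f] = best.get(f)
        result.append(merged)
    return result
```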
NAVSTAR Global Positioning System (GPS)
–use to collect ground control for imagery/orthos
–or for point/line data (manholes, roads, etc.)
NAVSTAR Satellite Program
• 24 NAVSTAR (NAVigation Satellite Time and Ranging) satellites in roughly 11,000 nautical mile (about 20,200 km) orbits provide 24 hour coverage worldwide
• first launched 1978; full system operational December 1993
• GPS receiver computes locations/elevations via signals from simultaneously visible satellites (minimum 3 for 2-D, 4 for 3-D)
• Selective Availability (SA) security system
– 100m accuracy with single receiver, if active
– 10-15m accuracy if inactive
– Multiple ways to counteract SA
– Even USCG broadcast a correction signal!
– Europeans threatened to compete
– Regional denial of signal possible
• SA turned off May 1st, 2000
• Russia’s 21-satellite GLONASS (Global Navigation Satellite System) also available.
Types of Ground Collection and Correction
• Autonomous
– Hand-held unit provides 10m accuracy (with SA off)
– $150-$1,500 per unit
• WAAS (wide area augmentation system)
– <3 meter accuracy in practice (spec. is 7m vert/horiz)
– Base stations (25 across US) monitor satellites
– 2 master stations (E & W coast) calculate corrections
– upload to two geosynchronous satellites over equator
– correction signal broadcast to GPS receivers (no special extra equipment needed, unlike DGPS)
– Began operation June, 1998
– To be expanded to cover Canada, Mexico, Panama
– European EGNOS, Asian MSAS under development
• Differential (DGPS, predecessor to WAAS)
– accuracy 1-5m depending on equipment/exact method
– equipment $1,500-$15,000 per receiver
– correct for SA and other errors via either
• real time correction signals over FM radio
• post process with data from Internet
• Kinematic:
– high accuracy engineering (within cms)
– two receivers (base station and rover)
– must lock-on to satellites
– equipment $15-30K per station
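Computing a position from several simultaneously visible satellites can be illustrated with planar trilateration. This is a deliberately simplified sketch with a hypothetical function name: it assumes exact ranges on a flat plane, where three ranges are needed because two circles generally meet at two points. Real receivers instead measure pseudoranges that include an unknown receiver clock bias, which is why GPS needs one more satellite than the number of spatial dimensions.

```python
def trilaterate_2d(stations):
    """Planar position from three (x, y, range) measurements.

    Subtracting the first circle equation from the other two gives two
    linear equations in x and y, solved here by Cramer's rule.
    """
    (x1, y1, r1), (x2, y2, r2), (x3, y3, r3) = stations
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("stations are collinear: position is ambiguous")
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det
```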
Factors Affecting GPS Accuracy
• Ionosphere
– worst in evening at low altitudes (but ephemeris best there)
• troposphere
– especially water vapor, which slows the signal
• multipath
– reflected signals from buildings, cliffs, etc.
• ephemeris (satellite geometry)
– position and number of satellites in sky
– 4 required for 3D (horiz. and vertical), 3 for 2D (no elevation)
– ideally, 3 spaced every 120° around the horizon at 20° elevation, plus 1 directly overhead
• blockage (of satellite signal)
– by foliage, buildings, cliffs, etc.
– WAAS signal especially subject to blocking by terrain & buildings because it comes from a geostationary equatorial satellite
Overall, accuracy better at night than during day.
Conclusion
Most of the effort in most GIS projects involves
data preparation and integration!