lecture 10 data qual.. - Dycker@control
DATA QUALITY AND ERROR
Terminology, types and sources
Importance
Handling error and uncertainty
DATA QUALITY
GIGO: garbage in, garbage out
Just because it’s in the computer doesn’t mean it’s
right
Accept there will always be errors in GIS
INTRODUCTION
•
GIS - great tool for spatial data analysis and
display
•
question: what about error?
•
data quality, error and uncertainty
error propagation
confidence in GIS outputs
be careful, be aware, be upfront
TERMINOLOGY
•
various (often confused) terms in use:
error
uncertainty
accuracy
precision
data quality
ERROR AND UNCERTAINTY
Error
• wrong or mistaken
• degree of inaccuracy in a calculation
e.g. 2% error
Uncertainty
• lack of knowledge about level of error
• unreliable
ACCURACY AND PRECISION
Accuracy
• extent of system-wide bias in measurement process
Precision
• level of exactness associated with measurement
[Figure: four panels (1-4) combining precise/imprecise with accurate/inaccurate]
DATA QUALITY
•
degree of excellence in data
•
general term for how good the data is
•
takes all other definitions into account
error
uncertainty
precision
accuracy
DATA QUALITY
•
based on the following elements:
positional accuracy
attribute accuracy
logical consistency
data completeness
POSITIONAL ACCURACY
•
spatial: deviation from true position (horizontal or vertical)
•
general rule: be within the best possible data resolution
e.g. at a scale of 1:50,000, error should be no more than
25 m (0.5 mm on the map)
•
can be measured as root mean square error (RMSE): the square root of the mean squared distance between the true and
estimated locations
•
temporal: difference from actual
time and/or date
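A minimal sketch of the RMSE calculation for positional accuracy. The function name and the check-point coordinates are hypothetical, and the 25 m tolerance is the 1:50,000 rule of thumb from this slide.

```python
import math

def positional_rmse(true_points, measured_points):
    """Root mean square error between paired (x, y) coordinates."""
    if len(true_points) != len(measured_points) or not true_points:
        raise ValueError("need two non-empty point lists of equal length")
    # Squared straight-line distance for each true/measured pair
    squared = [
        (mx - tx) ** 2 + (my - ty) ** 2
        for (tx, ty), (mx, my) in zip(true_points, measured_points)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical check points, in metres
surveyed = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
digitised = [(3.0, 4.0), (100.0, 5.0), (95.0, 100.0)]

rmse = positional_rmse(surveyed, digitised)
within_tolerance = rmse <= 25.0  # rule of thumb for 1:50,000 data
```

Each digitised point here is 5 m from its surveyed position, so the RMSE is 5 m and the data would pass the 25 m rule.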
ATTRIBUTE ACCURACY
•
classification and measurement accuracy
a feature is what the GIS says it is
e.g. a railroad is a railroad and not a road
e.g. a soil sample agrees with the type mapped
•
rated in terms of % correct
•
in a database, forest types are grouped and placed
within a boundary
•
in reality - no solid boundary where only pine trees grow
on one side and spruce on the other
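A small sketch of the "% correct" rating, assuming attribute accuracy is checked by comparing mapped classes against field observations at the same sample sites. The function name and sample classes are hypothetical.

```python
def percent_correct(mapped, observed):
    """Percentage of sample sites where the mapped class matches the field class."""
    if len(mapped) != len(observed) or not mapped:
        raise ValueError("need two non-empty class lists of equal length")
    matches = sum(1 for m, o in zip(mapped, observed) if m == o)
    return 100.0 * matches / len(mapped)

# Hypothetical forest-type check at five sample sites
mapped_classes = ["pine", "pine", "spruce", "spruce", "pine"]
field_classes = ["pine", "spruce", "spruce", "spruce", "pine"]

score = percent_correct(mapped_classes, field_classes)  # 4 of 5 sites agree
```

A single percentage hides where the disagreements fall; sites near a fuzzy forest boundary are the most likely to be misclassified.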
LOGICAL CONSISTENCY
•
concerns contradictory relationships in the database (the fewer contradictions, the better the logical consistency)
•
non-spatial
some crimes recorded at place of occurrence, others at
place where report was taken
data for one country is for 2000, another for 2001
data uses different source or estimation technique for
different years
LOGICAL CONSISTENCY
•
spatial
overshoots and gaps in road networks or parcel
polygons
[Figure: example of good logical consistency]
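One simple way to flag possible overshoots and gaps is to look for dangling endpoints: line ends that no other line shares. The sketch below assumes each line is a list of (x, y) tuples and that connected lines share exactly equal end coordinates; the function name and the road data are hypothetical, and a real check would normally use a snapping tolerance.

```python
from collections import Counter

def dangling_endpoints(lines):
    """Return endpoints that are used by only one line end."""
    counts = Counter()
    for line in lines:
        counts[line[0]] += 1   # start node
        counts[line[-1]] += 1  # end node
    return [pt for pt, n in counts.items() if n == 1]

# Hypothetical road loop; the last segment stops short of (0, 0)
roads = [
    [(0, 0), (10, 0)],
    [(10, 0), (10, 10)],
    [(10, 10), (0, 10)],
    [(0, 10), (0, 0.5)],
]

dangles = dangling_endpoints(roads)
```

Not every dangling endpoint is an error (a cul-de-sac is a legitimate dead end), so the result is a list of places to inspect, not a list of mistakes.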
COMPLETENESS
•
reliability concept
•
partially a function of the criteria for including features
•
are all instances of a feature the GIS claims to
include, in fact, there?
when does a road become a track?
simply put, how much data is missing?
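A rough sketch of "how much data is missing", assuming an independent reference list of features that should be present. The function name and feature IDs are hypothetical.

```python
def completeness(expected_ids, recorded_ids):
    """Return (% of expected features present, sorted list of missing IDs)."""
    expected = set(expected_ids)
    if not expected:
        raise ValueError("expected_ids must not be empty")
    present = expected & set(recorded_ids)
    missing = sorted(expected - present)
    return 100.0 * len(present) / len(expected), missing

# Hypothetical road IDs: reference list vs. what the GIS layer holds
reference = ["A1", "A2", "A3", "B1"]
in_layer = ["A1", "A3", "B1", "C9"]

pct, missing = completeness(reference, in_layer)
```

The answer depends on the inclusion criteria: if tracks are not meant to be in the road layer, a missing track is not a completeness error.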
SOURCES OF ERROR
•
sources of error:
data collection and input
human processing
actual changes
data manipulation
data output
DATA COLLECTION AND INPUT
•
inherent instability of the phenomenon itself
random variation of most phenomena (e.g. leaf size)
edges may not be sharp boundaries (e.g. forest
edges)
•
description of source data
data source: name, date of collection, method of collection, date of
last modification, producer, reference, scale,
projection
inclusion of metadata
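The source description items listed above can be held as a simple record and checked for gaps. This is only an illustrative sketch: the class name, field names and sample values are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class SourceMetadata:
    """One field per source-description item on the slide."""
    name: Optional[str] = None
    date_of_collection: Optional[str] = None
    method_of_collection: Optional[str] = None
    date_of_last_modification: Optional[str] = None
    producer: Optional[str] = None
    reference: Optional[str] = None
    scale: Optional[str] = None
    projection: Optional[str] = None

def missing_fields(meta):
    """Names of metadata items that were left empty."""
    return [f.name for f in fields(meta) if getattr(meta, f.name) in (None, "")]

# Hypothetical, partly documented layer
record = SourceMetadata(name="Example soils layer",
                        producer="Example agency",
                        scale="1:50,000")
gaps = missing_fields(record)
```

A layer with many empty items is hard to assess for quality, whatever its actual accuracy.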
DATA COLLECTION AND INPUT
•
instrument inaccuracies:
satellite/air photo/GPS/spatial surveying
e.g. resolution and/or accuracy of digitizing equipment
thinnest visible line: 0.1 - 0.2 mm
at a scale of 1:20,000 this is about 6.6 - 13.1 feet (2 - 4 m) on the ground
anything smaller cannot be captured
attribute measuring instruments
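The link between line width on the map and distance on the ground is map distance multiplied by the scale denominator. A minimal sketch, with hypothetical function and variable names:

```python
MM_PER_METRE = 1000.0
FEET_PER_METRE = 3.28084

def ground_distance_m(map_mm, scale_denominator):
    """Ground distance in metres for a map distance in millimetres."""
    return map_mm * scale_denominator / MM_PER_METRE

# Thinnest visible line (0.1 - 0.2 mm) on a 1:20,000 map
line_widths_mm = (0.1, 0.2)
on_ground_m = {w: ground_distance_m(w, 20_000) for w in line_widths_mm}
on_ground_ft = {w: m * FEET_PER_METRE for w, m in on_ground_m.items()}
```

The same function shows that 0.5 mm on a 1:50,000 map corresponds to 25 m on the ground.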
DATA COLLECTION AND INPUT
•
model used to represent data
e.g. choice of datum, classification system
•
data encoding and entry
e.g. keying or digitizing errors
[Figure: original line vs. digitised line]
DATA COLLECTION AND INPUT
Attribute uncertainty
•
uncertainty regarding characteristics (descriptors,
attributes, etc.) of geographical entities
•
types: imprecise or vague, mixed up, plain wrong
•
sources: source document, misinterpretation, database
error
[Figure: example attribute values (505.9, 238.4) alongside rounded (500, 240), ranged (500-510, 230-240) and swapped versions]
HUMAN PROCESSING
•
misinterpretation (e.g. of photos), spatial and
attribute
•
effects of classification
(nominal/ordinal/interval)
•
effects of scale change
and generalization
[Figure: scale of data, from local DEM through national and European DEM to global DEM]
HUMAN PROCESSING
•
generalization - simplification of reality by cartographer
to meet restrictions of map scale and physical size,
effective communication and message
•
can result in:
reduction, alteration,
omission and
simplification of map
elements
[Figure: City of Sapporo, Japan, at 1:10,000, 1:25,000 and 1:500,000]
ACTUAL CHANGES
•
gradual natural changes: river courses, glacier
recession
•
catastrophic changes: fires, floods, landslides
•
seasonal and daily changes: lake/sea/river levels
•
man-made: urban development, new roads
•
attribute change: forest growth (height), discontinued
trail/roads, road surfacing
ACTUAL CHANGES
•
age of data
[Figure: Northallerton circa 1867 vs. Northallerton circa 1999]
DATA MANIPULATION
•
vector to raster conversion errors
coding and topological mismatch errors:
cell size (majority class and central point)
[Figure: fine raster vs. coarse raster]
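The effect of cell size can be sketched by rasterising a simple shape at two resolutions and comparing areas. The example below uses only the central point rule (a cell is coded as inside if its centre falls inside the shape) and a circle as a stand-in polygon; the function name, radius and cell sizes are assumptions chosen for illustration.

```python
import math

def rasterised_circle_area(radius, cell_size, extent):
    """Area of an origin-centred circle after central-point rasterisation.

    The grid covers the square from -extent to +extent in x and y.
    """
    n = int(round(2 * extent / cell_size))  # cells per side
    inside = 0
    for row in range(n):
        for col in range(n):
            cx = -extent + (col + 0.5) * cell_size  # cell centre x
            cy = -extent + (row + 0.5) * cell_size  # cell centre y
            if cx * cx + cy * cy <= radius * radius:
                inside += 1
    return inside * cell_size ** 2

true_area = math.pi * 10 ** 2
fine_area = rasterised_circle_area(radius=10, cell_size=1, extent=16)
coarse_area = rasterised_circle_area(radius=10, cell_size=8, extent=16)

fine_error = abs(fine_area - true_area)
coarse_error = abs(coarse_area - true_area)
```

With these values the coarse grid misjudges the area far more than the fine grid; a majority class rule would give different, but still cell-size dependent, results.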
DATA MANIPULATION
•
vector to raster conversion errors
coding and topological mismatch errors:
grid orientation
[Figure: original vs. tilted grid; original raster vs. shifted raster]
DATA MANIPULATION
•
compounding effects of processing and analysis of
multiple layers
if two layers each have correctness of 90% and their errors are independent, the
accuracy of the resulting overlay is around 81% (0.9 x 0.9)
•
density of observations - TIN modeling and interpolation
•
inappropriate or inadequate class intervals or inputs for
models
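The 90% and 90% giving about 81% figure comes from multiplying the layer accuracies together, which assumes the errors in each layer are independent. A minimal sketch under that assumption, with a hypothetical function name:

```python
from math import prod

def overlay_accuracy(layer_accuracies):
    """Expected accuracy of an overlay, assuming independent layer errors."""
    for a in layer_accuracies:
        if not 0.0 <= a <= 1.0:
            raise ValueError("accuracies must be proportions between 0 and 1")
    return prod(layer_accuracies)

two_layers = overlay_accuracy([0.9, 0.9])   # about 0.81
five_layers = overlay_accuracy([0.9] * 5)   # about 0.59
```

Accuracy falls quickly as layers are added: five layers at 90% each leave only about 59%. If errors in the layers are correlated, the real figure will differ from this estimate.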
DATA OUTPUT
•
scaling inaccuracies
detail on scale bar and scale type
•
error caused by inaccuracy of the output devices:
resolution of computer screen or printer
colour palettes: intended colours don’t match from
screen to printer
DATA OUTPUT USE
•
information may be incorrectly understood
•
information may be inappropriately used
HANDLING ERROR
• must learn to cope with error and uncertainty in GIS
applications
minimise risk of erroneous results
minimise risk to life/property/environment
• more research needed:
mathematical models
procedures for handling data error and propagation
empirical investigation of data error and effects
procedures for using output data uncertainty estimates
incorporation as standard GIS tools
HANDLING ERROR
• Awareness
knowledge of types, sources and effects
• Minimization
use of best available data
correct choices of data model/method
• Communication
to end user!