Logical consistency - DePaul University GIS Collaboratory

Logical consistency
February 24, 2006
Geog 458: Map Sources and Errors
Outline
• What is logical consistency?
• Testing logical consistency
– Attribute consistency
– Temporal consistency
– Spatial consistency
• Documenting logical consistency in
metadata
Logical consistency
• Lack of contradiction in a database
• Fidelity of relationships encoded in the data structure of
the digital cartographic data (SDTS)
• Ensures logical consistency of operations performed on
data
• Deals with logical rules applied to space, time, and
attribute
• Attribute consistency draws upon database integrity
rules (database consistency)
• Spatial consistency refers to the conformance to
topological rules based on graph theory (topological
consistency)
Attribute consistency
• Key constraints
– Primary Key (PK: attributes of a class that uniquely identify an
instance of the class) should be unique
• I.e. should not be repeated
– PK should be Not Null (if it’s missing, how would you identify an
instance uniquely?)
• Referential consistency constraints
– Foreign Key (FK: attributes of a class that establish relationships
among tables) should correspond to PK in related tables, or
ensure the existence of related tables
– No value should be inserted in a table as FK without a
corresponding PK value in the related table (i.e. they should be
updated simultaneously whenever an update occurs)
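The key and referential constraints above can be checked by hand when no DBMS enforces them. The following is a minimal sketch, assuming tables held as lists of dictionaries; the table names, column names, and values (`states`, `counties`, `county_id`, `state_fips`) are hypothetical and not taken from the lecture.

```python
from collections import Counter

def check_primary_key(rows, pk):
    """Return (row indexes with a null PK, PK values that occur more than once)."""
    null_rows = [i for i, row in enumerate(rows) if row.get(pk) is None]
    counts = Counter(row[pk] for row in rows if row.get(pk) is not None)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return null_rows, duplicates

def check_foreign_key(child_rows, fk, parent_rows, pk):
    """Return FK values in child_rows that match no PK value in parent_rows."""
    parent_keys = {row[pk] for row in parent_rows if row.get(pk) is not None}
    return sorted({row[fk] for row in child_rows if row[fk] not in parent_keys})

# Hypothetical tables: each county refers to a state through state_fips.
states = [{"state_fips": "53"}, {"state_fips": "41"}]
counties = [
    {"county_id": 1, "state_fips": "53"},
    {"county_id": 2, "state_fips": "41"},
    {"county_id": 2, "state_fips": "99"},     # repeated PK and orphan FK
    {"county_id": None, "state_fips": "53"},  # missing PK
]

null_rows, duplicates = check_primary_key(counties, "county_id")
orphans = check_foreign_key(counties, "state_fips", states, "state_fips")
print(null_rows, duplicates, orphans)
```

Each returned list is empty when the corresponding constraint holds, so the lengths can be reported directly as counts of violations.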
Attribute consistency
• Domain consistency rules (consistency within
column values)
– Attribute values must fall within certain ranges or may
assume only certain pre-defined values
• E.g. the value for the day may only be between 1 and 31
• E.g. area or length cannot be negative
• E.g. a location in geographical coordinates can only fall within a
certain range
– Domains can be composite
• E.g. a location in Degree Minute Second
• E.g. a day in Day Month Year
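Domain rules like these translate into simple range tests. The sketch below is one possible way to write them, including the two composite domains (Degree Minute Second and Day Month Year); the function names and the `max_degrees` parameter are assumptions made for illustration.

```python
import calendar

def valid_lat_lon(lat, lon):
    """Simple domain: latitude within [-90, 90], longitude within [-180, 180]."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0

def valid_dms(degrees, minutes, seconds, max_degrees):
    """Composite domain: each part has its own range, and the combined
    value must not exceed max_degrees (90 for latitude, 180 for longitude)."""
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        return False
    total = abs(degrees) + minutes / 60 + seconds / 3600
    return total <= max_degrees

def valid_date(day, month, year):
    """Composite domain: the allowed range of the day depends on month and year."""
    if not 1 <= month <= 12:
        return False
    days_in_month = calendar.monthrange(year, month)[1]
    return 1 <= day <= days_in_month

print(valid_date(31, 4, 2006))     # April has 30 days
print(valid_dms(47, 75, 0, 90))    # 75 minutes is out of range
print(valid_lat_lon(47.6, -122.3))
```

The composite checks show why testing each part on its own is not enough: day 31 passes the simple 1 to 31 rule but is still invalid for April.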
Attribute consistency
• Consistency between column values
– A given attribute value must comply with other
attributes when its value is derived from the attributes
• 5 digit FIPS code for county should be consistent with values
for state and county
– A given attribute value must also comply with the
value that can be derived from the metric
characteristics of spatial objects
• The area of a parcel stored in a DB must be consistent with
the computed area based on the stored geometry
– Whenever a column value is changed, any values that
are derived from that column should be updated
accordingly
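Both kinds of between-column checks can be sketched briefly. The example below assumes a record that stores the state code, county code, and combined FIPS code as strings, and a parcel stored as a planar ring of (x, y) vertices; the field names and the 1% tolerance are illustrative assumptions, not values from the lecture.

```python
def fips_consistent(record):
    """The 5-digit FIPS code should equal state code + county code."""
    return record["fips"] == record["state_fips"] + record["county_fips"]

def shoelace_area(ring):
    """Planar area of a simple polygon given as [(x, y), ...].
    The ring may be open or closed; a repeated closing vertex adds zero."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def area_consistent(stored_area, ring, rel_tol=0.01):
    """Compare the stored area with the area computed from the geometry.
    rel_tol is an assumed tolerance for rounding and digitizing noise."""
    computed = shoelace_area(ring)
    return abs(stored_area - computed) <= rel_tol * computed

# King County, Washington: state 53 + county 033 = 53033.
county = {"state_fips": "53", "county_fips": "033", "fips": "53033"}
parcel_ring = [(0, 0), (20, 0), (20, 30), (0, 30)]  # hypothetical 20 x 30 parcel

print(fips_consistent(county))
print(shoelace_area(parcel_ring))
print(area_consistent(650.0, parcel_ring))
```

The shoelace formula only applies to planar coordinates; geometry stored in latitude and longitude would need to be projected first.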
Attribute consistency
• Broadly defined, conformance to logical rules applied to
attributes
– E.g. If the study area is the conterminous U.S., longitude should
be negative (since it’s in the western hemisphere)
• Usually, a database management system (DBMS) enforces
integrity rules automatically
– Automatic update, NotNull enforcement
• If the GIS is not implemented on a DBMS, you should check
them manually
– Use of summarize tool for checking key/domain constraints
– Compute metric values; Compare metric values with stored
values
– Table join with master table; compare the master value to stored
values
Temporal consistency
• No violation of temporal topological rules
• Temporal information is mostly treated as
attributes in common DB systems
• For example,
– Individual travel survey data: one person can
be in only one place at any one point in
time
– Traffic accident data: accident time should
occur before dispatch time
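The two examples above can be written as rule checks on timestamp attributes. This is a minimal sketch with made-up records; the field names (`accident_time`, `dispatch_time`, `person`, `place`, `start`, `end`) are hypothetical, and treating equal timestamps as acceptable is an assumption.

```python
from datetime import datetime
from itertools import combinations

def dispatch_after_accident(record):
    """Rule: the accident must not be recorded later than the dispatch.
    Equal timestamps are allowed here, which is an assumption."""
    return record["accident_time"] <= record["dispatch_time"]

def conflicting_stays(stays):
    """Return pairs of places where the same person is recorded
    at two different places during overlapping time intervals."""
    conflicts = []
    for a, b in combinations(stays, 2):
        same_person = a["person"] == b["person"]
        different_place = a["place"] != b["place"]
        overlap = a["start"] < b["end"] and b["start"] < a["end"]
        if same_person and different_place and overlap:
            conflicts.append((a["place"], b["place"]))
    return conflicts

day = (2006, 2, 24)
accidents = [
    {"id": 1, "accident_time": datetime(*day, 8, 5), "dispatch_time": datetime(*day, 8, 9)},
    {"id": 2, "accident_time": datetime(*day, 9, 30), "dispatch_time": datetime(*day, 9, 10)},
]
stays = [
    {"person": "P1", "place": "home", "start": datetime(*day, 8, 0), "end": datetime(*day, 9, 0)},
    {"person": "P1", "place": "work", "start": datetime(*day, 8, 30), "end": datetime(*day, 17, 0)},
    {"person": "P1", "place": "gym", "start": datetime(*day, 17, 0), "end": datetime(*day, 18, 0)},
]

bad_accidents = [r["id"] for r in accidents if not dispatch_after_accident(r)]
conflicts = conflicting_stays(stays)
print(bad_accidents, conflicts)
```

The pairwise comparison is quadratic in the number of stays, which is acceptable for a sample but would need sorting by person and start time for a full survey.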
Possible temporal relationships
between entities
• Process: a series of
changes with some
unifying principle
• Event: countable
occurrence located at a
point in space
Possible qualitative relations
between two intervals
– Can be point-based (at
some point in time)
– Can be line-based (at
some interval in time)
You can use this interval-logic for testing temporal consistency: e.g. session A
should be during the conference
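A test such as "session A should be during the conference" can be automated by naming the qualitative relation between two intervals. The sketch below follows the relation names of Allen's interval algebra, which is an assumption since the slide does not name its source; intervals are hypothetical (start, end) pairs with start < end.

```python
def interval_relation(a, b):
    """Name the qualitative relation of interval a to interval b.
    Each interval is a (start, end) pair with start < end."""
    s1, e1 = a
    s2, e2 = b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met by"
    if s1 == s2 and e1 == e2:
        return "equal"
    if s1 == s2:
        return "starts" if e1 < e2 else "started by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Remaining case: the intervals partly overlap.
    return "overlaps" if s1 < s2 else "overlapped by"

# Hypothetical schedule in hours of the day.
conference = (9, 17)
session_a = (10, 11)
session_b = (16, 18)

print(interval_relation(session_a, conference))
print(interval_relation(session_b, conference))
```

A consistency rule then becomes a comparison of the computed relation with the expected one, for example requiring "during", "starts", "finishes", or "equal" for every session.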
Spatial consistency
• Mostly refers to the conformance to topological
rules
• Can arise at various stages of data handling
– Digitizing/ updating error
– Error propagation through processing
– Miscoding of topological relationships
• Topological rules can be checked on topological
vector data model (link-node model); can be
checked within a single layer or between layers
Topological rules within a layer
• A missing node makes a correct topological description
impossible (e.g. without a node at a road intersection, the
road is not connected to the other links)
• Pseudo nodes (nodes where only two edges meet) can
lengthen computation time
• Undershoots and overshoots (edges that end in only one
node) may be fictitious lines due to digitizing errors
• Duplicate lines: come from manual digitizing or when two
data sets are merged; can create slivers
• Label points (reference points or centroids): used to link
polygons to attributes and to place labels
– Missing label points: can cause the inconsistency between
geometric data description and topological data description
– Multiple label points: number of encoded polygons ≠ actual number of polygons
Read SDTS data quality (or reading in the course package)
• Some encodings are not necessarily errors
What’s encoded in the database? | What is it indeed? | Is it consistent?
Dangling node | Dead end | Yes (no error)
Dangling node | There’s no such road segment | No (error)
Intersecting node | Road intersection | Yes (no error)
Intersecting node | Overpass | No (error)
How to check errors?
– Check dangling node: too short one? How do I know if it’s a dangling node?
– Check intersecting node: overlay with overpass layer if any
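One way to answer "how do I know if it's a dangling node?" is to count how many edges meet at each node in the link-node model. The sketch below assumes edges given as (from_node, to_node) pairs and a hypothetical coordinate table; the 5-unit length threshold for suspicious dangles is an arbitrary illustration.

```python
from collections import Counter
from math import dist

def classify_nodes(edges):
    """Classify each node by the number of edge ends that meet there."""
    degree = Counter()
    for start, end in edges:
        degree[start] += 1
        degree[end] += 1
    kinds = {}
    for node, d in degree.items():
        if d == 1:
            kinds[node] = "dangling"
        elif d == 2:
            kinds[node] = "pseudo"
        else:
            kinds[node] = "intersection"
    return kinds

def short_dangles(edges, coords, kinds, max_length):
    """Edges that touch a dangling node and are shorter than max_length.
    These are candidates for overshoots; long dangles may be real dead ends."""
    suspects = []
    for start, end in edges:
        if "dangling" in (kinds[start], kinds[end]):
            if dist(coords[start], coords[end]) < max_length:
                suspects.append((start, end))
    return suspects

# Hypothetical road network in map units.
coords = {"A": (0, 0), "B": (100, 0), "C": (100, 100),
          "D": (200, 0), "E": (100, -2), "F": (300, 0)}
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("B", "E"), ("D", "F")]

kinds = classify_nodes(edges)
suspects = short_dangles(edges, coords, kinds, max_length=5)
print(kinds)
print(suspects)
```

The flagged edges still need a decision by a person or a comparison with reference data, since the table above shows that a dangling node can be a legitimate dead end.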
Possible spatial relationships
between entities
• Metric
– Distance, direction
• Topological
– Qualitative spatial
relation between regions
Disjoint, meet,
within, covered by,
cover, contain,
equal, overlap
You can compare this relational info. encoded in the database to the actual
relational info. (e.g. King County should be within Washington State)
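To compare an encoded relation with the actual one, the actual relation has to be computed from geometry. The sketch below does this only for axis-aligned rectangles (xmin, ymin, xmax, ymax), which is a strong simplification: real county and state polygons would need a geometry library. The rectangles are made-up stand-ins, not real boundaries.

```python
def box_relation(a, b):
    """Topological relation of rectangle a to rectangle b.
    Rectangles are (xmin, ymin, xmax, ymax) with positive width and height."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    if a == b:
        return "equal"
    if ax2 < bx1 or bx2 < ax1 or ay2 < by1 or by2 < ay1:
        return "disjoint"
    interiors_intersect = ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2
    if not interiors_intersect:
        return "meet"  # only boundaries touch
    if bx1 <= ax1 and ax2 <= bx2 and by1 <= ay1 and ay2 <= by2:
        strictly = bx1 < ax1 and ax2 < bx2 and by1 < ay1 and ay2 < by2
        return "within" if strictly else "covered by"
    if ax1 <= bx1 and bx2 <= ax2 and ay1 <= by1 and by2 <= ay2:
        strictly = ax1 < bx1 and bx2 < ax2 and ay1 < by1 and by2 < ay2
        return "contain" if strictly else "cover"
    return "overlap"

# Hypothetical rectangles standing in for a state and some counties.
state = (0, 0, 10, 6)
county_inside = (6, 2, 8, 4)
county_on_border = (8, 2, 10, 4)
neighbor_state = (10, 0, 15, 6)

encoded = {"county_inside": "within"}  # relation stored in the database
actual = box_relation(county_inside, state)
print(actual, encoded["county_inside"] == actual)
print(box_relation(county_on_border, state))
print(box_relation(neighbor_state, state))
```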
Topological rules between layers
• For example,
• the street center lines should fall inside the
pavement area;
• rivers should be inside their floodplain;
• zoning boundaries should follow certain parcel
boundaries (most of the time);
• hierarchical relationships should nest (cities
inside counties inside states - except Bothell)
• ... too many potential relationships to delineate
them all
Detecting topological errors
• Functionalities available in GIS
– Nodeerrors: check whether there are node errors
(pseudo node, dangling node)
– Labelerrors: check whether there are missing/multiple
label points of polygon
– Display node by types
– Trace: check connectivity
• Your own method
– Missing node: build/clean the data and compare the
test data with the cleaned data…
Editing topological errors:
automated vs manual
• Automated methods
– Use the “BUILD” and “CLEAN” commands with a given
threshold; they can remove duplicate lines, slivers,
undershoots, and overshoots within the threshold in
batch
– But care should be exercised since topology
building is dependent on threshold
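The dependence on threshold can be illustrated with a toy snapping routine. This is not how CLEAN is implemented; it is a hypothetical greedy sketch that merges each point into the first earlier point lying within the tolerance, so that the effect of the tolerance value is visible.

```python
from math import dist

def snap_points(points, tolerance):
    """Greedy snapping: each point moves to the first previously kept
    point within `tolerance`; otherwise it is kept where it is."""
    kept = []
    snapped = []
    for p in points:
        target = next((k for k in kept if dist(p, k) <= tolerance), None)
        if target is None:
            kept.append(p)
            target = p
        snapped.append(target)
    return snapped

# Hypothetical line endpoints: (0.5, 0) is a digitizing slip near (0, 0),
# while (3, 0) is a genuinely separate node.
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (10.0, 0.0)]

for tolerance in (0.1, 1.0, 4.0):
    distinct = len(set(snap_points(points, tolerance)))
    print(tolerance, distinct)
```

With a tolerance that is too small the slip survives, and with one that is too large the separate node at (3, 0) is merged away, which is why the threshold should be chosen with the data's scale and accuracy in mind.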
• Interactive methods
– Manual inspection with some visual aids
Editing topological errors:
depending on the GIS
• The Arc/INFO CLEAN and BUILD commands do topology
building/cleaning. This is now hidden in ArcToolbox
and is invoked in hidden ways, with less ability to control it,
since the coverage model is not as central anymore.
• An ArcMap shapefile can contain overlapping shapes.
There is no easy way to get all the slivers and gaps
edited away.
• SDE (Spatial Database Engine) does have advanced
topological error checking even though it uses something
equivalent to a shapefile (though quite different in storage
details). SDE can find all shared segments and correct
them.
Logical consistency and object-oriented database
• You can embed any rules (as much as you can
come up with) in the database
• Spatial consistency by defining topological rules
• Temporal consistency by defining temporal topological
rules
• Thematic consistency by defining domain values
• It can also embed the relationships with other
entities, in addition to internal consistency
• The quality of the database can be better ensured because it
checks for possible logical inconsistencies
internally based on the rules
Logical consistency test: general
procedures
• Divide database into space, time, and attribute
components (whenever necessary)
• Choose samples in space, time, and attribute
• Come up with logical rules to be followed for
each component
• Check if the encoding in the database conforms
to the rules
• Report on % violation of rules out of total cases
tested for each component
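The procedure above can be sketched as a small rule runner that reports the percentage of violations per component. The sample records, field names, and rules below are hypothetical; times are given in minutes to keep the example short.

```python
def violation_report(samples, rules):
    """For each component, apply every rule to every sample and return
    (cases tested, violations, percent violations)."""
    report = {}
    for component, checks in rules.items():
        tested = 0
        violations = 0
        for record in samples:
            for check in checks:
                tested += 1
                if not check(record):
                    violations += 1
        percent = 100.0 * violations / tested if tested else 0.0
        report[component] = (tested, violations, percent)
    return report

# Hypothetical sample drawn from a database covering the conterminous U.S.
samples = [
    {"longitude": -122.3, "area": 600.0, "accident": 10, "dispatch": 15},
    {"longitude": 122.3, "area": 450.0, "accident": 20, "dispatch": 18},
    {"longitude": -87.6, "area": -5.0, "accident": 30, "dispatch": 31},
    {"longitude": -71.1, "area": 80.0, "accident": 40, "dispatch": 45},
]

# One list of rules per component; each rule returns True when satisfied.
rules = {
    "space": [lambda r: r["longitude"] < 0],
    "attribute": [lambda r: r["area"] >= 0],
    "time": [lambda r: r["accident"] <= r["dispatch"]],
}

report = violation_report(samples, rules)
for component, (tested, violations, percent) in report.items():
    print(component, tested, violations, percent)
```

The counts of cases tested and violations found are the same figures that the metadata should state for each tested element.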
Documenting logical consistency in
the metadata
• State what elements are tested
• State the number of features that were checked
• State the number of inconsistencies
encountered
– Describe the methods used for detecting errors
• State a detailed specification of the nature of the
problem and possible solutions
– Mention whether it is corrected or not
• If possible, provide a graphical display of the
error condition