Transcript Document

OFFICIAL AND CROWDSOURCED
GEOSPATIAL DATA
INTEGRATION
Searching for solutions to improve
cartography updating processes
By Jimena Martínez
Supervisors: Antonio Vázquez and Marianne de Vries
2
Table of contents
Background
Problems
The idea
The steps to develop the idea, and an
example to show it
3
Background
                   BCN200            BTN25             BTA5                       MGCP
Scope              National (Spain)  National (Spain)  Local (Spanish provinces)  International (Africa, Middle East)
Cartography units  Provinces         Sheets            Sheets                     Cells (208 Spain: 6 countries)
Scale              1/200,000         1/25,000          1/5,000                    1/50,000
Updating cycle     2 years           4 years           4 years                    4 years
Budget             300,000 €         3,500,000 €       800,000 €                  Spanish budget: 27,000,000 €
4
Problems
1. Why is official cartography never up to date enough?
Update process timeline:
Dec. 2010: 1st real change
Feb. 2011: satellite/aerial images collecting date
May 2011: 2nd real change
Dec. 2011: release date (2011 version); official data reflects the 1st change, but not the 2nd, which happened after the images were collected
5
Problems
2. Why is the updating process so long and expensive?
Traditional updating process
Vector cartography from last year.
Set of data sources against which to compare the cartography (images, maps, raster, vector).
Reviewing the whole cartography unit.
Too much time to review, not much time to edit features.
6
Problems
2. Why is the updating process so long and expensive?
Madrid case (1/200k)
Time to update: 4 weeks, 1 person
Percentage of features edited: 30%
Time to edit these features: 1.5 weeks
Would it be possible to save the other 3.5 weeks?
7
Problems
1. Why is official cartography never up to date enough?
The data sources used to detect changes have different dates (collecting dates).
2. Why is the updating process so long and expensive?
The traditional process is based on different data sources: the whole cartography must be reviewed against them.
As a result:
Long process
Expensive process
Not always a useful result (if highly updated cartography is needed)
8
The idea
To develop a general methodology to decide whether crowdsourced (OpenStreetMap) and official geodata can be integrated, in order to use OSM to improve the official cartography updating process.
A system that finds where the official dataset needs to be updated, and which type of update each feature needs, without reviewing the whole cartography unit.
Saving costs and obtaining better updated cartography.
9
The idea
Data sources in the updating process
Official data
Better updated
features (not always)
Official data
Vector format
Not complete
OSM data
Not homogeneous
OSM to indicate
where to update
10
The idea
Differences in updating processes
NMAs official data: Government (NMA) -> Tenders (companies) -> Updating & production processes -> MAP v.1 -> (months/years) -> MAP v.2
Crowdsourced data (OSM): Users/NMAs/companies -> Updating processes -> MAP v.1…v.n (hours/days)
11
The idea
Differences in updating processes
Update process timeline:
Dec. 2010: 1st real change
Jan. 2011: OSM update; OSM reflects the 1st change
Feb. 2011: satellite/aerial images collecting date
May 2011: 2nd real change
June 2011: OSM update; OSM reflects the 2nd change
Dec. 2011: release date (2011 version); official data reflects the 1st change
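The timeline above can be checked with a few lines of Python. This is only a sketch: the slide gives months, so the day of the month (the 1st) is an assumption, and the function name `reflected` is illustrative.

```python
from datetime import date

# Dates from the timeline; the day of the month is assumed (slides give months only).
real_changes = {"1st change": date(2010, 12, 1), "2nd change": date(2011, 5, 1)}
image_capture = date(2011, 2, 1)                    # source date of the official data
osm_updates = [date(2011, 1, 1), date(2011, 6, 1)]  # dates of the OSM updates

def reflected(changes, source_date):
    """Return the names of the changes that happened on or before source_date."""
    return [name for name, when in changes.items() if when <= source_date]

# The official release (Dec. 2011) can only show what the Feb. 2011 images captured.
official_at_release = reflected(real_changes, image_capture)
# OSM shows everything up to its latest update.
osm_latest = reflected(real_changes, max(osm_updates))
```

The content of the official release is bounded by the image collecting date, not by the release date, which is why the second change is missing from it.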
12
The idea
Differences
Which dataset is “better”?
Official data
OSM data
Which one is
better?
Some studies (Haklay 2008; Zielstra & Zipf 2010) take this data
set as the “truth” against which
to compare OSM
Official data
As a result OSM
is not 100%
complete
But, what
happens with
that?
The desired
result will be:
OSM data
Types of
updates
13
The idea
“Given enough eyeballs, all bugs are shallow” (Linus's Law)
Questions to answer
WHY OSM?
Amount of data.
Data accuracy (Linus's Law).
Comparative studies.
Updated data.
WHAT features from OSM?
OSM not as features to take, but as indicators to use.
If not useful, not used: types of updates. (AIM 3)
HOW to integrate OSM and official data?
Matching data models in a reference semantic supra-model (INSPIRE? ontologies?). (AIM 1)
Quality indicators (traditional and crowd quality parameters). (AIM 2)
The idea: the proposed system
Inputs (WEB): Official data set, OSM data set
Reference semantic model (INSPIRE?)
INPUT specifications: Feature class 1, Feature class 2, ..., Feature class n
Matching process (feature classes filter)
Candidates: Feature class 1 (50), Feature class 2 (80), ..., Feature class n (N)
QC and QA (features filter): ISO 19157, Crowd Quality
Result: Feature class 1 (30), ..., Feature class n (N-M)
Updating process: “Updating gaps”, types of updates
VGI teams / online updating; update OSM
15
The steps to reach the goal
And an example to show them
1. Making the matching between data models and features.
2. Studying quality parameters to decide which features could be used.
3. Proposing a new updating process based on flags and types of updates.
16
1st step: making the matching
Comparing data models
                        NMA data model   OSM data model
Format                  Database, shp    XML (.osm)
(Geometric) primitives  Node, Arc, Face  Node, Way, Relations
Feature class           Table, file      Primary tag (key)
Feature (each object)   Row              Primary tag (value)
Attribute               Column           Tag (key)
Values (domains)        Cells            Tag (value)
Tag
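The correspondence between the two data models can be illustrated with a minimal Python sketch that groups OSM elements into tables by primary tag key, with one row per element and one column per tag. The list `PRIMARY_KEYS`, the function name `to_tables` and the sample elements are assumptions made for this illustration; they are not part of the presentation.

```python
# Hypothetical short list of primary tag keys; real OSM data uses many more.
PRIMARY_KEYS = ("highway", "building", "railway", "waterway")

def to_tables(elements):
    """Group OSM elements into tables: primary tag key -> list of rows."""
    tables = {}
    for element in elements:
        tags = element["tags"]
        # The first primary key present decides the feature class (table).
        primary = next((k for k in PRIMARY_KEYS if k in tags), None)
        if primary is None:
            continue  # no primary tag: the element cannot be classified
        # Each tag key becomes a column, each tag value a cell.
        row = {"osm_id": element["id"], **tags}
        tables.setdefault(primary, []).append(row)
    return tables

elements = [
    {"id": 1, "tags": {"highway": "motorway", "ref": "A-6"}},
    {"id": 2, "tags": {"building": "church", "name": "San Gines"}},
    {"id": 3, "tags": {"note": "survey point"}},
]
tables = to_tables(elements)
```

In this reading, `tables["highway"]` plays the role of an NMA table or file, and the value of the primary tag (`motorway`) identifies the kind of feature in each row.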
17
1st step: making the matching
An approach (based on H. Uitermark)
Real world: A, B, C, D
1. Official dataset: A1, C1, D1
Legend
A1: building of interest
C1: motorway
D1: toll motorway
2. OpenStreetMap: A2, B2, C2, D2
Legend
A2: building, church
B2: building, school
C2, D2: highway, motorway
Candidates:
{[(A1,A2), (A1,B2)], [(C1,C2), (C1,D2)], [(D1,C2), (D1,D2)]}
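One way to obtain the candidate pairs listed above is to map the classes of both datasets to a shared reference class and pair every official feature with the OSM features of the same reference class. The sketch below assumes such a mapping (`OFFICIAL_TO_REF` and `osm_ref_class` are hypothetical stand-ins for the reference semantic model) and ignores geometry.

```python
# Hypothetical mapping of official classes to reference classes.
OFFICIAL_TO_REF = {
    "building of interest": "building",
    "motorway": "motorway",
    "toll motorway": "motorway",
}

def osm_ref_class(tags):
    """Map OSM tags to a reference class (illustrative rules only)."""
    if tags.get("highway") == "motorway":
        return "motorway"
    if "building" in tags:
        return "building"
    return None

official = {"A1": "building of interest", "C1": "motorway", "D1": "toll motorway"}
osm = {
    "A2": {"building": "church"},
    "B2": {"building": "school"},
    "C2": {"highway": "motorway"},
    "D2": {"highway": "motorway"},
}

def find_candidates(official, osm):
    """Pair each official feature with the OSM features of the same reference class."""
    result = {}
    for off_id, off_class in official.items():
        ref = OFFICIAL_TO_REF.get(off_class)
        result[off_id] = [
            (off_id, osm_id)
            for osm_id, tags in osm.items()
            if ref is not None and osm_ref_class(tags) == ref
        ]
    return result

candidates = find_candidates(official, osm)
```

With the sample features of the slide, this reproduces the three candidate groups. A later geometric and quality filter would then narrow each group down.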
18
1st step: making the matching
The example: motorways (BCN Spain-OSM)
19
2nd step: quality study
Studying the quality: traditional parameters
van Oort (2006)                 Haklay (2008)                   ISO 19157 (2011)
Completeness                    Completeness                    Completeness
Logical consistency             Logical consistency             Logical consistency
Positional accuracy             Positional accuracy             Positional accuracy
Attribute accuracy              Attribute accuracy              Thematic accuracy
Temporal quality                Temporal quality                Temporal quality
Semantic accuracy               Semantic accuracy
Usage, purpose and constraints  Usage, purpose and constraints  Usability element
Lineage                         Lineage                         Lineage (19115)
Variation in Quality
Meta-quality
Resolution (≈ scale)
20
2nd step: quality study
Studying the quality: (some) crowd quality parameters
Maué (2007). PGIS
Haklay (2008)
Reputation of contributors
Longevity of engagement
User quality
Number of edits on a
feature
• Local knowledge
• Experience
• Recognition
Information asymmetry
Number of contributors
on a feature
Number of bugs fixed
van Exel (2010)
Feature related quality
• Lineage
• Positional accuracy
• Semantic accuracy
Others
Lineage
Homogeneity in Quality
Time between edits
on a feature
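Some of the feature-level crowd quality parameters above (number of edits, number of contributors, time between edits) can be derived from the edit history of a feature. The following is a minimal sketch; the history format (a list of user and timestamp pairs) and the sample values are assumptions, not real OSM data.

```python
from datetime import datetime

def crowd_indicators(edits):
    """Compute simple crowd indicators from a list of (user, timestamp) edits."""
    times = sorted(t for _, t in edits)
    # Gaps between consecutive edits, in days.
    gaps = [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]
    return {
        "edits": len(edits),
        "contributors": len({user for user, _ in edits}),
        "mean_days_between_edits": sum(gaps) / len(gaps) if gaps else None,
    }

# Hypothetical edit history of one feature.
history = [
    ("ana", datetime(2011, 1, 10)),
    ("ben", datetime(2011, 1, 20)),
    ("ana", datetime(2011, 2, 9)),
]
indicators = crowd_indicators(history)
```

How such indicators should be weighted against the traditional parameters is left open here, as it is in the slides.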
21
2nd step: quality study
Some methods to measure traditional quality (positional accuracy)
Legend: higher quality, lower quality
Perkal (1966)
• Positional accuracy
• Interpretation of epsilon band
Buffer width: until blue is totally inside orange
Goodchild and Hunter (1997)
• Positional accuracy
• Complete data sets are needed
• A higher quality dataset is needed
Buffer width: until blue is 90-95% inside orange
Haklay (2008)
• Positional accuracy (OS-OSM)
• Complete data sets are needed. He completed OSM
• Supposed that OS is of higher quality than OSM
Buffer width: two buffers. Compare the overlap areas
BCN Spain-OSM
• No complete data (and nobody is going to complete it). Neither BCN nor OSM
• Don't know which data set is better (OSM to update BCN)
Buffer width: could be impossible to achieve 90-95%
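The buffer comparison can be sketched in plain Python as the share of one line's length that lies within a given distance of the other line. This is a sampling approximation written for illustration, not the exact buffer overlay used in the cited studies: the tested line is cut into short pieces and each piece is counted as inside if its midpoint is within the buffer width. Coordinates are assumed to be planar (metres), and all names are illustrative.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection of p on the segment, clamped to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_line(p, line):
    """Distance from point p to the nearest segment of a polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))

def share_within_buffer(tested, reference, width, step=1.0):
    """Approximate share of the tested length within `width` of the reference."""
    inside = total = 0.0
    for a, b in zip(tested, tested[1:]):
        length = math.hypot(b[0] - a[0], b[1] - a[1])
        n = max(1, math.ceil(length / step))
        piece = length / n
        for i in range(n):
            t = (i + 0.5) / n  # midpoint of the i-th piece
            p = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            total += piece
            if distance_to_line(p, reference) <= width:
                inside += piece
    return inside / total if total else 0.0

def smallest_width(tested, reference, widths, target=0.9):
    """Smallest candidate width that reaches the target share, or None."""
    for w in sorted(widths):
        if share_within_buffer(tested, reference, w) >= target:
            return w
    return None

# Hypothetical lines: the tested line drifts away from the reference.
reference = [(0, 0), (100, 0)]
tested = [(0, 5), (50, 5), (50, 30), (100, 30)]
share_10 = share_within_buffer(tested, reference, 10)
width_90 = smallest_width(tested, reference, [5, 10, 20, 40, 80])
```

When the reference dataset lacks features that exist in the tested one, `smallest_width` may return `None` or a very large width, which matches the remark that 90-95% could be impossible to achieve with incomplete data.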
22
2nd step: quality study
Example: measures of positional accuracy on motorways
BCN Spain
OSM
23
2nd step: quality study
Example: measures of positional accuracy on motorways
[Chart: % length of BCN motorways within the OSM buffer (0% to 90%) against buffer width (1 m to 500 m)]
A 500 m buffer around OSM is needed to reach 80% of the BCN length within the buffer, which indicates a lack of completeness in the OSM dataset.
BCN scale is 1/200k (the buffer must be ≈ 20 m, which means 73% of the length within the OSM buffer).
24
2nd step: quality study
Example: measures of positional accuracy on motorways
[Chart: % length of OSM motorways within the BCN buffer (0% to 100%) against buffer width (1 m to 500 m)]
A 25 m buffer around BCN is needed to reach 90% of the OSM length within the buffer.
In this case the method works because every OSM motorway is also in the BCN dataset.
25
2nd step: quality study
Some methods to measure traditional quality (completeness)
Legend: higher quality, lower quality
OSL Musical Chairs Algorithm
• Based on the bounding box of each feature
• 300 m radius to find candidates to match
• Additionally, Levenshtein distance (streets)
• A higher quality data set is needed to compare
• http://humanleg.org.uk/code/oslmusicalchairs/
If the Bbox matches, then the street name is compared
BCN Spain-OSM
• Not useful for motorways or long features.
• Useful for streets or polygons
• Convex hull could be used instead of the Bbox
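The two-stage idea described above (first a bounding box test, then a street name comparison with the Levenshtein distance) can be sketched as follows. This is not the OSL Musical Chairs code itself: the 300 m radius is approximated by growing the bounding box by a 300 m margin, the name threshold `max_name_distance` is a hypothetical parameter, and the sample streets are invented.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def bbox(points):
    """Bounding box (min_x, min_y, max_x, max_y) of a list of points."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)

def bboxes_close(b1, b2, margin):
    """True if b2 overlaps b1 once b1 is grown by `margin` on every side."""
    return (b1[0] - margin <= b2[2] and b2[0] <= b1[2] + margin
            and b1[1] - margin <= b2[3] and b2[1] <= b1[3] + margin)

def match_streets(reference, tested, margin=300.0, max_name_distance=2):
    """Pair streets whose boxes are close and whose names are nearly equal."""
    matches = []
    for ref_name, ref_points in reference:
        ref_box = bbox(ref_points)
        for test_name, test_points in tested:
            if not bboxes_close(ref_box, bbox(test_points), margin):
                continue  # stage 1: spatial filter
            if levenshtein(ref_name.lower(), test_name.lower()) <= max_name_distance:
                matches.append((ref_name, test_name))  # stage 2: name filter
    return matches

# Hypothetical streets in planar coordinates (metres).
reference = [
    ("Calle Mayor", [(0, 0), (400, 0)]),
    ("Calle Alcala", [(5000, 5000), (5500, 5000)]),
]
tested = [
    ("Calle Mayor", [(10, 5), (390, 8)]),
    ("Calle Alcalá", [(5010, 5100), (5490, 5110)]),
    ("Calle Mayor", [(9000, 9000), (9400, 9000)]),
]
matches = match_streets(reference, tested)
```

The third tested street shares a name with the first reference street but is rejected by the spatial filter. For long features such as motorways the bounding box becomes very large, which is consistent with the remark that the method suits streets or polygons better.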
26
2nd step: quality study
Conclusions about traditional quality
Which parameter comes before, completeness or positional accuracy?
• A complete data set (not a measure of completeness) is needed to measure positional accuracy
It has been proved that OSM is not complete
• Congrats!
It brings me to the first statement
• OSM not as features to take, but as indicators to use
• It doesn't matter if OSM is not complete
A new approach
• “Updating gaps”, which include the lack of completeness of OSM
27
3rd step: proposing an updating process
Traditional classification of updates
Updates
• Add
• Delete
• Modify: geometry, attributes
28
3rd step: proposing an updating process
Proposed classification of updates
Inputs: official data, OSM data
Is ROAD_G (official) = ROAD_G (OSM)?
  YES: is ROAD_ATT (official) = ROAD_ATT (OSM)?
    YES: doesn't need to be updated
    NO: updating gap, type I (attribute updating)
  NO: is ROAD_G (official) = OTHER_G (OSM)?
    YES: updating gap, type II (classification updating)
    NO: doesn't exist in OSM: updating gap, type III (OSM can't be used, but advised)
Doesn't exist in official dataset: updating gap, type IV (automatically updating from OSM?)
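The proposed classification can be read as a small decision function. The sketch below is one interpretation of the flowchart, not the author's implementation: it assumes that geometric matching has already been done, so each call receives an official feature, its OSM counterpart, or `None` when there is no counterpart. The field names `feature_class` and `attributes` are hypothetical.

```python
def classify_update(official, osm):
    """Return the type of updating gap for a matched pair of features.

    Each argument is a dict with 'feature_class' and 'attributes',
    or None when the feature has no counterpart in that dataset.
    """
    if official is None and osm is None:
        raise ValueError("at least one feature is required")
    if official is None:
        return "type IV"   # doesn't exist in the official dataset
    if osm is None:
        return "type III"  # doesn't exist in OSM
    if official["feature_class"] != osm["feature_class"]:
        return "type II"   # same geometry, other class: classification updating
    if official["attributes"] != osm["attributes"]:
        return "type I"    # same geometry and class: attribute updating
    return "no update needed"

# Hypothetical matched features.
road = {"feature_class": "road", "attributes": {"name": "A-6", "lanes": 2}}
road_wider = {"feature_class": "road", "attributes": {"name": "A-6", "lanes": 3}}
track = {"feature_class": "track", "attributes": {"name": "A-6", "lanes": 2}}

results = [
    classify_update(road, road),
    classify_update(road, road_wider),
    classify_update(road, track),
    classify_update(road, None),
    classify_update(None, road),
]
```

Only the features that fall into one of the four gap types would be sent to an editor, which is how the system avoids reviewing the whole cartography unit.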
29
The result
Madrid case (1/200k)
Time to update: 1.5 weeks, 1 person
Percentage of features edited: 30%
Time saved: 3.5 weeks
Costs saved: 40%
30
Next steps
Find the best method to compare both data sets and try it on different data sets (based on TQ and CQ).
Automatically obtain the different types of updating gaps.
Look for a better way to compare data models (not manually).
Try an automatic method to update the features flagged as updating gaps, based on OSM.
31
Thank you!
Dank je wel!
Gracias!