Introduction to the Geospatial Data Content Area Steve Morris Head of Digital Library Initiatives North Carolina State University Libraries Preservation Issues Related to Digital Geospatial.


Introduction to the Geospatial Data
Content Area
Steve Morris
Head of Digital Library Initiatives
North Carolina State University Libraries
Preservation Issues Related to Digital Geospatial Data
Apr. 21, 2008
Outline
Digital geospatial data: types and formats
Standards (metadata, interoperability)
Mass market geospatial industry directions
Not covered:
Types of spatial analysis
Developing GIS services
Discussion of available data resources
Data reference interviews and data selection criteria
Specific approaches to data preservation
Note: Percentages based on the actual number of
respondents to each question
What is a GIS?
“A geographic information system is a system
used to capture, store, manipulate, analyze,
and display all types of spatially referenced
geographic information about what is where
on the earth’s surface and how they relate to
each other” (Fischer and Nijkamp, 1992).
Local Applications Where GIS Is Used
[Bar chart: percentage of respondents, 0% to 100%, by application area: Economic Development, GIS/Mapping, Emergency Management, Planning/Community Development, Police/Public Safety, Public Works, Utilities, Water/Waste Water]
Source: NC OneMap Data Inventory 2004
State and Local Government Geospatial Data
Problem Scope: The North Carolina Example
• 98 of 100 North Carolina counties have GIS systems, as do many municipalities
• Over 30 state agency data producers
• Exceptional value
– Detailed, current, accurate
• Exceptional risk
– Inconsistent or nonexistent archiving practices
– Complicated formats and complex objects
Source: NC OneMap
Carrboro, NC : Population 17,797 (2005 est.)
22 downloadable GIS data layers
3 OGC WMS services (web services)
10 web mapping applications
9 downloadable PDF map layers
Key Geospatial Data Types
Vector data
Raster data
Tabular data
http://www.lib.ncsu.edu/gis/data.html
Geospatial data types: Vector data
Vector Data
[Figure: real-world features (pasture, corn, stream, house) shown alongside their vector representation]
Vector Linkage to Tabular Data
[Figure: three polygons labeled 1, 2, and 3, each linked by ID to a row of the attribute table]
Attribute table columns: ID, Area, Perimeter, Landuse
ID 1: Landuse = Corn
ID 2: Landuse = Pasture
ID 3: Landuse = Pasture
Products approximate hand-drawn maps
Better description of individual objects
Topology allows more spatial analyses: networks, adjacency
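The linkage between vector features and their attribute rows can be sketched in a few lines of Python. This is a minimal illustration, not how any particular GIS package stores data: the polygon coordinates are invented, the function names are hypothetical, and only the ID and Landuse values come from the slide. Area and Perimeter are derived from the geometry, as they typically would be in a GIS.

```python
import math

# Hypothetical polygon geometries keyed by feature ID (coordinates are made up).
polygons = {
    1: [(0, 0), (4, 0), (4, 3), (0, 3)],
    2: [(4, 0), (8, 0), (8, 3), (4, 3)],
    3: [(0, 3), (8, 3), (8, 6), (0, 6)],
}

# Attribute rows keyed by the same ID; Landuse values follow the slide.
attributes = {
    1: {"Landuse": "Corn"},
    2: {"Landuse": "Pasture"},
    3: {"Landuse": "Pasture"},
}


def shoelace_area(ring):
    """Planar area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def ring_perimeter(ring):
    """Sum of edge lengths around the closed ring."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1])
    )


def join_features(polygons, attributes):
    """Attach geometry-derived Area and Perimeter to each attribute row by ID."""
    joined = {}
    for fid, ring in polygons.items():
        row = dict(attributes[fid])
        row["Area"] = shoelace_area(ring)
        row["Perimeter"] = ring_perimeter(ring)
        joined[fid] = row
    return joined


def select_by_attribute(joined, field, value):
    """Return the IDs of features whose attribute matches the value."""
    return sorted(fid for fid, row in joined.items() if row[field] == value)


joined = join_features(polygons, attributes)
pasture_ids = select_by_attribute(joined, "Landuse", "Pasture")
```

The shared ID is the only thing tying geometry to attributes, which is why a format conversion that drops or renumbers IDs breaks the linkage.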
Individual “data layers” are overlaid on top of one another to create customized maps.
Time series – vector data
Parcel Boundary Changes 2001-2004, North Raleigh, NC
NC OneMap Initial Data Layers Produced by Cities and Counties
[Bar chart: percentage of respondents, 0% to 80%, by data layer: Ortho, County Bnd., Land Use, Hospitals, Landfills, Building Footprints, Cadastral, ETJs, Airports, Storm Surge, Watersheds, Future Land Use, Roads, Surface Waters, Schools, Police Stations, Wetlands, Water Lines, Municipal Bnd., Elevation, Universities, Fire Stations, Hazardous Disposal Sites, Sewer Lines]
Source: NC OneMap Data Inventory 2004
County Street Centerline Specifics
[Bar chart: Source of Street Centerlines, percentage of respondents: Photo Interpretation, Commercial Product, Cadastral, Digitize from Maps, GPS, Other, Not Sure]
[Bar chart: Street Centerline Attributes, percentage of respondents: Road Names, Road Type, Road Direction, Address Ranges, Route Numbers, Road Description, Street Addresses, ZIP Codes, Not Sure]
Source: NC OneMap Data Inventory 2004
County Cadastral Specifics
[Bar chart: Methods Used to Create Data, percentage of respondents: COGO, Digitize from Maps, Photo Interpretation, GPS, Other, Not Sure]
[Bar chart: Cadastral Attributes, percentage of respondents: Owner, Owner Address, Ownership Type, Land Use, Acreage, # of Structures, Assessed Value, Parcel Address, Construction Date, Zoning, Deed/Book/Page, Land Value, Building Value, None]
Some Common Vector GIS Formats
ArcInfo Coverages (ESRI)
ESRI Export file (.e00)
Shapefiles (ESRI)
MapInfo MID/MIF
TIGER files
Spatial Data Transfer Standard (SDTS)
Digital Line Graphs (DLG)
Many more …
Vector Data Standards Issues
No widely adopted open standard for geospatial vector data
SDTS intended as an open exchange standard but is difficult to implement and not widely supported
Geography Markup Language (GML) is not a format but a language for defining industry-specific application schemas adhering to specific profiles
Shapefile is widely supported and openly documented (though proprietary)
Functions as de facto lingua franca of vector data
Lacks some functionality (topology, annotation, ...)
Vector data conversions are complex, lossy
Geospatial data types: Raster data
Downtown Pittsboro, NC
10 meter SPOT imagery
Geospatial data types: Raster data
Downtown Pittsboro, NC
1 meter DOQQ
Geospatial data types: Raster data
Downtown Pittsboro, NC
2 foot county orthophoto
Geospatial data types: Raster data
Downtown Pittsboro, NC
6 inch county orthophoto
Raster Data
[Figure: real-world features (pasture, corn, house, stream) shown alongside their raster representation, a grid of rows and columns defined by an origin and a cell size]
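How origin, cell size, row, and column locate a raster cell on the ground can be sketched as follows. This is a minimal illustration under stated assumptions: the origin is taken to be the upper-left corner of the grid, rows count downward, cells are square, and the function names and coordinates are hypothetical. Real raster formats differ in where they place the origin.

```python
import math


def cell_center(origin_x, origin_y, cell_size, row, col):
    """Ground coordinates of the center of cell (row, col).

    Assumes the origin is the upper-left corner of the grid and that
    row numbers increase downward (toward smaller y values).
    """
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y


def world_to_cell(origin_x, origin_y, cell_size, x, y):
    """Row and column of the cell that contains ground point (x, y)."""
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((origin_y - y) / cell_size)
    return row, col


# A hypothetical 10-meter grid whose upper-left corner is at (1000, 2000).
first_cell = cell_center(1000.0, 2000.0, 10.0, 0, 0)
containing_cell = world_to_cell(1000.0, 2000.0, 10.0, 1025.0, 1975.0)
```

Because position is implied by the grid rather than stored per cell, losing the origin, cell size, or coordinate system information leaves the image intact but no longer usable as geospatial data.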
Raster Linkage to Attribute Data
Attribute table columns: Value, Count, Landuse
Value 1: Count 15, Landuse Corn
Value 2: Count 7, Landuse Stream
Value 3: Count 13, Landuse Pasture
Value 4: Count 1, Landuse Housing
Grid of cell values:
1 1 2 3 3 3
1 1 2 2 3 3
1 1 1 2 2 3
1 1 1 1 2 2
1 1 1 1 3 3
4 3 3 3 3 3
Advantage: frequent data reacquisition
Simple data structure of grid cells
All types of features share one data structure
Simple to analyze several layers at once
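The Value/Count table is simply a tally of the grid cells. A minimal Python sketch, assuming the cell values read off the slide form a 6 x 6 grid in row order (the row arrangement is an assumption; the totals match the counts on the slide either way), and using hypothetical names throughout:

```python
from collections import Counter

# Cell values as read from the slide, assumed to be arranged in six rows.
grid = [
    [1, 1, 2, 3, 3, 3],
    [1, 1, 2, 2, 3, 3],
    [1, 1, 1, 2, 2, 3],
    [1, 1, 1, 1, 2, 2],
    [1, 1, 1, 1, 3, 3],
    [4, 3, 3, 3, 3, 3],
]

# Lookup from cell value to land use class, as given in the attribute table.
landuse = {1: "Corn", 2: "Stream", 3: "Pasture", 4: "Housing"}


def build_attribute_table(grid, labels):
    """Count the cells holding each value and attach the class label."""
    counts = Counter(value for row in grid for value in row)
    return [
        {"Value": value, "Count": counts[value], "Landuse": labels[value]}
        for value in sorted(counts)
    ]


table = build_attribute_table(grid, landuse)
```

Each cell stores only a small integer; the meaning of that integer lives in the attribute table, so the two have to be preserved together.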
Geospatial Data Types – Raster to Vector
1.) GIS-ready Image Data: County Orthorectified Aerial Photography
2.) Feature Extraction: Semi-automated Feature Extraction uses Spatial Context, Image Texture, Multiple layers of data, Existing GIS layers
3.) GIS Layers: Impervious Surfaces, Landcover, Tree Type, Urban Green Space, etc.
Source: NCCGIA
Geospatial data types: Raster data
Scanned Paper Maps (Digital Raster Graphic - DRG)
Classification - Landcover Map
Aerial Photos (Digital Orthophoto Quarter Quadrangle – DOQQ)
Satellite Imagery - IKONOS
Time series – Ortho imagery
Vicinity of Raleigh-Durham International Airport 1993-2002
County Digital Orthophotography Specifics
[Pie chart: Imagery Type: Color-Infrared, True Color, Black & White, Not Sure]
[Pie chart: Maintenance Frequency: Daily, Annually, Every 2 Yrs., Every 3 Yrs., Every 4 Yrs., Every 5 Yrs., As Funds Allow, Not Maintained, Other, Not Sure]
[Pie chart: Data Meet LRMP Specifications?: Yes, No, Not Sure]
Source: NC OneMap Data Inventory 2004
Image re-processing
Example of the 1993 Digital Orthophoto Quarter Quadrangles
[Diagram: derivative versions of the same imagery held by different organizations]
USGS: BIP, UTM, unclipped
USGS: JPEG, UTM, unclipped
NCDOT: TIFF, State Plane (m), clipped
NCDOT: JPEG, State Plane (m), clipped
NCDOT: JPEG thumbnail, clipped
NCDOT: MrSID, State Plane (m), county mosaics
NCSU Libraries: MrSID, UTM, unclipped
NCSU Libraries: MrSID, UTM, county mosaic
Und.Systems: BMP, State Plane (f)
Reprojecting
Image Conversion
Retiling (clipping, mosaics)
Resampling
Increasing Commercial Options for High Resolution
Satellite Imagery
Project Status
Some Common Raster GIS Formats
TIFF/GeoTIFF
BIP/BIL/BSQ
JPEG
JPEG 2000
MrSID
ESRI Grid
Many more …
A couple of key acronyms:
DOQQ – Digital Orthophoto Quarter Quadrangle
Nationwide orthophoto series, typically at one meter resolution
DRG – Digital Raster Graphic
Scanned image of a U.S. Geological Survey (USGS) standard series topographic map, including all map collar information.
Geospatial data types: Tabular data (w/vector)
Geospatial data types: Spatial database
Geodatabase Availability in NC Local Govt. Agencies
Local agencies, especially municipalities, are increasingly turning to
the ESRI Geodatabase format to manage geospatial data.
According to the 2003 Local Government GIS Data Inventory,
10.0% of all county framework data and 32.7% of all municipal
framework data were managed in that format.
[Pie charts: Cities: Street Centerline Formats and Counties: Street Centerline Formats. Legend for both: Geodatabase, Shapefile, Coverage, Other]
Inside the Geodatabase
Feature Datasets
contain feature classes (vector data)
Topology
rules to ensure data integrity
Geometric Network
rules to manage connectivity
Tabular Data
Attributes of spatial data
Relationship Class
links geographic features to tabular data
Metadata
XML format, for each dataset
Survey Data
Coordinate System, measurements, etc.
Raster Datasets
Slide from Amanda Henley, UNC-CH
Geospatial data types: Cartographic
Counterpart to the map is not just the dataset but also models, symbolization, classification, annotation, etc.
Geospatial data types: Cartographic
GIS Software
Software project file (.mxd, .apr, …)
Data layer file (.avl, .lyr, …)
PDF map exports
Web Services-based representations
Other Data Issues That I Don’t Have
Time to Go Into
Coordinate Systems and Projections
The world is not flat but maps are – there are various ways to describe the earth’s surface as a two-dimensional place
Vertical and Horizontal Datums
Establishing starting points for describing the earth’s surface
Tiling Schemes
Method of data organization (e.g., county, state, tax map
grid, river basin, hydrologic unit)
Rights Issues
Public domain vs. commercial
Varied interpretations of public records law
Ambiguous rights with web services
GeoDRM
Versioning and Updating
Orthophotos
County digital orthophotos reflown every 2-7 years
Statewide digital orthophoto plan: every 5 years (alternating B&W and color infrared)
Vector Data
State agency vector data: some static, some periodically updated, relatively fewer continuously updated
County/City/COG vector data: many data layers continuously or periodically updated
Old versions supplanted, exist on relatively inaccessible backups
Geospatial Metadata Standards
Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM)
Version 1: 1994, Version 2: 1998
Mandated for use by federal agencies from 1995
Widespread state govt. use, spotty local agency use
Widespread tool availability from late 1990s
334 Elements: Descriptive, technical, administrative
Next generation standard
ISO 19115 Geographic information - Metadata
ISO 19139 XML schema implementation
North American Profile of ISO 19115 as implemented under 19139 near finalization
Industry and vendor profiles (ESRI, NBII, …)
Data/Metadata Workflow
Data
Orthophoto work contracted out to commercial firms
Some vector data contracted out (notably parcels)
Most other vector data produced in-house
Early, middle, late, and late-late stage products
Metadata
Metadata published by producer, with NC Metadata Outreach Program support
Metadata published to NC NSDI clearinghouse, Geospatial One-Stop, and NC OneMap
NC Local Government Metadata Availability
[Chart: categories FGDC-Compliant, Internally Defined, Data Dictionary, None/Not Sure; values as extracted: 26.8%, 59.7%, 13.8%, 8.7%]
Metadata Availability
Preservation Metadata Issues
FGDC Metadata
Many flavors, incoming metadata needs processing
Cross-walk elements to PREMIS, MODS?
Metadata wrapper/Content packaging
METS (Metadata Encoding and Transmission Standard) vs. other industry solutions
Need a geospatial industry solution for the ‘METS-like problem’
GeoDRM a likely trigger: wrapper to enforce licensing (MPEG 21 references in OGC Web Services 3)
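What a cross-walk of FGDC elements might look like can be sketched in Python. This is an illustration only: the element paths (idinfo, citeinfo, descript, spdom, and so on) are CSDGM short names, but the sample record is invented and the flat target keys are a hypothetical stand-in, not actual PREMIS or MODS structures.

```python
import xml.etree.ElementTree as ET

# A made-up, heavily trimmed FGDC CSDGM record used only for illustration.
SAMPLE_FGDC = """<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <origin>Example County GIS</origin>
        <pubdate>2004</pubdate>
        <title>Example County Parcels</title>
      </citeinfo>
    </citation>
    <descript>
      <abstract>Parcel boundaries maintained by the county.</abstract>
    </descript>
    <spdom>
      <bounding>
        <westbc>-79.2</westbc>
        <eastbc>-79.0</eastbc>
        <northbc>36.0</northbc>
        <southbc>35.8</southbc>
      </bounding>
    </spdom>
  </idinfo>
</metadata>"""

# Hypothetical target field -> CSDGM element path (relative to <metadata>).
CROSSWALK = {
    "title": "idinfo/citation/citeinfo/title",
    "creator": "idinfo/citation/citeinfo/origin",
    "date": "idinfo/citation/citeinfo/pubdate",
    "description": "idinfo/descript/abstract",
}


def crosswalk_fgdc(xml_text):
    """Map selected CSDGM elements to a flat record; absent elements become None."""
    root = ET.fromstring(xml_text)
    record = {target: root.findtext(path) for target, path in CROSSWALK.items()}
    box = root.find("idinfo/spdom/bounding")
    if box is not None:
        # Bounding box as (west, south, east, north); assumes all four are present.
        record["bbox"] = tuple(
            float(box.findtext(tag))
            for tag in ("westbc", "southbc", "eastbc", "northbc")
        )
    return record


record = crosswalk_fgdc(SAMPLE_FGDC)
```

Returning None for absent elements matters in practice, since incoming metadata comes in many flavors and is often incomplete.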
Geospatial Data Discovery
National Spatial Data Infrastructure (NSDI) Clearinghouse development from 1995
From mid-1990s, metasearch-centered approach, using ‘geo’ profile of Z39.50
Early-mid 2000s shift to harvest-based catalog approach, development of Geospatial One-Stop (GOS)
Harvest protocols supported: Z39.50 (modified profile), OAI-PMH, Web Accessible Folder (WAF)
Direct search/browse at producer or state clearinghouse sites still prominent
Integration with Google Earth, etc.
Metadata problems
Absent or incomplete, asynchronous with the data
Inconsistently structured (no encoding standard, until 19139)
Data Sources
[Diagram: sources (International, Federal, State, Local) compared on Coverage Area, Accuracy, and Scale, each ranging from Low to High; scale ranges from Low (1:500,000) to High (1:24,000)]
Choosing the Right Data
What do you want to do with the data?
mapping, search, analysis, geocode
What specific geographic features will you need?
major highways vs. detailed streets
What is the geographic extent of your area of interest?
local, regional, state, national, international
What attributes of those features will you need?
unique IDs, names, address ranges
Additional Factors in Choosing Data
1. Source - Fed, state, local, international, other
2. Age - 1-2 years old vs. 3-7 vs. 8 or more
3. Data accuracy and scale - positional and attribute
4. File size - How much free space do you have?
5. Metadata availability
6. File/Image Format
7. Projection and Datum
8. Use Restrictions
9. How Soon?
Free, Fast, and Accurate… Pick Two
Geospatial Web Services
Image services
Deliver image resulting from query against underlying data
Limited opportunity for analysis
OGC Web Map Service Specification (WMS), from 2000 – widely deployed
Feature services
Stream actual feature data, greater opportunity for data analysis
OGC Web Feature Service Specification (WFS), from 2002 – not as widely deployed
Other
OGC Web Coverage Services (raster)
Geocoding services
ArcXML, etc. – commercial web service specs
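An image service request is just a URL with a standard set of parameters. A minimal Python sketch of building a WMS 1.1.1 GetMap request follows; the parameter names come from the WMS specification, while the endpoint URL, layer name, bounding box, and function name are hypothetical. Nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getmap_url(base_url, layers, bbox, width, height,
                     srs="EPSG:4326", image_format="image/png"):
    """Assemble a WMS 1.1.1 GetMap URL; bbox is (minx, miny, maxx, maxy)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the server's default style
        "SRS": srs,    # WMS 1.3.0 renames this parameter to CRS
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name.
example_url = build_getmap_url(
    "http://example.org/wms",
    ["county_boundary"],
    (-79.3, 35.8, -79.0, 36.0),
    600,
    400,
)
query = parse_qs(urlsplit(example_url).query, keep_blank_values=True)
```

The response is a rendered picture of the data at that moment, which is why image services offer limited opportunity for analysis.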
NC OneMap
Cascading WMS Services
NC OneMap
State Govt. Vector Data
NC OneMap
State Govt. Ortho Images
NC OneMap
County and City Data
NC OneMap
Multi-County requests
Concordance of layer naming, attribute naming, classification, and symbolization comes from community development of best practices -- not from the WMS spec itself
County Boundary
NC OneMap
Multi-County requests
WMS Services accessed
through desktop GIS
(ArcGIS)
Services Metadata:
WMS Capabilities File
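A capabilities file is an XML document that a client can read to discover what a service offers. The Python sketch below lists the requestable layers from a WMS 1.1.1 capabilities document. The sample document is invented and trimmed to a few elements, and the function name is hypothetical; real capabilities files carry much more detail.

```python
import xml.etree.ElementTree as ET

# A made-up, heavily trimmed WMS 1.1.1 capabilities document.
SAMPLE_CAPABILITIES = """<WMT_MS_Capabilities version="1.1.1">
  <Service>
    <Name>OGC:WMS</Name>
    <Title>Example county WMS</Title>
  </Service>
  <Capability>
    <Layer>
      <Title>All layers</Title>
      <Layer>
        <Name>county_boundary</Name>
        <Title>County Boundary</Title>
      </Layer>
      <Layer>
        <Name>orthos_1993</Name>
        <Title>1993 Orthophotos</Title>
      </Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>"""


def list_named_layers(capabilities_xml):
    """Return (name, title) for every layer that can be requested by name.

    Layers without a Name element act only as grouping containers.
    """
    root = ET.fromstring(capabilities_xml)
    layers = []
    for layer in root.iter("Layer"):
        name = layer.findtext("Name")
        if name:
            layers.append((name, layer.findtext("Title")))
    return layers


layers = list_named_layers(SAMPLE_CAPABILITIES)
```

Saving the capabilities file alongside any captured map images is one inexpensive way to record what a service looked like at a point in time.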
New Mapping Environments
Online Mapping APIs or Environments
Google Maps
Yahoo Maps
MSN Virtual Earth
OpenLayers
More …
Desktop Client Systems
Google Earth - KML
NASA WorldWind
More …
Also a multitude of systems that build on other systems
Changes in the Domain: Mashups, Google Earth, Map APIs, and More
• Huge new audience for geospatial content/services
• Massive crossover of mainstream IT to geospatial, spurring open source activities
• Rapid development of lightweight interoperability specifications
• “Good enough” approaches to data (formats, quality, standards)
Lightweight Spec Example: GeoRSS
Encode locations in RSS feeds
Describe information in an interoperable manner so that applications can request, aggregate, share and map geographically tagged feeds.
GeoRSS Flavors
GeoRSS Simple
GeoRSS GML (GML Application Profile)
W3C
Micro
Varied industry adoption
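To show how lightweight the simplest flavor is, here is a Python sketch that tags an RSS item with a GeoRSS Simple point and reads it back. The georss namespace URI and the "latitude longitude" ordering follow GeoRSS Simple; the item title, link, approximate coordinates, and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)


def make_item(title, link, lat, lon):
    """Build an RSS <item> carrying a GeoRSS Simple point."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    point = ET.SubElement(item, f"{{{GEORSS_NS}}}point")
    point.text = f"{lat} {lon}"  # GeoRSS Simple order: latitude, then longitude
    return item


def read_point(item):
    """Return (lat, lon) from an item's georss:point element."""
    text = item.findtext(f"{{{GEORSS_NS}}}point")
    lat, lon = (float(v) for v in text.split())
    return lat, lon


# Hypothetical feed entry with approximate coordinates.
item = make_item("Downtown Pittsboro", "http://example.org/pittsboro", 35.72, -79.18)
serialized = ET.tostring(item, encoding="unicode")
location = read_point(item)
```

A single extra element is enough to make a feed mappable, which helps explain the appeal of "good enough" specifications to a mainstream IT audience.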
Changes in the Domain:
New Information Ecosystem of Static, Tiled Map Data
• Web mashup/AJAX interactions with existing systems spur creation of intermediate content layers: e.g., tiling and caching of web map services
• Ongoing development of a tiling services spec creates a new preservation opportunity
Changes in the Domain: More Place-based (versus spatial) Data
• Mobile, LBS, and social networking applications drive demand for place-based data
• Long-term cultural heritage value in non-overhead imagery: more descriptive of place and function
[Images: Oblique Imagery, Tax Dept. Photos, Street View Images, DOT Videologs]
Relevant Organizations
International
Open Geospatial Consortium (OGC) – “a non-profit, international, voluntary consensus standards organization that is leading the development of standards for geospatial and location based services.” Coordinates with ISO.
National
Federal Geographic Data Committee (FGDC) - Coordinates the development of the National Spatial Data Infrastructure (NSDI). Participates in OGC, applies (profiles) OGC specs to U.S. environment.
Open Source
Open Source Geospatial Foundation (OSGeo) – New, not a standards organization (focus on open software) but acts as a coordinator and incubator for grassroots interoperability efforts.
Preservation Points of Engagement with
the Open Geospatial Consortium (OGC)
GML for archiving
GeoDRM -- Adding preservation use cases
Content Packaging -- Industry solution?
Decision Support Systems – supporting past views of data
Content Transfer
Persistent Identifiers
OGC Data Preservation Working Group formed in Dec. 2006
Spatial Metaphor for Repository Search
Beginning to see map-based interfaces using map APIs (Google Maps, Yahoo Maps) on top of repository software such as DSpace
Gazetteer protocol work (UCSB, etc.) going back several years
Text mining for place names (MetaCarta, EDINA)
Many other applications
Questions?
Contact:
Steve Morris
Head of Digital Library Initiatives
NCSU Libraries
[email protected]
Phone: (919) 515-1361
http://www.lib.ncsu.edu/ncgdap