Introduction to the Geospatial Data Content Area Steve Morris Head of Digital Library Initiatives North Carolina State University Libraries Preservation Issues Related to Digital Geospatial.
Introduction to the Geospatial Data Content Area
Steve Morris
Head of Digital Library Initiatives
North Carolina State University Libraries
Preservation Issues Related to Digital Geospatial Data
Apr. 21, 2008

Outline
• Digital geospatial data: types and formats
• Standards (metadata, interoperability)
• Mass market geospatial industry directions
Not covered:
• Types of spatial analysis
• Developing GIS services
• Discussion of available data resources
• Data reference interviews and data selection criteria
• Specific approaches to data preservation

What is a GIS?
“A geographic information system is a system used to capture, store, manipulate, analyze, and display all types of spatially referenced geographic information about what is where on the earth’s surface and how they relate to each other” (Fischer and Nijkamp, 1992).

Local Applications Where GIS Is Used
(Bar chart, 0% to 100%, by application: Economic Development, GIS/Mapping, Emergency Management, Planning/Community Development, Police/Public Safety, Public Works, Utilities, Water/Waste Water)
Note: Percentages based on the actual number of respondents to each question
Source: NC OneMap Data Inventory 2004

State and Local Government Geospatial Data
Problem Scope: The North Carolina Example
• 98 of 100 North Carolina counties have GIS systems, as do many municipalities
• Over 30 state agency data producers
• Exceptional value
  – Detailed, current, accurate
• Exceptional risk
  – Inconsistent or nonexistent archiving practices
  – Complicated formats and complex objects
Source: NC OneMap

Carrboro, NC: Population 17,797 (2005 est.)
• 22 downloadable GIS data layers
• 3 OGC WMS services (web services)
• 10 web mapping applications
• 9 downloadable PDF map layers

Key Geospatial Data Types
• Vector data
• Raster data
• Tabular data
http://www.lib.ncsu.edu/gis/data.html

Geospatial data types: Vector data

Vector Data
(Diagram: real-world features and their vector representation: Pasture, Corn, Stream, House)

Vector Linkage to Tabular Data
(Diagram: three numbered polygons linked to an attribute table with columns ID, Area, Perimeter, Landuse. Rows: 1 = Corn, 2 = Pasture, 3 = Pasture)
• Products approximate hand-drawn maps
• Better description of individual objects
• Topology allows more spatial analyses: networks, adjacency

Individual “data layers” are overlaid on top of one another to create customized maps.

Time series – vector data
Parcel Boundary Changes 2001-2004, North Raleigh, NC

NC OneMap Initial Data Layers Produced by Cities and Counties
(Bar chart, 0% to 80%, by layer: Ortho, County Bnd., Municipal Bnd., Land Use, Future Land Use, Cadastral, Building Footprints, Roads, Elevation, Surface Waters, Watersheds, Wetlands, Storm Surge, ETJs, Airports, Hospitals, Schools, Universities, Police Stations, Fire Stations, Landfills, Hazardous Disposal Sites, Water Lines, Sewer Lines)
Source: NC OneMap Data Inventory 2004

County Street Centerline Specifics
(Bar chart, Source of Street Centerlines: Photo Interpretation, Commercial Product, Cadastral, Digitize from Maps, GPS, Other, Not Sure)
(Bar chart, Street Centerline Attributes: Road Names, Road Type, Road Direction, Address Ranges, Route Numbers, Road Description, Street Addresses, ZIP Codes, Not Sure)
Source: NC OneMap Data Inventory 2004

County Cadastral Specifics
(Bar chart, Methods Used to Create Data: COGO, Digitize from Maps, Photo Interpretation, GPS, Other, Not Sure)
(Bar chart, Cadastral Attributes: Owner, Owner Address, Ownership Type, Parcel Address, Land Use, Zoning, Acreage, # of Structures, Construction Date, Deed/Book/Page, Assessed Value, Land Value, Building Value, None)

Some Common Vector GIS Formats
• ArcInfo Coverages (ESRI)
• ESRI Export file (.e00)
• Shapefiles (ESRI)
• MapInfo MID/MIF
• TIGER files
• Spatial Data Transfer Standard (SDTS)
• Digital Line Graphs (DLG)
• Many more …

Vector Data Standards Issues
• No widely adopted, open standard for geospatial vector data
• SDTS was intended as an open exchange standard but is difficult to implement and not widely supported
• Geography Markup Language (GML) is not a format but a language for defining industry-specific application schemas that adhere to specific profiles
• Shapefile is widely supported and openly documented (though proprietary)
  – Functions as de facto lingua franca of vector data
  – Lacks some functionality (topology, annotation, ...)
• Vector data conversions are complex and lossy

Geospatial data types: Raster data
Downtown Pittsboro, NC, shown at four resolutions:
• 10 meter SPOT imagery
• 1 meter DOQQ
• 2 foot county orthophoto
• 6 inch county orthophoto

Raster Data
(Diagram: real-world features Pasture, Corn, House, Stream and their raster representation; the grid is labeled with Row, Column, Origin, and Cell Size)

Raster Linkage to Attribute Data
(Diagram: a 6 x 6 grid of cell values linked to an attribute table)
Grid cell values, row by row:
1 1 2 3 3 3
1 1 2 2 3 3
1 1 1 2 2 3
1 1 1 1 2 2
1 1 1 1 3 3
4 3 3 3 3 3
Attribute table (Value, Count, Landuse):
1, 15, Corn
2, 7, Stream
3, 13, Pasture
4, 1, Housing
• Advantage: frequent data reacquisition
• Simple data structure of grid cells
• All types of features share one data structure
• Simple to analyze several layers at once

Geospatial Data Types – Raster to Vector
1.) GIS-ready Image Data: County Orthorectified Aerial Photography
2.) Feature Extraction: semi-automated; uses spatial context, image texture, multiple layers of data, existing GIS layers
3.) GIS Layers: Impervious Surfaces, Landcover, Tree Type, Urban Green Space, etc.
Source: NCCGIA

Geospatial data types: Raster data
• Scanned Paper Maps (Digital Raster Graphic - DRG)
• Landcover Classification Map
• Aerial Photos (Digital Orthophoto Quarter Quadrangle – DOQQ)
• Satellite Imagery - IKONOS

Time series – Ortho imagery
Vicinity of Raleigh-Durham International Airport 1993-2002

County Digital Orthophotography Specifics
(Pie chart, Imagery Type: Color-Infrared, True Color, Black & White, Not Sure)
(Pie chart, Maintenance Frequency: Daily, Annually, Every 2 Yrs., Every 3 Yrs., Every 4 Yrs., Every 5 Yrs., As Funds Allow, Not Maintained, Other, Not Sure)
(Pie chart, Data Meet LRMP Specifications?: Yes, No, Not Sure)
Source: NC OneMap Data Inventory 2004

Image re-processing
Example of the 1993 Digital Orthophoto Quarter Quadrangles
Derivative versions in circulation:
• Und.Systems: State Plane (f), BMP
• NCDOT: TIFF, State Plane (m), Clipped
• NCDOT: JPEG, State Plane (m), Clipped
• NCDOT: JPEG Thumbnail, Clipped
• NCDOT: MrSID, State Plane (m), County Mosaics
• USGS: JPEG, Unclipped, UTM
• USGS: Unclipped, BIP, UTM
• NCSU Libraries: MrSID, UTM, Unclipped
• NCSU Libraries: MrSID, UTM, County Mosaic
Processing steps involved:
• Reprojecting
• Image Conversion
• Retiling (clipping, mosaics)
• Resampling

Increasing Commercial Options for High Resolution Satellite Imagery
Project Status

Some Common Raster GIS Formats
• TIFF/GeoTIFF
• BIP/BIL/BSQ
• JPEG
• JPEG 2000
• MrSID
• ESRI Grid
• Many more …
A couple of key acronyms:
• DOQQ – Digital Orthophoto Quarter Quadrangle: nationwide orthophoto series, typically at one meter resolution
• DRG – Digital Raster Graphic: scanned image of a U.S. Geological Survey (USGS) standard series topographic map, including all map collar information

Geospatial data types: Tabular data (w/vector)

Geospatial data types: Spatial database
Geodatabase Availability in NC Local Govt. Agencies
Local agencies, especially municipalities, are increasingly turning to the ESRI Geodatabase format to manage geospatial data. According to the 2003 Local Government GIS Data Inventory, 10.0% of all county framework data and 32.7% of all municipal framework data were managed in that format.
(Pie charts: Cities: Street Centerline Formats; Counties: Street Centerline Formats. Categories in both: Geodatabase, Shapefile, Coverage, Other)

Inside the Geodatabase
• Feature Datasets: contain feature classes (vector data)
• Topology: rules to ensure data integrity
• Geometric Network: rules to manage connectivity
• Tabular Data: attributes of spatial data
• Relationship Class: links geographic features to tabular data
• Metadata: XML format, for each dataset
• Survey Data: coordinate system, measurements, etc.
• Raster Datasets
Slide from Amanda Henley, UNC-CH

Geospatial data types: Cartographic
The counterpart to the map is not just the dataset but also models, symbolization, classification, annotation, etc.

Geospatial data types: Cartographic
• GIS Software
• Software project file (.mxd, .apr, …)
• Data layer file (.avl, .lyr, …)
• PDF map exports
• Web Services-based representations

Other Data Issues That I Don’t Have Time to Go Into
• Coordinate Systems and Projections
  – The world is not flat but maps are: there are various ways to describe the earth’s surface as a two-dimensional place
• Vertical and Horizontal Datums
  – Establishing starting points for describing the earth’s surface
• Tiling Schemes
  – Method of data organization (e.g., county, state, tax map grid, river basin, hydrologic unit)
• Rights Issues
  – Public domain vs. commercial
  – Varied interpretations of public records law
  – Ambiguous rights with web services
  – GeoDRM

Versioning and Updating
• Orthophotos
  – County digital orthophotos reflown every 2-7 years
  – Statewide digital orthophoto plan: every 5 years (alternating B&W and color infrared)
• Vector Data
  – State agency vector data: some static, some periodically updated, relatively fewer continuously updated
  – County/City/COG vector data: many data layers continuously or periodically updated
  – Old versions are supplanted and exist only on relatively inaccessible backups

Geospatial Metadata Standards
• Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM)
  – Version 1: 1994, Version 2: 1998
  – Mandated for use by federal agencies from 1995
  – Widespread state govt. use, spotty local agency use
  – Widespread tool availability from late 1990s
  – 334 elements: descriptive, technical, administrative
• Next generation standard
  – ISO 19115 Geographic information - Metadata
  – ISO 19139 XML schema implementation
  – North American Profile of ISO 19115 as implemented under 19139 near finalization
• Industry and vendor profiles (ESRI, NBII, …)

Data/Metadata Workflow
• Data
  – Orthophoto work contracted out to commercial firms
  – Some vector data contracted out (notably parcels)
  – Most other vector data produced in-house
  – Early, middle, late, and late-late stage products
• Metadata
  – Metadata published by producer, with NC Metadata Outreach Program support
  – Metadata published to NC NSDI clearinghouse, Geospatial One-Stop, and NC OneMap

NC Local Government Metadata Availability
(Chart categories: FGDC-Compliant, Internally Defined, Data Dictionary, None/Not Sure)

Metadata Availability

Preservation Metadata Issues
• FGDC Metadata
  – Many flavors; incoming metadata needs processing
  – Cross-walk elements to PREMIS, MODS?
• Metadata wrapper/Content packaging
  – METS (Metadata Encoding and Transmission Standard) vs. other industry solutions
  – Need a geospatial industry solution for the ‘METS-like problem’
  – GeoDRM a likely trigger: wrapper to enforce licensing (MPEG 21 references in OGC Web Services 3)

Geospatial Data Discovery
• National Spatial Data Infrastructure (NSDI) Clearinghouse development from 1995
  – From mid-1990s: metasearch-centered approach, using ‘geo’ profile of Z39.50
  – Early to mid 2000s: shift to harvest-based catalog approach, development of Geospatial One-Stop (GOS)
  – Harvest protocols supported: Z39.50 (modified profile), OAI-PMH, Web Accessible Folder (WAF)
• Direct search/browse at producer or state clearinghouse sites still prominent
• Integration with Google Earth, etc.
• Metadata problems
  – Absent or incomplete, asynchronous with the data
  – Inconsistently structured (no encoding standard until 19139)

Data Sources
(Diagram: sources ordered International, Federal, State, Local)
• Coverage Area: High (international) to Low (local)
• Accuracy: Low (international) to High (local)
• Scale: Low (1:500,000, international) to High (1:24,000, local)

Choosing the Right Data
• What do you want to do with the data?
  – mapping, search, analysis, geocode
• What specific geographic features will you need?
  – major highways vs. detailed streets
• What is the geographic extent of your area of interest?
  – local, regional, state, national, international
• What attributes of those features will you need?
  – unique IDs, names, address ranges

Additional Factors in Choosing Data
1. Source - Fed, state, local, international, other
2. Age - 1-2 years old vs. 3-7 vs. 8 or more
3. Data accuracy and scale - positional and attribute
4. File size - How much free space do you have?
5. Metadata availability
6. File/Image Format
7. Projection and Datum
8. Use Restrictions
9. How Soon?
Free, Fast, and Accurate… Pick Two

Geospatial Web Services
• Image services
  – Deliver image resulting from query against underlying data
  – Limited opportunity for analysis
  – OGC Web Map Service Specification (WMS), from 2000: widely deployed
• Feature services
  – Stream actual feature data, greater opportunity for data analysis
  – OGC Web Feature Service Specification (WFS), from 2002: not as widely deployed
• Other
  – OGC Web Coverage Services (raster)
  – Geocoding services
  – ArcXML, etc.: commercial web service specs

NC OneMap Cascading WMS Services

NC OneMap State Govt. Vector Data

NC OneMap State Govt. Ortho Images

NC OneMap County and City Data

NC OneMap Multi-County requests

NC OneMap Multi-County requests
(Map example: County Boundary)
Concordance of layer naming, attribute naming, classification, and symbolization comes from community development of best practices, not from the WMS spec itself.

WMS Services accessed through desktop GIS (ArcGIS)

Services Metadata: WMS Capabilities File

New Mapping Environments
• Online Mapping APIs or Environments
  – Google Maps
  – Yahoo Maps
  – MSN Virtual Earth
  – OpenLayers
  – More …
• Desktop Client Systems
  – Google Earth - KML
  – NASA WorldWind
  – More …
• Also a multitude of systems that build on other systems

Changes in the Domain: Mashups, Google Earth, Map APIs, and More
• Huge new audience for geospatial content/services
• Massive crossover of mainstream IT to geospatial, spurring open source activities
• Rapid development of lightweight interoperability specifications
• “Good enough” approaches to data (formats, quality, standards)

Lightweight Spec Example: GeoRSS
• Encode locations in RSS feeds
• Describe information in an interoperable manner so that applications can request, aggregate, share and map geographically tagged feeds.
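As a rough illustration of encoding a location in a feed item, the sketch below builds an RSS `<item>` carrying a GeoRSS Simple point and reads it back. It uses only the Python standard library; the helper names `make_item` and `read_point`, the example.org link, and the approximate Carrboro coordinates are illustrative assumptions, not part of the presentation.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"  # GeoRSS Simple namespace
ET.register_namespace("georss", GEORSS_NS)


def make_item(title, link, lat, lon):
    """Build one RSS <item> tagged with a GeoRSS Simple point (hypothetical helper)."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # GeoRSS Simple writes a point as "latitude longitude", space separated.
    point = ET.SubElement(item, f"{{{GEORSS_NS}}}point")
    point.text = f"{lat} {lon}"
    return item


def read_point(item):
    """Return (lat, lon) from an item, or None if it carries no point."""
    node = item.find(f"{{{GEORSS_NS}}}point")
    if node is None or not node.text:
        return None
    lat, lon = (float(v) for v in node.text.split())
    return lat, lon


# Illustrative values only: approximate coordinates and a placeholder link.
item = make_item("Carrboro, NC", "http://example.org/carrboro", 35.91, -79.08)
print(ET.tostring(item, encoding="unicode"))
print(read_point(item))
```

An aggregator that understands the `georss` namespace could map such items directly; one that does not can still treat them as ordinary RSS entries, which is part of what makes the specification lightweight.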
GeoRSS Flavors
• GeoRSS Simple
• GeoRSS GML (GML Application Profile)
• W3C Micro
Varied industry adoption

Changes in the Domain: New Information Ecosystem of Static, Tiled Map Data
• Web mashup/AJAX interactions with existing systems spur creation of intermediate content layers: e.g., tiling and caching of web map services
• Ongoing development of a tiling services spec creates a new preservation opportunity

Changes in the Domain: More Place-based (versus spatial) Data
• Mobile, LBS, and social networking applications drive demand for place-based data
• Long-term cultural heritage value in non-overhead imagery: more descriptive of place and function
(Image examples: Oblique Imagery, Tax Dept. Photos, Street View Images, DOT Videologs)

Relevant Organizations
• International: Open Geospatial Consortium (OGC) – “a non-profit, international, voluntary consensus standards organization that is leading the development of standards for geospatial and location based services.” Coordinates with ISO.
• National: Federal Geographic Data Committee (FGDC) – Coordinates the development of the National Spatial Data Infrastructure (NSDI). Participates in OGC, applies (profiles) OGC specs to U.S. environment.
• Open Source: Open Source Geospatial Foundation (OSGeo) – New; not a standards organization (focus on open software) but acts as a coordinator and incubator for grassroots interoperability efforts.

Preservation Points of Engagement with the Open Geospatial Consortium (OGC)
• GML for archiving
• GeoDRM: adding preservation use cases
• Content Packaging: industry solution?
• Decision Support Systems: supporting past views of data
• Content Transfer
• Persistent Identifiers
OGC Data Preservation Working Group formed in Dec. 2006

Spatial Metaphor for Repository Search
• Beginning to see map-based interfaces using map APIs (Google Maps, Yahoo Maps) on top of repository software such as DSpace
• Gazetteer protocol work (UCSB, etc.) going back several years
• Text mining for place names (MetaCarta, EDINA)
• Many other applications

Questions?
Contact: Steve Morris
Head of Digital Library Initiatives
NCSU Libraries
[email protected]
Phone: (919) 515-1361
http://www.lib.ncsu.edu/ncgdap
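Worked example for the “Raster Linkage to Attribute Data” slide: the sketch below derives the Value/Count/Landuse table from the grid of cell values and shows how a cell index relates to the Origin and Cell Size labels on the “Raster Data” slide. The 6 x 6 layout is read off the slide transcript, and the upper-left origin convention, the function names, and the sample origin and cell size are assumptions made for illustration.

```python
from collections import Counter

# Cell values as read from the slide transcript (6 rows x 6 columns).
GRID = [
    [1, 1, 2, 3, 3, 3],
    [1, 1, 2, 2, 3, 3],
    [1, 1, 1, 2, 2, 3],
    [1, 1, 1, 1, 2, 2],
    [1, 1, 1, 1, 3, 3],
    [4, 3, 3, 3, 3, 3],
]
LANDUSE = {1: "Corn", 2: "Stream", 3: "Pasture", 4: "Housing"}


def value_attribute_table(grid, labels):
    """Count cells per value and attach the land-use label for each value."""
    counts = Counter(value for row in grid for value in row)
    return [(value, counts[value], labels[value]) for value in sorted(counts)]


def cell_center(row, col, origin_x, origin_y, cell_size):
    """Map a (row, col) index to the x, y of the cell center.

    Assumes the origin is the grid's upper-left corner and that row
    numbers increase downward; other raster formats use other conventions.
    """
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y


for value, count, landuse in value_attribute_table(GRID, LANDUSE):
    print(value, count, landuse)

# Hypothetical georeferencing: origin at (0, 60), 10-unit cells.
print(cell_center(0, 0, origin_x=0.0, origin_y=60.0, cell_size=10.0))
```

The counts produced this way match the slide's table (15, 7, 13, and 1 cells), which illustrates why a single simple structure can serve every feature type: the attribute table is just a summary of the grid.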