KNB Overview for Information Managers


Geospatial Data and Spatial Data Analysis Tools
For Ecologists
University of California – Santa Barbara
www.nceas.ucsb.edu
Rick Reeves / March 17, 2005
Presentation Goals


Overview: Geospatial Data Analysis
Defining and distinguishing between spatial, geospatial, geographic data
Addressing the particular attributes of geospatial data
Inventory of Geospatial Data Types
Primary data types and common sources for data
Survey of Geoprocessing Software Tools
Key issues driving choice of geospatial processing software
A Tour of NCEAS Scientific Computing Web Site
Spatial Datasets, Tools, Tutorials, and Project Archives
Some Examples: Geospatial Data Analysis at NCEAS
From the Annals of the NCEAS Scientific Programmer: ‘Real World’ solutions to Ecological research challenges
Meet the Scientific Programmer

Rick’s Academic and Professional Background


Undergraduate: Environmental Remote Sensing
Graduate: Spatial Operations Research / Location-Allocation
Heuristic Development



Spatial Modeling branch of Geographic Data Analysis
Problem Domain: Transportation and Facility Location within
networks
Professional: Software Development, geospatial database
development, training curriculum development
Spatial Data: A Hierarchical Definition

Spatial Data
Observations are distributed in multidimensional space
X / Y / Z coordinates attached to each data element
Geospatial Data
Spatial Data with attached Geographic coordinates
Latitude / Longitude, UTM
Optional: data subjected to a map projection transformation
Geographic Data
Geospatial Data that captures ‘Earth System’ phenomena
Terrain height
Drainage Network
Land surface cover or urban Land Use
Meteorological / climate data forecasts
Ecologists may work with any or all during a project
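To make the "map projection transformation" step concrete, here is a minimal sketch of one simple projection, the spherical Mercator. It is only an illustration: the slides do not name a projection, and the function name, the mean Earth radius, and the spherical (rather than ellipsoidal) Earth model are all assumptions. UTM itself uses a transverse Mercator on an ellipsoid and is more involved.

```python
import math

# Assumption: a spherical Earth with a mean radius of 6,371 km.
EARTH_RADIUS_M = 6371000.0


def spherical_mercator(lat_deg, lon_deg, radius=EARTH_RADIUS_M):
    """Project latitude/longitude in degrees to planar x/y in metres."""
    if abs(lat_deg) >= 90.0:
        # The Mercator projection is undefined at the poles.
        raise ValueError("latitude must be strictly between -90 and 90")
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam
    y = radius * math.log(math.tan(math.pi / 4.0 + phi / 2.0))
    return x, y


# Hypothetical coordinates near Santa Barbara, California.
x, y = spherical_mercator(34.42, -119.70)
```

The point of the sketch is that a projected dataset stores x/y values in linear units, so layers in different projections must be brought into a common one before they can be overlaid.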
Overview: Geospatial / Geographic Data

Two Broad Primary Categories
Raster: A multidimensional, regularly spaced grid of values (samples)
Dimensions: Northing, Easting, Altitude, Time
Examples: Satellite Image, Digital Terrain, land surface cover maps
Vector: Three primary shapes stored in drawing-optimized format
Point, Line, Polygon, (TIN, vector field)
Thousands of datasets exist in hundreds of formats
Remote Sensing Imagery / Digital Elevation Models
Surface Features (political, physiographic) as points/lines/polygons
Meteorological data (observed / forecast, short- and long-term)
File format standards set by Industry, Government, user community
Data Ingestion: First Step in Geospatial Analysis

Data input / format conversion / spatial registration
Geospatial Data Analysis

Geospatial Information Analysis: 3 Categories



From O’Sullivan & Unwin (2003)
Spatial Data Manipulation: Investigate the relationships between
geographic dataset layers

Examples: ‘point-in-polygon’, buffer zones around spatial features

GIS software typically used to view/ manipulate / create layers
Spatial/Statistical Data Analysis: Descriptive and Explanatory:
What is there? How do we categorize it?


Data points treated as statistical ‘population’, compared to others
Spatial Modeling: Construct models to explore and understand
geospatial systems

Based on ‘abstraction’ of domain-specific problem into a systems
framework. Some examples:
 Predicting network flows; optimizing facility locations among demands
 Lessons learned building model as valuable as model’s ‘answers’
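The ‘point-in-polygon’ and buffer-zone operations named under Spatial Data Manipulation can be sketched in a few lines. This is a minimal illustration using the standard ray-casting test on planar coordinates, not the implementation of any particular GIS; the function names and sample coordinates are hypothetical.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def within_buffer(point, feature, radius):
    """True if point lies within a circular buffer around a point feature."""
    return math.dist(point, feature) <= radius


# Hypothetical study plot and sample locations in planar units.
plot = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
samples = [(2.0, 2.0), (5.0, 2.0), (-1.0, 2.0)]
inside_plot = [p for p in samples if point_in_polygon(p[0], p[1], plot)]
```

Points exactly on a polygon edge are a known edge case for ray casting; production GIS packages handle them with explicit boundary rules.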
The Challenge of Geospatial Analysis

Geospatial Data violate some key statistical assumptions
Samples are NOT randomly selected from normally-distributed population
In fact, nearby samples more likely to be similar than distant ones
Spatial Autocorrelation
Autocorrelated data points introduce redundancy into the sample set
Must be addressed in the experimental design and sampling scheme
Require specialized assessment techniques to factor out effects
Spatial Scaling
AKA Modifiable Areal Unit Problem
Statistical relationships in an area may change at different aggregations
The placement of sampling grid can introduce artifacts
 Nonuniform sampling space, edge effects
Geospatial Data Attributes have explanatory power

Spatial relationships may be causes for observed phenomena
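One common way to quantify spatial autocorrelation is Moran's I. The slides do not name a specific statistic, so treat the following as an illustrative sketch only: it assumes a regular grid of values and ‘rook’ adjacency (cells sharing an edge are neighbors, each with weight 1). Values near +1 indicate that neighbors are similar, values near -1 that neighbors are dissimilar, and values near 0 suggest no spatial pattern.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid using rook (edge-sharing) neighbors."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant grid")

    num = 0.0    # sum over neighbor pairs of deviation products
    w_sum = 0    # total number of (directed) neighbor links
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    w_sum += 1
    return (n / w_sum) * (num / denom)


# Hypothetical grids: one spatially clustered, one perfectly alternating.
clustered = [[0, 0, 1, 1] for _ in range(4)]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
```

A strongly positive result for field data is a signal that the samples carry less independent information than their count suggests.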
Selecting Geospatial Software Tools

Geospatial software: layered software architecture

Data layer: Efficiently store geospatial data


Feature Set + spatial coordinates
Analytic Layer: Spatial/statistical analysis algorithms

Statistical packages increasingly contain geospatial analysis
tools
Visualization Layer: Creates data views (AKA maps)
Geospatial tools broadly divided in two categories
 Geographic Information Systems (GIS)




Three software layers are each extensive, ‘feature rich’
Geospatial Analysis Packages


Data layer is ‘thinner’, Analytic layer ‘thicker’
Visualization layer built on existing data plotting tools
Geospatial Software Tools: GIS ‘Value Added’

Data layer is optimized for efficient geospatial
data storage/processing




Raster and Vector Data storage, ‘mixed mode’ operations
Georeferencing tools for data layer projection,
spatial registration
Map Algebra tools foster analysis and creation of
data layers
Comprehensive cartographic tools for output
map design
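As an illustration of the Map Algebra idea, the sketch below combines two co-registered raster layers cell by cell to create a new layer. It uses plain nested lists rather than any GIS product's API; the function name, the layers, and the 1000 m threshold are hypothetical.

```python
def map_algebra(layer_a, layer_b, cell_fn):
    """Apply cell_fn to each pair of matching cells in two aligned rasters."""
    same_shape = len(layer_a) == len(layer_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(layer_a, layer_b)
    )
    if not same_shape:
        raise ValueError("layers must have identical dimensions")
    return [
        [cell_fn(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


# Hypothetical layers: elevation in metres and a forest / non-forest mask.
elevation_m = [[850, 1200, 1500],
               [400, 950, 1100]]
forest_mask = [[1, 1, 0],
               [1, 0, 1]]

# New layer: 1 where forest occurs above 1000 m, 0 elsewhere.
high_forest = map_algebra(
    elevation_m,
    forest_mask,
    lambda elev, forest: 1 if forest == 1 and elev > 1000 else 0,
)
```

The sketch assumes the two layers are already spatially registered to the same grid, which is the job of the georeferencing tools listed above.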
Geospatial Software Tools: GIS Caveats

Underdeveloped geostatistical processing tools
 Vendors pressured to include them in product
 Yet validation data and algorithm details not available
 Often, these are critical tools for ecological analysis
Steep Learning Curve
 Identifying, mastering ‘essential’ features a challenge
Cost: GIS Software can be expensive
 Upfront purchase and yearly license fees
 Time investment in training and data maintenance
Workload
 If non-GIS tools must be used for part of the analysis, time must be spent moving between software packages
Geospatial Software Tools: Choosing

Some Suggested Selection Criteria
 Research Objectives should drive choice of tools


Identify the project’s core geospatial processing needs
Platform Flexibility
Select tools supported on multiple platforms (hardware/OpSys)
Widely supported/used platforms foster collaboration
Solution ‘Visibility’
Can you obtain the details of the algorithm?
Does the community recognize the accuracy of the algorithm?
Costs of implementing your research idea in software

Scripted solutions using integrated environments are best


R, SAS, MATLAB
Avoid development in high-level programming languages
Geospatial Software Tools: Choosing


Select GIS for core needs:
 Construct, compare, create multiple spatial data layers
 Simultaneously analyzing vector and raster data
 Creating detailed production quality study site maps
 Your data is exclusively in the GIS product format
 You require spatial analysis tools unavailable outside GIS
Select Geospatial Analysis tools for core needs:
 Spatial/Statistical data analysis is the focus
 Your mapping requirements are modest


two-dimensional data plots with geographic coordinates, legend
You need in-depth understanding of algorithms used

Or, you wish to extend / modify the algorithms
Sources for Geospatial Software Tools

Commercial Software Products
 For-profit corporations sell or license their software
 Major players produce comprehensive products


ESRI, maker of ArcGIS, is the dominant GIS vendor
 Their goal: Provide solution for every geospatial application
Other vendors offer tailored solutions





Examples: ENVI / IDL, ERDAS: Remote Sensing oriented GIS
Example: S Plus Spatial Statistics: Geospatial statistics and spatial
data visualization enhancements to statistical package
Example: MATLAB has mapping and image processing toolkits
Example: SAS offers GIS, geospatial software tools
Commercial products often drive geospatial data formats

Example: ESRI Shape File, ERDAS IMG file
Sources for Geospatial Software Tools

Open Source Software
 Broad-based effort by worldwide scientific and research
community
 Distributed under General Public License (GPL)
 Software development and maintenance by the user
community


Most significant geospatial analysis products: R, GRASS GIS
Examples of others: PostGIS, GDAL libraries

Visit FreeGIS.org, or the open software foundation sites.
Tradeoffs: Commercial GIS Software

Centralized documentation and product support…..


At a price of $100s to $1000s per year
Comprehensive, integrated software product
 Data/Analytic/Visualization layers populated w/ features
 Steep learning curve: Where are my ‘essential features?’
 Training always available – at a cost….
 Details of proprietary geospatial algorithms usually
unavailable
Tradeoffs: Open Source GIS Software

Open Source Software
 Distributed under General Public License (GPL)
 Software development and maintenance by the user
community


Many applications available via the Internet but….

Quality, features, support, and documentation are inconsistent
Algorithms and even source code are freely available
Open Source software drawbacks are shrinking as
user support community evolves and matures
 But active participation in the community is advised for
those wishing to stay technically proficient


Most significant geospatial analysis products: R, GRASS GIS
Sources for Geospatial Data

Government Agencies
National Mapping and Survey Agencies: surface cover data
USGS
Research Centers: Climate forecasting models
NOAA, NASA, NCDC
For-Profit Corporations
The highest-quality UNCLASSIFIED imagery now acquired by the private sector
Sometimes, no-cost government data is resold to public
Data widely available via the Internet
Many data sets available at no- or low-cost
Notable Exception: Satellite Remote Sensing data
Some discounts available to education and/or research entities
The best sites allow ‘search by geographic coordinates’
Examples from NCEAS Scientific Computing web site
Examples from NCEAS Scientific Computing web site
Popular Geospatial Data Formats

Meteorological and Climatological Data
Historical measurements
Short-term model-based forecasts (3 – 10 days from now)
Long-term predictions (10 – 100 years): General Circulation Models
Widely-Used Formats: Gridded Binary (GRIB), NetCDF
Political and Physiographic features
Country Boundaries
Road Networks
Drainage Networks
Widely-Used Formats: Digital Line Graphs (DLG), ESRI Shape Files (.shp)
Most GIS/Geospatial packages ingest these formats
Or conversion utilities are available to ingest them
Popular Geospatial Data Formats

Remote Sensing Imagery

Many operational systems provide many kinds of images





Multispectral Imagery: Landsat, SPOT, IKONOS
Data Formats tend to be sensor-specific
Most GIS can ingest most imagery types
Portal sites
Commercial: http://www.vterrain.org/Imagery/commercial.html
Govt: http://www.nationalgeographic.com/maps/map_links.html
Digital Terrain Models



Raster Grid datasets containing elevation measurements
Available for complete Earth land surface
Primary format: USGS Digital Elevation Model (DEM)


AKA National Elevation Dataset (NED)
Portal sites:
USGS: http://gisdata.usgs.net/Website/Seamless/
Terrainmap.org: http://www.terrainmap.org/
Tour of the Scientific Computing Web Site




Links to Data Sources
Links to Geospatial Software Sources
Links to Tutorials and Research Papers
Archive of NCEAS Research Projects
http://www.nceas.ucsb.edu/scicomp
Example: Spatial Modeling: Optimization




Route vehicles along network using
environmental costs as a metric
Simultaneously locate facilities along shipment
routes that mitigate environmental costs
Optimal Location of species reserve sites
Develop and compare performance of alternate
solution methods
 Mathematically optimal but operationally impractical
 Heuristically derived Near-optimal, usable solution
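The contrast between a mathematically optimal method and a heuristic can be sketched on a toy facility location problem. This is an assumption-laden illustration, not the methods used in the NCEAS projects: the brute-force search and the greedy heuristic are stand-ins, the cost is plain straight-line distance, and all names and coordinates are hypothetical.

```python
import math
from itertools import combinations


def total_cost(sites, demands):
    """Sum of each demand point's distance to its nearest chosen site."""
    return sum(min(math.dist(d, s) for s in sites) for d in demands)


def optimal_sites(candidates, demands, p):
    """Exhaustive search: optimal, but the number of combinations explodes."""
    return min(combinations(candidates, p),
               key=lambda sites: total_cost(sites, demands))


def greedy_sites(candidates, demands, p):
    """Greedy heuristic: add the site that lowers total cost most, p times."""
    chosen = []
    remaining = list(candidates)
    for _ in range(p):
        best = min(remaining,
                   key=lambda c: total_cost(chosen + [c], demands))
        chosen.append(best)
        remaining.remove(best)
    return tuple(chosen)


# Hypothetical demand points in two clusters, with three candidate sites.
demands = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
candidates = [(1, 0), (6, 0), (11, 0)]

best = optimal_sites(candidates, demands, 2)
quick = greedy_sites(candidates, demands, 2)
gap = total_cost(quick, demands) - total_cost(best, demands)
```

On this toy data the greedy method first picks the central site and cannot undo that choice, so it ends up with a higher total cost than the exhaustive search. On realistic networks the exhaustive search quickly becomes impractical, which is why near-optimal heuristics are developed and compared.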
Spatial Modeling: The Problem Domain
Geospatial Dataset: Routes + Locations
Spatial Model Solution: Alternative Methods
Selecting Species Reserves Locations
Dr. Ross Gerrard, UCSB Biogeography Lab, 1996
Example: Spatial Data Manipulation


Elevation zone threshold calculation
 Digital Elevation Models for selected worldwide sites
 Classify sites into 100 meter ‘wide’ elevation zones
General Circulation Model climate data extraction
 Identify, obtain, import GCM data files
 Import the data into GIS as raster grid
 Overlay point file, extract matching climate values
Digital Elevation Data Ingestion / Clipping
Elevation Zone Data Analysis
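The classification of sites into 100 meter ‘wide’ elevation zones can be sketched as a simple binning step. This is a minimal illustration, not the project's actual workflow; the function names, site names, and elevations are hypothetical.

```python
import math


def elevation_zone(elevation_m, width_m=100.0):
    """Return the (lower, upper) bounds of the zone containing elevation_m.

    Zones are half-open intervals [lower, upper), so an elevation that sits
    exactly on a boundary falls in the zone above it.
    """
    lower = math.floor(elevation_m / width_m) * width_m
    return (lower, lower + width_m)


def classify_sites(site_elevations, width_m=100.0):
    """Map each site name to its elevation zone."""
    return {name: elevation_zone(elev, width_m)
            for name, elev in site_elevations.items()}


# Hypothetical sites with elevations in metres, as read from a DEM.
sites = {"site_a": 250.0, "site_b": 1234.5, "site_c": -20.0}
zones = classify_sites(sites)
```

Using floor rather than integer truncation keeps below-sea-level sites in the correct zone.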
General Circulation Model data extraction
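The ‘overlay point file, extract matching climate values’ step amounts to looking up the raster cell that contains each point. The sketch below assumes a north-up grid with square cells, described by its upper-left corner and cell size; the function name, the grid values, and the coordinates are hypothetical and do not come from any real GCM file.

```python
def sample_raster(grid, x_min, y_max, cell_size, x, y):
    """Return the value of the cell containing (x, y), or None if outside.

    grid[0] is the northernmost row; (x_min, y_max) is the upper-left corner.
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Hypothetical 2 x 3 grid of temperature values with 1-degree cells,
# upper-left corner at longitude -120, latitude 36.
temperature = [[14.2, 15.1, 16.0],
               [15.0, 15.8, 16.7]]

# Hypothetical point file: (longitude, latitude) pairs.
points = [(-119.5, 35.5), (-117.2, 34.1), (-110.0, 35.0)]
extracted = [sample_raster(temperature, -120.0, 36.0, 1.0, lon, lat)
             for lon, lat in points]
```

Points that fall outside the grid return None rather than raising, so a whole point file can be processed and the misses reviewed afterwards.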
Spatial Analysis: ArcGIS and R Platforms
• ESRI Shape files exported to the R programming environment
• R Geostatistical and Spatial Analysis methods can then be applied
A Sampling: R Geospatial Analysis packages

clim.pact: Climate data analysis and downscaling tools

geoR: Geostatistical Data Analysis: variograms, et al.

maptools: read/manipulate polygon data (ESRI .shp)

shapefiles: read/manipulate ESRI shape files

sgeostat: Geostatistical modeling code

splancs: Spatial and space-time point patterns

spatstat: Spatial Point Pattern analysis
Concluding thoughts





NCEAS Associates use geospatial data extensively and in many creative ways
Geospatial Data Analysis requires specialized techniques
GIS and geospatial analysis tools are available from commercial vendors and the open source community
Choosing geospatial data and tools can be overwhelming and distract from the primary ‘science mission’
Scientific Programming Team has geospatial expertise, and can assist NCEAS Associates in this domain

Coming soon: Short course on the R Programming
Language!