KNB Overview for Information Managers
Geospatial Data and Spatial Data Analysis Tools
For Ecologists
University of California – Santa Barbara
www.nceas.ucsb.edu
Rick Reeves / March 17, 2005
Presentation Goals
Overview: Geospatial Data Analysis
Defining and distinguishing between spatial, geospatial, geographic data
Addressing the particular attributes of geospatial data
Inventory of Geospatial Data Types
Survey of Geoprocessing Software Tools
Key issues driving choice of geospatial processing software
A Tour of NCEAS Scientific Computing Web Site
Primary data types and common sources for data
Spatial Datasets, Tools, Tutorials, and Project Archives
Some Examples: Geospatial Data Analysis at NCEAS
From the Annals of the NCEAS Scientific Programmer: ‘Real World’
solutions to Ecological research challenges
Meet the Scientific Programmer
Rick’s Academic and Professional Background
Undergraduate: Environmental Remote Sensing
Graduate: Spatial Operations Research / Location-Allocation
Heuristic Development
Spatial Modeling branch of Geographic Data Analysis
Problem Domain: Transportation and Facility Location within
networks
Professional: Software Development, geospatial database
development, training curriculum development
Spatial Data: A Hierarchical Definition
Spatial Data
Observations are distributed in multidimensional space
Geospatial Data
Spatial Data with attached Geographic coordinates
Latitude / Longitude, UTM
Optional: data subjected to a map projection transformation
Geographic Data
Geospatial Data that captures ‘Earth System’ phenomena
X / Y / Z coordinates attached to each data element
Terrain height
Drainage Network
Land surface cover or urban Land Use
Meteorological / climate data forecasts
Ecologists may work with any or all during a project
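The 'Latitude / Longitude, UTM' bullet above can be made concrete with a small sketch. It computes the standard 6-degree UTM zone number and its central meridian from a longitude. The function names are illustrative only, and the sketch ignores the special zone boundaries used around Norway and Svalbard.

```python
import math


def utm_zone(lon_deg: float) -> int:
    """Return the standard 6-degree UTM zone number (1-60) for a longitude."""
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude must be in [-180, 180] degrees")
    # Zone 1 starts at 180 W; longitude +180 is folded back into zone 60.
    return min(int(math.floor((lon_deg + 180.0) / 6.0)) + 1, 60)


def utm_central_meridian(zone: int) -> float:
    """Return the central meridian (degrees east) of a UTM zone."""
    if not 1 <= zone <= 60:
        raise ValueError("zone must be in 1..60")
    return zone * 6.0 - 183.0


# Example: a longitude near Santa Barbara, California (approximate value).
zone = utm_zone(-119.7)
meridian = utm_central_meridian(zone)
print(zone, meridian)
```

A full latitude/longitude to UTM easting/northing conversion also needs an ellipsoid and a transverse Mercator projection, which is normally left to a projection library.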
Overview: Geospatial / Geographic Data
Two Broad Primary Categories
Raster: A multidimensional, regularly-spaced grid of values (samples)
Dimensions: Northing, Easting, Altitude, Time
Examples: Satellite Image, Digital Terrain, land surface cover maps
Vector: Three primary shapes stored in drawing-optimized format
Point, Line, Polygon, (TIN, vector field)
Thousands of datasets exist in hundreds of formats
Remote Sensing Imagery / Digital Elevation Models
Surface Features (political, physiographic) as points/lines/polygons
Meteorological data (observed / forecasted (short-and long-term))
File format standards set by Industry, Government, user community
Data Ingestion: First Step in Geospatial Analysis
Data input / format conversion / spatial registration
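As a rough illustration of the raster / vector distinction above, the sketch below models a raster as a regularly spaced grid with a georeferenced origin, and vector shapes as coordinate lists. All class, field, and variable names are hypothetical, and the coordinates are made-up values in a projected (easting / northing) system.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Vector shapes: a point is an (x, y) pair; lines and polygons are point lists.
Point = Tuple[float, float]
Line = List[Point]
Polygon = List[Point]  # ring of vertices; the last vertex connects to the first


@dataclass
class Raster:
    x_min: float               # easting of the grid's west edge
    y_max: float               # northing of the grid's north edge
    cell_size: float           # cell width and height, in map units
    values: List[List[float]]  # values[row][col]; row 0 is the northernmost

    def cell_at(self, x: float, y: float) -> Tuple[int, int]:
        """Return the (row, col) of the cell containing map coordinate (x, y)."""
        col = int((x - self.x_min) // self.cell_size)
        row = int((self.y_max - y) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise IndexError("coordinate falls outside the raster extent")
        return row, col

    def value_at(self, x: float, y: float) -> float:
        """Return the sample stored in the cell containing (x, y)."""
        row, col = self.cell_at(x, y)
        return self.values[row][col]


# A 3 x 4 grid of 30-unit cells with made-up sample values.
grid = Raster(x_min=500000.0, y_max=3800000.0, cell_size=30.0,
              values=[[11.0, 12.0, 13.0, 14.0],
                      [21.0, 22.0, 23.0, 24.0],
                      [31.0, 32.0, 33.0, 34.0]])
site: Point = (500045.0, 3799950.0)
print(grid.cell_at(*site), grid.value_at(*site))
```

The same lookup is the core of 'spatial registration': once two layers share a coordinate system, a vector point can be matched to a raster cell.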
Geospatial Data Analysis
Geospatial Information Analysis: 3 Categories
From O’Sullivan & Unwin (2003)
Spatial Data Manipulation: Investigate the relationships between
geographic dataset layers
Examples: ‘point-in-polygon’, buffer zones around spatial features
GIS software typically used to view/ manipulate / create layers
Spatial/Statistical Data Analysis: Descriptive and Explanatory:
What is there? How do we categorize it?
Data points treated as statistical ‘population’, compared to others
Spatial Modeling: Construct models to explore and understand
geospatial systems
Based on ‘abstraction’ of domain-specific problem into a systems
framework. Some examples:
Predicting network flows; optimizing facility locations among demands
Lessons learned building model as valuable as model’s ‘answers’
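The 'point-in-polygon' and buffer-zone examples named above can be sketched in a few lines. This is a minimal version assuming planar coordinates and a simple (non-self-intersecting) polygon; it uses the common ray-casting test, which is one of several possible approaches. Function names and the sample geometry are illustrative, and points lying exactly on a boundary are not handled specially.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray casting: count polygon edges crossed by a ray running east from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def within_buffer(p: Point, feature: Point, radius: float) -> bool:
    """True if p lies inside a circular buffer of the given radius around a point feature."""
    return math.hypot(p[0] - feature[0], p[1] - feature[1]) <= radius


# Hypothetical study plot (a 4 x 4 square) and two sample locations.
plot = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon(2.0, 2.0, plot))            # sample inside the plot
print(point_in_polygon(5.0, 2.0, plot))            # sample outside the plot
print(within_buffer((5.0, 2.0), (4.0, 2.0), 1.5))  # within 1.5 units of a feature
```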
The Challenge of Geospatial Analysis
Geospatial Data violate some key statistical assumptions
Spatial Autocorrelation
Samples are NOT randomly selected from normally-distributed population
In fact, nearby samples more likely to be similar than distant ones
Autocorrelated data points introduce redundancy into the sample set
Spatial Scaling
AKA Modifiable Areal Unit Problem
Must be addressed in the experimental design and sampling scheme
Requires specialized assessment techniques to factor out effects
Statistical relationships in an area may change at different aggregations
The placement of sampling grid can introduce artifacts
Nonuniform sampling space, edge effects
Geospatial Data Attributes have explanatory power
Spatial relationships may be causes for observed phenomena
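One common way to quantify the spatial autocorrelation described above is Moran's I. The sketch below computes it for a small raster grid, assuming binary 'rook' contiguity weights (cells sharing an edge are neighbours). That weighting scheme and the sample grids are assumptions made for illustration, not something specified in the slides.

```python
from typing import List


def morans_i(grid: List[List[float]]) -> float:
    """Moran's I for a rectangular grid with binary rook-contiguity weights."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    cross_products = 0.0  # sum over neighbour pairs of (x_i - mean) * (x_j - mean)
    weight_sum = 0        # number of directed neighbour pairs
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross_products += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1
    variance_sum = sum((v - mean) ** 2 for row in grid for v in row)
    if variance_sum == 0 or weight_sum == 0:
        raise ValueError("Moran's I is undefined for a constant or single-cell grid")
    return (n / weight_sum) * (cross_products / variance_sum)


# Clustered values: nearby cells are similar, so I is positive.
clustered = [[0, 0, 1, 1] for _ in range(4)]
# Checkerboard: every neighbour differs, so I is strongly negative.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
print(morans_i(clustered), morans_i(checkerboard))
```

Values near zero suggest no spatial pattern; a clearly positive value signals the redundancy among nearby samples that the slide warns about.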
Selecting Geospatial Software Tools
Geospatial software: layered software architecture
Data layer: Efficiently store geospatial data
Feature Set + spatial coordinates
Analytic Layer: Spatial/statistical analysis algorithms
Statistical packages increasingly contain geospatial analysis
tools
Visualization Layer: Creates data views (AKA maps)
Geospatial tools broadly divided in two categories
Geographic Information Systems (GIS)
Three software layers are each extensive, ‘feature rich’
Geospatial Analysis Packages
Data layer is ‘thinner’, Analytic layer ‘thicker’
Visualization layer built on existing data plotting tools
Geospatial Software Tools: GIS ‘Value Added’
Data layer is optimized for efficient geospatial
data storage/processing
Raster and Vector Data storage, ‘mixed mode’ operations
Georeferencing tools for data layer projection,
spatial registration
Map Algebra tools foster analysis and creation of
data layers
Comprehensive cartographic tools for output
map design
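The 'Map Algebra' idea above, combining co-registered data layers cell by cell to create a new layer, can be sketched as follows. This is a minimal local (cell-by-cell) operation only; the helper name, layer values, and land cover codes are hypothetical.

```python
from typing import Callable, List

Layer = List[List[float]]


def local_op(fn: Callable[..., float], *layers: Layer) -> Layer:
    """Apply fn cell by cell to layers that share the same grid dimensions."""
    rows, cols = len(layers[0]), len(layers[0][0])
    for layer in layers:
        if len(layer) != rows or any(len(row) != cols for row in layer):
            raise ValueError("all layers must share the same grid dimensions")
    return [[fn(*(layer[r][c] for layer in layers)) for c in range(cols)]
            for r in range(rows)]


# Hypothetical inputs: elevation in metres and a land cover code (3 = forest).
elevation = [[900.0, 1100.0],
             [1200.0, 800.0]]
land_cover = [[3, 3],
              [1, 3]]

# New layer: 1 where the cell is forest above 1000 m, else 0.
high_forest = local_op(lambda elev, cover: 1 if elev > 1000.0 and cover == 3 else 0,
                       elevation, land_cover)
print(high_forest)
```

GIS packages add focal (neighbourhood) and zonal operations on top of this local case, along with the georeferencing needed to align the layers first.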
Geospatial Software Tools: GIS Caveats
Underdeveloped geostatistical processing tools
Vendors pressured to include them in product
Often, these are critical tools for ecological analysis
Steep Learning Curve
Identifying, mastering ‘essential’ features a challenge
Cost: GIS Software can be expensive
Upfront purchase and yearly license fees
Time investment in training and data maintenance
Workload
If non-GIS tools must be used for part of the analysis, time
must be spent moving between software packages
Yet validation data and algorithm details not available
Geospatial Software Tools: Choosing
Some Suggested Selection Criteria
Research Objectives should drive choice of tools
Identify the project’s core geospatial processing needs
Platform Flexibility
Select tools supported on multiple platforms (hardware / operating system)
Solution ‘Visibility’
Widely supported/used platforms foster collaboration
Can you obtain the details of the algorithm?
Does the community recognize the accuracy of the algorithm?
Costs of implementing your research idea in software
Scripted solutions using integrated environments are best
R, SAS, MATLAB
Avoid development in high-level programming languages
Geospatial Software Tools: Choosing
Select GIS for core needs:
Construct, compare, create multiple spatial data layers
Simultaneously analyzing vector and raster data
Creating detailed production quality study site maps
Your data is exclusively in the GIS product format
You require spatial analysis tools unavailable outside GIS
Select Geospatial Analysis tools for core needs:
Spatial/Statistical data analysis is the focus
Your mapping requirements are modest
two-dimensional data plots with geographic coordinates, legend
You need in-depth understanding of algorithms used
Or, you wish to extend / modify the algorithms
Sources for Geospatial Software Tools
Commercial Software Products
For-profit corporations sell or license their software
Major players produce comprehensive products
ESRI ArcGIS is the dominant GIS vendor
Their goal: Provide solution for every geospatial application
Other vendors offer tailored solutions
Examples: ENVI / IDL, ERDAS: Remote Sensing oriented GIS
Example: S Plus Spatial Statistics: Geospatial statistics and spatial
data visualization enhancements to statistical package
Example: MATLAB has mapping and image processing toolkits
Example: SAS offers GIS, geospatial software tools
Commercial products often drive geospatial data formats
Example: ESRI Shape File, ERDAS IMG file
Sources for Geospatial Software Tools
Open Source Software
Broad-based effort by worldwide scientific and research
community
Distributed under General Public License (GPL)
Software development and maintenance by the user
community
Most significant geospatial analysis products: R, GRASS GIS
Examples of others: PostGIS, GDAL libraries
Visit FreeGIS.org, or the open software foundation sites.
Tradeoffs: Commercial GIS Software
Centralized documentation and product support...
At a price of $100s to $1000s per year
Comprehensive, integrated software product
Data/Analytic/Visualization layers populated w/ features
Steep learning curve: Where are my ‘essential features?’
Training always available, at a cost
Details of proprietary geospatial algorithms usually
unavailable
Tradeoffs: Open Source GIS Software
Open Source Software
Distributed under General Public License (GPL)
Software development and maintenance by the user
community
Many applications available via the Internet, but...
Quality, features, support, and documentation are inconsistent
Algorithms and even source code are freely available
Open Source software drawbacks are shrinking as
user support community evolves and matures
But active participation in the community is advised for
those wishing to stay technically proficient
Most significant geospatial analysis products: R, GRASS GIS
Sources for Geospatial Data
Government Agencies
National Mapping and Survey Agencies: surface cover data
USGS
Research Centers: Climate forecasting models
NOAA, NASA, NCDC
For-Profit Corporations
The highest-quality UNCLASSIFIED imagery now acquired by the
private sector
Sometimes, no-cost government data is resold to public
Data widely available via the Internet
Many data sets available at no- or low-cost
Notable Exception: Satellite Remote Sensing data
Some discounts available to education and/or research entities
The best sites allow ‘search by geographic coordinates’
Examples from NCEAS Scientific Computing web site
Popular Geospatial Data Formats
Meteorological and Climatological Data
Historical measurements
Short-term model-based forecasts (3 – 10 days from now)
Long-term predictions (10 – 100 years): General Circulation Models
Widely-Used Formats: Gridded Binary (GRIB), NetCDF
Political and Physiographic features
Country Boundaries
Road Networks
Drainage Networks
Widely-Used Formats: Digital Line Graphs (DLG), ESRI Shape
Files (.shp)
Most GIS/Geospatial packages ingest these formats
Or conversion utilities are available to ingest them
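To give a feel for what 'ingesting' one of these formats involves, the sketch below parses the fixed 100-byte header of an ESRI Shape File (.shp) main file using only the Python standard library. The byte layout follows the published shapefile specification; the function name and the synthetic header used for the demonstration are made up, and a real workflow would normally rely on an existing reader or conversion utility instead.

```python
import struct

SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def parse_shp_header(header: bytes) -> dict:
    """Decode the fixed 100-byte header at the start of a .shp main file."""
    if len(header) < 100:
        raise ValueError("a shapefile header is 100 bytes long")
    (file_code,) = struct.unpack(">i", header[0:4])        # big-endian
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    (length_words,) = struct.unpack(">i", header[24:28])   # length in 16-bit words
    version, shape_type = struct.unpack("<ii", header[28:36])  # little-endian
    x_min, y_min, x_max, y_max = struct.unpack("<4d", header[36:68])
    return {
        "file_length_bytes": length_words * 2,
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, "Other (%d)" % shape_type),
        "bbox": (x_min, y_min, x_max, y_max),
    }


# Synthetic header for demonstration: an empty polygon file with a made-up extent.
demo_header = (struct.pack(">i", 9994) + bytes(20) + struct.pack(">i", 50)
               + struct.pack("<ii", 1000, 5)
               + struct.pack("<4d", -120.5, 34.0, -119.5, 35.0)
               + bytes(32))  # Z and M ranges, unused here
info = parse_shp_header(demo_header)
print(info)
```

With a real file, the same function could be applied to the first 100 bytes read in binary mode.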
Popular Geospatial Data Formats
Remote Sensing Imagery
Many operational systems provide many kinds of images
Multispectral Imagery: Landsat, SPOT, IKONOS
Data Formats tend to be sensor-specific
Most GIS can ingest most imagery types
Portal sites
Commercial: http://www.vterrain.org/Imagery/commercial.html
Govt: http://www.nationalgeographic.com/maps/map_links.html
Digital Terrain Models
Raster Grid datasets containing elevation measurements
Available for complete Earth land surface
Primary format: USGS Digital Elevation Model (DEM)
AKA National Elevation Dataset (NED)
Portal sites:
USGS: http://gisdata.usgs.net/Website/Seamless/
Terrainmap.org: http://www.terrainmap.org/
Tour of the Scientific Computing Web Site
Links to Data Sources
Links to Geospatial Software Sources
Links to Tutorials and Research Papers
Archive of NCEAS Research Projects
http://www.nceas.ucsb.edu/scicomp
Example: Spatial Modeling: Optimization
Route vehicles along network using
environmental costs as a metric
Simultaneously locate facilities along shipment
routes that mitigate environmental costs
Optimal Location of species reserve sites
Develop and compare performance of alternate
solution methods
Mathematically optimal but operationally impractical
Heuristically derived Near-optimal, usable solution
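The contrast between a mathematically optimal solution and a heuristic one can be illustrated with a toy reserve-selection problem: pick the fewest sites that together contain every species. The sketch below frames this as a set-covering problem and compares exhaustive search with a greedy heuristic. It is not the method used in the NCEAS projects; the formulation, function names, and site/species data are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def optimal_cover(sites: Dict[str, Set[str]]) -> List[str]:
    """Exhaustive search: smallest set of sites covering every species.

    Tries all combinations, so run time grows exponentially with site count.
    """
    all_species = set().union(*sites.values())
    names = sorted(sites)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            if set().union(*(sites[s] for s in combo)) == all_species:
                return list(combo)
    return []


def greedy_cover(sites: Dict[str, Set[str]]) -> List[str]:
    """Heuristic: repeatedly pick the site adding the most uncovered species."""
    uncovered = set().union(*sites.values())
    chosen: List[str] = []
    while uncovered:
        best = max(sorted(sites), key=lambda s: len(sites[s] & uncovered))
        chosen.append(best)
        uncovered -= sites[best]
    return chosen


# Made-up candidate sites and the species recorded at each.
candidate_sites = {
    "A": {"sp1", "sp2", "sp3", "sp4"},
    "B": {"sp1", "sp2", "sp5"},
    "C": {"sp3", "sp4", "sp6"},
}
best = optimal_cover(candidate_sites)
quick = greedy_cover(candidate_sites)
print(best, quick)
```

On this small example the greedy heuristic selects one more site than the optimum. On realistic problem sizes the exhaustive search becomes impractical, which is the trade-off the slide describes.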
Spatial Modeling: The Problem Domain
Geospatial Dataset: Routes + Locations
Spatial Model Solution: Alternative Methods
Selecting Species Reserves Locations
Dr. Ross Gerrard, UCSB Biogeography Lab, 1996
Example: Spatial Data Manipulation
Elevation zone threshold calculation
Digital Elevation Models for selected worldwide sites
Classify sites into 100 meter ‘wide’ elevation zones
General Circulation Model climate data extraction
Identify, obtain, import GCM data files
Import the data into GIS as raster grid
Overlay point file, extract matching climate values
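The elevation zone step above can be sketched as follows: each Digital Elevation Model cell is assigned to a 100 meter wide zone and the cells per zone are counted. The function names, the no-data value, and the tiny sample grid are assumptions for illustration; the slides do not describe the actual implementation.

```python
import math
from collections import Counter
from typing import Dict, List


def elevation_zone(elev_m: float, width_m: float = 100.0) -> int:
    """Zone index k covers elevations from k * width_m up to (k + 1) * width_m."""
    return int(math.floor(elev_m / width_m))


def zone_counts(dem: List[List[float]], width_m: float = 100.0,
                nodata: float = -9999.0) -> Dict[int, int]:
    """Count DEM cells per elevation zone, skipping cells flagged as no-data."""
    counts: Counter = Counter()
    for row in dem:
        for value in row:
            if value == nodata:
                continue
            counts[elevation_zone(value, width_m)] += 1
    return dict(counts)


# Tiny made-up DEM (metres); -9999 marks a missing cell.
dem = [[12.0, 99.9, 100.0],
       [250.0, -9999.0, 305.5]]
counts = zone_counts(dem)
print(counts)
```

Multiplying each count by the cell area gives the land area per elevation zone for a site.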
Digital Elevation Data Ingestion / Clipping
Elevation Zone Data Analysis
General Circulation Model data extraction
Spatial Analysis: ArcGIS and R Platforms
• ESRI Shape files exported to the R programming environment
• R Geostatistical and Spatial Analysis methods can then be applied
A Sampling: R Geospatial Analysis packages
clim.pact: Climate data analysis and
downscaling tools
geoR: Geostatistical Data Analysis: variograms,
et al.
maptools: read/manipulate polygon data (ESRI
.shp)
shapefiles: read/manipulate ESRI shape files
sgeostat: Geostatistical modeling code
splancs: Spatial and space-time point patterns
spatstat: Spatial Point Pattern analysis
Concluding thoughts
NCEAS Associates make extensive use of geospatial data in
many creative ways
Geospatial Data Analysis requires specialized techniques
GIS and geospatial analysis tools are available from commercial
vendors and the open source community
Choosing geospatial data and tools can be overwhelming
and distract from the primary ‘science mission’
Scientific Programming Team has geospatial expertise,
and can assist NCEAS Associates in this domain
Coming soon: Short course on the R Programming
Language!