The Geographic Information System Tool (GIST): A New Tool

The Analysis Engine: A New Tool for Model Evaluation, Sensitivity and Uncertainty Analysis, and more…
Alison M. Eyth, Prashant P. Pai
Carolina Environmental Program
University of North Carolina at Chapel Hill
October 19, 2004
Carolina Environmental Program
UNC Chapel Hill
Background

Supports data analysis by creating plots and tables
“Analysis Configurations” facilitate repeated analyses
Developed as part of the Multimedia Integrated Modeling System (but can be used standalone)
Java application that runs on Windows, Linux, Unix, and Mac OS X
Open source, available from http://sourceforge.net/projects/mimsfw
Three main components:
– Table application
– Plotting engine
– Statistics package
Table Application

Provides the top level user interface
File menu accesses import and export functions
Currently supported file formats include:
– Comma separated (.csv), Custom and tab delimited, Fixed column width, SMOKE Report, and ARFF (to support data mining with WEKA)
Data files are imported as rows and columns
Multiple files can be loaded, with each file shown in its own tab
– Tabs include the file name, header, data table, and footer
Toolbar and pop up (i.e. right click) menus provide access to functions such as sort, filter, top N rows, format, plot, and statistics
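To make "imported as rows and columns" concrete, here is a minimal sketch of loading comma-separated text into a table. It uses Python's standard csv module and is not the Analysis Engine's own Java import code; the load_table helper and the sample data are hypothetical, and header and footer handling is left out.

```python
import csv
import io

def load_table(text, delimiter=","):
    """Parse delimited text into (column_names, rows).

    Hypothetical helper for illustration: the first record is treated as
    the column names, and cells that look numeric are converted to float.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    columns = next(reader)
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        row = []
        for cell in record:
            try:
                row.append(float(cell))
            except ValueError:
                row.append(cell)  # keep non-numeric cells as text
        rows.append(row)
    return columns, rows

# Made-up sample data, not taken from the presentation.
sample = "State,NOx\nNC,512.5\nVA,430.0\n"
columns, rows = load_table(sample)
```

Passing delimiter="\t" would handle the tab-delimited case in the same way.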
Table Application GUI
Toolbar and Pop-up Menu Functions

Multi-column sort
Show only the rows with the Top N values
Show only the rows with the Bottom N values
Filter rows based on criteria (e.g. NOx > 500)
Show / hide columns
Format columns
– e.g. color, width, font, number or date style
Create plots
Compute statistics
Edit analysis configuration
Reset
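The sort and Top N / Bottom N functions listed above can be sketched in a few lines. This is only an illustration in Python, not the tool's Java implementation; the function names and the sample rows are hypothetical.

```python
def multi_column_sort(rows, keys):
    """Sort dict rows by several (column, descending) keys.

    The first key in `keys` is the most significant. Python's sort is
    stable, so sorting by the least significant key first and the most
    significant key last gives the combined ordering.
    """
    result = list(rows)
    for column, descending in reversed(keys):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result

def top_n(rows, column, n):
    """Return the n rows with the largest values in `column`."""
    return sorted(rows, key=lambda row: row[column], reverse=True)[:n]

def bottom_n(rows, column, n):
    """Return the n rows with the smallest values in `column`."""
    return sorted(rows, key=lambda row: row[column])[:n]

# Made-up sample rows.
rows = [
    {"state": "NC", "nox": 500},
    {"state": "VA", "nox": 700},
    {"state": "NC", "nox": 900},
    {"state": "GA", "nox": 300},
]
by_state_then_nox = multi_column_sort(rows, [("state", False), ("nox", True)])
largest_two = top_n(rows, "nox", 2)
smallest_one = bottom_n(rows, "nox", 1)
```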
Filter Rows Dialog

Use Filter Rows to limit the rows shown in the table
Any number of criteria can be added
Each criterion has a column, operation, and value
Available operations are <, <=, >, >=, not =, starts with, contains, ends with, does not start with, does not contain, ...
Select between showing rows matching ALL criteria or ANY
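The criterion model described above (column, operation, value, combined with ALL or ANY) might be sketched as follows. This is an illustrative Python version, not the dialog's actual code: the OPERATIONS table covers only some of the listed operations, and filter_rows and the sample rows are hypothetical.

```python
import operator

# Subset of the operations named on the slide; an assumption for illustration.
OPERATIONS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "not =": operator.ne,
    "starts with": lambda cell, value: str(cell).startswith(value),
    "contains": lambda cell, value: value in str(cell),
    "ends with": lambda cell, value: str(cell).endswith(value),
}

def filter_rows(rows, criteria, match_all=True):
    """Keep rows matching ALL criteria (match_all=True) or ANY of them.

    Each criterion is a (column, operation, value) tuple. With no
    criteria, every row is kept.
    """
    if not criteria:
        return list(rows)
    combine = all if match_all else any
    return [
        row for row in rows
        if combine(OPERATIONS[op](row[col], val) for col, op, val in criteria)
    ]

# Made-up sample rows.
rows = [
    {"state": "NC", "nox": 500},
    {"state": "VA", "nox": 700},
    {"state": "NC", "nox": 900},
    {"state": "GA", "nox": 300},
]
criteria = [("nox", ">", 500), ("state", "starts with", "N")]
matching_all = filter_rows(rows, criteria, match_all=True)
matching_any = filter_rows(rows, criteria, match_all=False)
```

Keeping the operations in a lookup table means a new operation can be added without touching the filtering loop.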
Plotting Options Dialog

Choose Plot type from Bar, Box, CDF, Discrete Category, Histogram, Rank Order, XY (Scatter), Line, Time Series, and Tornado
Select Data Columns to plot
Specify Units and one to three columns to use for labels
Selected data is passed to the plotting engine
Plot Properties are Specified using the Analysis Engine GUI
Example Discrete Category Plot
Note: Plots are created using a custom Java interface to R
Statistics Dialog

Provides an interface to the statistics package
Specify statistics to compute and data columns to analyze
Additional details are specified on other tabs
Statistics outputs appear as new tabs in the table application
Statistics are computed using Colt and Weka
Example of Histogram Statistics
Note: This is a new tab that supports all the standard functions such as sort, filter, format, and plot
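For readers unfamiliar with histogram statistics, the sketch below shows one simple way such a table could be computed: equal-width bins with a count per bin. It is plain Python and makes no claim about how Colt or the Analysis Engine bins data; the histogram function, the bin count, and the sample values are assumptions for illustration.

```python
def histogram(values, bins):
    """Count values in `bins` equal-width intervals between min and max.

    Returns (edges, counts). The maximum value is placed in the last bin.
    """
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        if width == 0:
            index = 0  # all values identical: put everything in the first bin
        else:
            index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return edges, counts

# Made-up sample values.
edges, counts = histogram([1, 2, 2, 3, 4, 5, 5, 5, 9, 10], bins=3)
```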
Analysis Configuration Dialog

The Analysis Configuration stores all the table settings and plots that you have created during your session
The selected plots can be viewed, edited or deleted
Plots can be given new names by double clicking the name
Some (or all) of the settings can be saved to a configuration file
Configuration files can be loaded in future sessions or for other data files in the current session
Automation

An optional command line interface may be used to specify:
– Data files to load
– Analysis configuration file to use
– Type of plots to create (e.g., JPG, PDF, PNG)
– Output directory for plots and tables
This allows plots and tables to be created in an automated fashion
Standard analysis products may be created for newly available data sets that have the same format
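The four inputs listed above map naturally onto command line options. The sketch below shows one way a batch driver could accept them, using Python's argparse. The option names (--data, --config, --plot-type, --out-dir) and the file names are hypothetical and are NOT the Analysis Engine's actual command line syntax, which the slides do not give.

```python
import argparse

def build_parser():
    """Build a parser for a hypothetical batch driver (illustration only)."""
    parser = argparse.ArgumentParser(
        description="Illustrative batch driver; option names are assumptions."
    )
    parser.add_argument("--data", nargs="+", required=True,
                        help="data files to load")
    parser.add_argument("--config", required=True,
                        help="analysis configuration file to use")
    parser.add_argument("--plot-type", choices=["JPG", "PDF", "PNG"],
                        default="PNG", help="type of plots to create")
    parser.add_argument("--out-dir", default=".",
                        help="output directory for plots and tables")
    return parser

# Parse an explicit argument list so the example runs without a real shell call.
args = build_parser().parse_args([
    "--data", "jan.csv", "feb.csv",
    "--config", "qa.cfg",
    "--plot-type", "PDF",
    "--out-dir", "plots",
])
```

Because the same configuration file is applied to every data file named, rerunning the command on a newly arrived file with the same format would reproduce the standard set of products.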
Examples of Potential Applications

Model Evaluation
– Sort to find stations at which the error was the largest
– Plot modeled and observed values on box plots, etc.
– Create scatter plots of one species vs. another
Sensitivity and Uncertainty Analysis
– Perform linear regression and show in plots and tables
– Compute correlation coefficients
Emissions Modeling Quality Assurance
– Find states with top 10 emission values
– Stacked bar charts to show total emissions
– Compute histograms
General Data Analysis
– Analyze data by sorting, filtering, and computing statistics
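As a reminder of what the regression and correlation items involve, here is a minimal sketch of an ordinary least-squares line fit and a Pearson correlation coefficient. It is written in plain Python for illustration; the Analysis Engine itself is described as using Weka for these, and the function names and sample data here are hypothetical.

```python
from math import sqrt

def linear_regression(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)

# Made-up data lying exactly on the line y = 2x + 1.
observed = [1, 2, 3, 4]
modeled = [3, 5, 7, 9]
slope, intercept = linear_regression(observed, modeled)
r = pearson_r(observed, modeled)
```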
Future Directions

Initial version will be released on SourceForge by 10/31/04 (which is the end date for the current funding for this work)
Many potential enhancements are listed on SourceForge, e.g.:
– Create new rows and columns using functions (e.g., difference, sum)
– Create plots and tables with data from multiple tabs
Will likely be used as part of the new emissions quality assurance tool (http://sourceforge.net/projects/emisview)
Mr. Tommy Cathey will continue to develop the custom Java interface to R at the EPA Scientific Visualization Laboratory in FY05
References

MIMS SourceForge page (for downloads): http://sourceforge.net/projects/mimsfw
R (for plots): http://www.r-project.org
Colt (for basic statistics): http://www-itg.lbl.gov/~hoschek/colt
Weka (for regression and correlation analysis): http://www.cs.waikato.ac.nz/~ml/weka/
Carolina Environmental Program (for more information): http://www.cep.unc.edu
Authors: [email protected], [email protected]