QuakeSim: Grid Computing, Web Services, and Portals for Earthquake Science Marlon Pierce Community Grids Lab Indiana University.


QuakeSim: Grid Computing, Web
Services, and Portals for
Earthquake Science
Marlon Pierce
Community Grids Lab
Indiana University
Acknowledgements
 Prof. Geoffrey Fox, CGL Director
 Many external collaborators: Andrea Donnellan and team
(JPL), Yehuda Bock and team (Scripps/UCSD), Neil
Devadason, John Buechler, and David Coats (POLIS)
 Dr. Yili Gong
 Graduate Students
 Choonhan Youn (now with GEON project)*
 Galip Aydin*
 Harshawardhan Gadgil
 Mehmet S. Aktas
 Ahmet Sayar
 Zhigang Qi
 Zao Liu
 Jong Youl Choi
Grids and Cyberinfrastructure
Cyberinfrastructure is a term coined by the
National Science Foundation in the
famous “Atkins Report”.
http://www.nsf.gov/od/oci/reports/toc.jsp
Prof. Dan Atkins (UM) is now the head of
NSF’s Office of Cyberinfrastructure.
Roughly synonymous with
eScience (UK)
Grid Computing (DOE and NSF)
Global Information Grid (DOD), etc.
What Is CI, Really?
 Computing, Data Storage, Networking
 NSF TeraGrid (www.teragrid.org)
 Open Science Grid (www.opensciencegrid.org)
 Many international equivalents
 Middleware
 Globus: multi-institutional security, job management, file
transfer, data management, system monitoring
 Condor: Cycle-scavenging and job scheduling.
 And many others: see for example the TeraGrid’s Common
TeraGrid Software Stack, the OSG’s Virtual Data Toolkit and
the NMI Grids Center for composite releases.
 Scientific Gateways (like QuakeSim)
 Useful Online Services
 NIH’s PubMed, PubChem
 Most Grids are built these days with Web Services
QuakeSim Project
Requirements and
Architecture
Contributions from Choonhan
Youn, Ahmet Sayar, Galip Aydin,
Harsh Gadgil, and collaborators’
codes
Science Gateways
QuakeSim is an example of a science
gateway.
Google “TeraGrid Science Gateways” for
other examples.
Combines a Web portal and Web
services to access on-line data sources
and connect them to geophysical
applications running on computing
resources.
QuakeSim Applications and Their Data
 Pattern Informatics (UC-Davis)
Earthquake forecasting code, uses seismic archives as
input
 Regularized Dynamic Annealing Hidden Markov
Method (RDAHMM) (JPL)
Time series analysis code, can be applied to GPS and
seismic archives.
Identifies signal components (possibly associated with
underlying physical causes) with no fixed parameters.
 GeoFEST (JPL/CalTech)
Finite element code for detailed modeling of fault
stresses, seismic displacements, uses fault models as
input.
Data Requirements
 QuakeTables Fault Database
 QuakeSim’s fault repository for California.
 Compatible with GeoFEST, Disloc, VC
 GPS Data sources and formats (RDAHMM and others).
 JPL: ftp://sideshow.jpl.nasa.gov/pub/mbh
 SOPAC: ftp://garner.ucsd.edu/pub/timeseries
 USGS: http://pasadena.wr.usgs.gov/scign/Analysis/plotdata/
 Seismic Event Data (RDAHMM and others)
 SCSN: http://www.scec.org/ftp/catalogs/SCSN
 SCEDC: http://www.scecd.scec.org/ftp/catalogs/SCEC_DC
 Dinger-Shearer: http://www.scecdc.org/ftp/catalogs/dingershearer/dinger-shearer.catalog
 Hauksson:
http://www.scecdc.scec.org/ftp/catalogs/hauksson/Socal
My “octopus”
diagram, from the
archives.
[Figure: architecture diagram. Diagram labels: Browser Interface; HTTP(S); JSP + Client Stubs; SOAP/HTTP; WSDL interfaces; Job Sub/Mon And File Services; Visualization Or Map Service; DB Service; JDBC; DB; Operating and Queuing Systems; Host 1 (WFS); Host 2 (Grid); Host 3 (WMS).]
GIS Services as a Data Grid
 We decided that the Data Grid components of SERVO are
best implemented using standard GIS services.
 Use Open Geospatial Consortium standards
 Maximize reusability in future QuakeSim projects
 Provide downloadable GIS software to the community as a side
effect of QuakeSim research.
 We implemented two cornerstone standards
 Web Feature Service (WFS): data service for storing abstract map
features
 Supports queries
 Faults, GPS, seismic records
 Web Map Service (WMS): generate interactive maps from WFS’s
and other WMS’s.
 We built these as Web Services
 WSDL and SOAP: programming interfaces and messaging formats
 You can work with the data and map services through programming
APIs as well as browser interfaces.
 See www.crisisgrid.org.
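As a rough illustration of what a programmatic map request can look like, the sketch below builds an OGC WMS 1.1.1 GetMap URL using the standard key-value parameters. The QuakeSim services described above are exposed through WSDL and SOAP, so this plain HTTP form is an assumption made for brevity; the endpoint `http://example.org/wms` and the layer names are hypothetical.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width, height,
                     srs="EPSG:4326", image_format="image/png"):
    """Return a WMS 1.1.1 GetMap URL for the given layers and bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),   # comma-separated layer list
        "STYLES": "",                 # empty value selects default styles
        "SRS": srs,
        "BBOX": f"{min_lon},{min_lat},{max_lon},{max_lat}",
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer names, for illustration only.
url = build_getmap_url(
    "http://example.org/wms",
    layers=["faults", "seismic_events"],
    bbox=(-119.0, 33.5, -117.0, 34.5),
    width=512,
    height=256,
)
print(url)
```

A client would fetch this URL to get a rendered image; a WFS query for the underlying features follows the same request-building pattern with different parameters.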
Plotting Google
satellite maps with
QuakeTables fault
overlays for Los
Angeles.
Pattern Informatics
This has been our simplest “proving
ground” example.
Integrates (streaming) WFS, WMS,
WS-Context, and HPSearch’s
WSProxy services (wraps PI
executable and helper format
conversion services).
This is basically a linear workflow
Whole earth seismic catalog plotted on
NASA map server. Combines
streaming feature server and map
server.
Pattern informatics results combined with
Feature and Map servers can be used to
forecast areas of increased earthquake
probability.
Data Flow or Event Flow?
 Octopus slide implies a sequential data flow between
applications on distributed hosts.
 Usually called “scientific workflow” in the CI community.
 See http://vtcpc.isi.edu/wiki/ for an overview and players.
 See www.hpsearch.org for our work on using JavaScript as a
workflow language.
 This is not MPI or parallel programming. It’s more like a stone
age mash-up.
 Services don’t need to know much about each other.
 Don’t have to be from the same providers
 Loosely coupled.
 Transfer data (or URL pointers) as needed.
 Event flow and traditional message passing are better suited
for closely coupled applications.
 See for example DOE’s CCA project and NASA’s Earth System
Modeling Framework (ESMF).
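The loosely coupled, sequential data flow described above can be sketched as a chain of independent stages that hand each other URL pointers rather than the data itself. This is only an illustration in Python (HPSearch itself scripts flows in JavaScript); the stage functions and URLs are hypothetical stand-ins for remote service calls.

```python
from typing import Callable, List

# A stage takes a URL pointer to its input and returns a URL pointer
# to its output. Stages share nothing else with each other.
Stage = Callable[[str], str]


def query_features(request_url: str) -> str:
    # Stand-in for a feature query that stores its result remotely.
    return request_url + "/features.gml"


def convert_format(data_url: str) -> str:
    # Stand-in for a helper format-conversion service.
    return data_url.replace(".gml", ".txt")


def run_forecast(data_url: str) -> str:
    # Stand-in for a wrapped application such as a forecasting code.
    return data_url.replace(".txt", ".forecast")


def run_workflow(start_url: str, stages: List[Stage]) -> List[str]:
    """Run the stages in order and return every pointer produced."""
    pointer = start_url
    trace = [pointer]
    for stage in stages:
        pointer = stage(pointer)
        trace.append(pointer)
    return trace


trace = run_workflow(
    "http://example.org/wfs/query1",
    [query_features, convert_format, run_forecast],
)
for step in trace:
    print(step)
```

Because each stage only sees a pointer, any stage can be swapped for a service from a different provider without touching the others.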
Portlet Development
We use JSR 168 portlets to
build sharable portal plugins.
Portlets: Portal Components
 Web portals are essentially websites with
logins.
Personalization, content control, etc, derive from
this.
 Java portals are based on a standard
component/container model.
Components are called portlets
JSR 168 is the standard
 Many TeraGrid and other science gateways
use this standard.
Portlet Summary
RDAHMM
Set up and run RDAHMM, query Scripps
GRWS GPS Service, maintain persistent
user sessions.
ST_Filter
Similar to RDAHMM portlet; ST_Filter has
much more input.
Station Monitor
Shows GPS stations on a Google Map,
displays last 10 minutes of data.
Real Time RDAHMM
Displays RDAHMM results of last 10
minutes of GPS data in a Google map.
Seismic Archive Query
Portlet
Google Map portlet that shows seismic
events based on your query.
Fault Query Portlet
Allows you to query the QuakeTables fault
data base for information on faults.
RDAHMM Portlet: Main
Navigation
RDAHMM Project Set Up
RDAHMM GRWS Query
Interface
RDAHMM Results Page
Real Time RDAHMM Portlet
Station Monitor Portlet
ST_Filter Portlets
Managing Real Time GPS
Data
Slides from Galip Aydin
California Real Time Network
Continuous GPS Stations (CGPS) are depicted as
triangles while the Real-Time stations are
represented as circles. Image is obtained from
SOPAC GPS Explorer at
http://sopac.ucsd.edu/projects/realtime
Message Format
Network Data Rates
Network                                Time      RYO       ASCII      GML
CRTN GPS Site Positions (9 Stations)   1 second  1.5KB     4.03KB     48.7KB
                                       1 hour    5.31MB    14.18MB    171.31MB
                                       1 day     127.44MB  340.38MB   4.01GB
                                       1 month   3.8GB     9.97GB     123.3GB
                                       1 year    45.8GB    119.67GB   1.41TB
Entire SCIGN Network (250 stations)    1 year    1.23TB    16.18TB    160TB
How does one manage all the data generated by the
85 stations? How can you get just the data you want?
Note this is fundamentally different from traditional
request/response style Web Services.
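To get a feel for how the per-second figures listed above grow, the sketch below scales them to an hour, a day, and a year. It assumes constant rates and 1024-based units, which are assumptions on my part; the results land close to, but not exactly on, the listed hourly and daily values, so the original numbers were presumably computed from unrounded rates.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def scale_rate(kb_per_second, seconds):
    """Return the volume in KB produced over `seconds` at a constant rate."""
    return kb_per_second * seconds


def to_mb(kb):
    return kb / 1024


def to_gb(kb):
    return kb / 1024 ** 2


# Per-second volumes for the 9 CRTN stations, taken from the listing above.
per_second_kb = {"RYO": 1.5, "ASCII": 4.03, "GML": 48.7}

for fmt, rate in per_second_kb.items():
    hour_mb = to_mb(scale_rate(rate, SECONDS_PER_HOUR))
    day_mb = to_mb(scale_rate(rate, SECONDS_PER_DAY))
    year_gb = to_gb(scale_rate(rate, 365 * SECONDS_PER_DAY))
    print(f"{fmt}: {hour_mb:.2f} MB/hour, {day_mb:.2f} MB/day, "
          f"{year_gb:.2f} GB/year")
```

The point of the exercise is the ratio between formats: the GML encoding is roughly thirty times larger than RYO at every time scale.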
Processing Real-Time GPS Streams
[Figure: sensor message processing path. Diagram labels: GPS Networks; Scripps RTD Server; Raw Data; RYO Ports 7010, 7011, 7012; NB Server; ryo2nb; ryo2ascii; ascii2pos; ascii2gml; Single Station Filter; Displacement Filter; Station Health Filter; RDAHMM Filter; topics /SOPAC/GPS/CRTN01/RYO, /SOPAC/GPS/CRTN01/ASCII, /SOPAC/GPS/CRTN01/POS, /SOPAC/GPS/CRTN01/DSME.]
A Complete Sensor Message Processing Path, including a data analysis application.
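The filter chain in this processing path can be sketched with a small in-memory topic broker: each filter subscribes to one topic, transforms the message, and republishes on the next topic. The `Broker` class here is a stand-in and not the NaradaBrokering API, and the two transforms are hypothetical placeholders (real RYO messages are binary and need proper decoding). Only the topic names come from the diagram.

```python
from collections import defaultdict


class Broker:
    """Tiny in-memory topic broker (a stand-in, not NaradaBrokering)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every callback registered on the topic.
        for callback in list(self._subscribers[topic]):
            callback(message)


def attach_filter(broker, in_topic, out_topic, transform):
    """Subscribe to in_topic, transform each message, republish on out_topic."""
    broker.subscribe(
        in_topic, lambda msg: broker.publish(out_topic, transform(msg))
    )


# Hypothetical transforms: a text payload stands in for the RYO format.
def ryo_to_ascii(raw: bytes) -> str:
    return raw.decode("ascii")


def ascii_to_pos(line: str) -> dict:
    station, x, y, z = line.split()
    return {"station": station, "xyz": (float(x), float(y), float(z))}


BASE = "/SOPAC/GPS/CRTN01"
broker = Broker()
attach_filter(broker, BASE + "/RYO", BASE + "/ASCII", ryo_to_ascii)
attach_filter(broker, BASE + "/ASCII", BASE + "/POS", ascii_to_pos)

received = []
broker.subscribe(BASE + "/POS", received.append)
broker.publish(BASE + "/RYO", b"SITE1 1.0 2.0 3.0")
print(received)
```

A client that only wants positions subscribes to the POS topic and never sees the raw stream, which is one way to get just the data you want.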
Application Integration with Real-Time Filters
 Station Monitor Filter records real-time positions for 10 minutes and calculates position changes.
 RDAHMM Filter records real-time positions for 10 minutes and invokes RDAHMM application which determines state changes in the XYZ signal.
 Graph Plotter Application creates visual representation of the positions.
 Graph Plotter Application creates visual representation of the RDAHMM output.
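A filter that keeps the last 10 minutes of positions and reports the position change can be sketched with a sliding window. This is a minimal illustration, assuming one sample per second and a simple oldest-to-newest difference; the class name `PositionWindow` and the synthetic drift are hypothetical, and the actual filters may compute changes differently.

```python
from collections import deque


class PositionWindow:
    """Keep the last `window_seconds` of (t, x, y, z) samples for one station."""

    def __init__(self, window_seconds=600):
        self.window_seconds = window_seconds
        self.samples = deque()

    def add(self, t, x, y, z):
        self.samples.append((t, x, y, z))
        # Drop samples older than the window, relative to the newest one.
        while t - self.samples[0][0] > self.window_seconds:
            self.samples.popleft()

    def displacement(self):
        """Return (dx, dy, dz) between the oldest and newest retained samples."""
        if len(self.samples) < 2:
            return (0.0, 0.0, 0.0)
        _, x0, y0, z0 = self.samples[0]
        _, x1, y1, z1 = self.samples[-1]
        return (x1 - x0, y1 - y0, z1 - z0)


window = PositionWindow()
for t in range(900):                      # 15 minutes of 1 Hz samples
    window.add(t, 0.001 * t, 0.0, 0.0)    # synthetic drift along x

print(len(window.samples), window.displacement())
```

An analysis application can then be invoked on the contents of `window.samples` whenever the window is full.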
2 – Multiple Publishers Test
 We add more GPS networks by running more publishers.
 The results show that 1000 publishers can be supported
with no performance loss. This is an operating system
limit.
[Figure: test topology. Diagram labels: RYO Publisher 1, RYO Publisher 2, RYO Publisher n; NB Server; Topic 1A, Topic 1B, Topic 2, Topic n; RYO To ASCII Converter; Simple Filter.]
[Figure: plot of Transfer Time, Time (ms) against Time Of The Day, 0:00 to 22:30.]
4 – Multiple Brokers Test
 NaradaBrokering allows creation of Broker networks.
 We create a two-broker network.
 Messages published to first broker can be received from
the second broker.
 We take timings on each broker.
 We connect 750 clients to each broker and run for 24
hours. We chose 750 clients to stay well below the saturation
limit.
 The results show that the performance is very good and
similar to single broker test.
[Figure: two-broker test topology. Diagram labels: RYO Publisher; RYO To ASCII Converter; Topic 1A, Topic 1B; NB Server 1, NB Server 2; Simple Filters 1, 2, ..., 750 and 751, 752, ..., 1500.]
Supporting Geographical
Information Systems
Slides courtesy of Zao Liu
Integrating Map Servers
 Geographical Information Systems combine online dynamic
maps and databases.
 Many GIS software packages exist
 GIS servers around state of Indiana
 ESRI ArcIMS and ArcMap Server (Marion, Vanderburgh,
Hancock, Kosciusko, Huntington, Tippecanoe)
 Autodesk MapGuide (Hamilton, Hendricks, Monroe,
Wayne)
 WTH Mapserver™ Web Mapping Application (Fulton,
Cass, Daviess, City of Huntingburg) based on several
Open Source projects (Minnesota Map Server)
 Challenge: make 17 different county map servers from different
companies work together.
 92 counties in Indiana, so potentially 92 different map
servers.
Considerations
 We assume heterogeneity in GIS map and feature
servers.
 GIS services are organized bottom-up rather than top-down.
 Local city governments, 92 different county governments,
multiple Indiana state agencies, inter-state (Ohio, Kentucky)
consideration, federal government data providers (Hazus).
 Must find a way to federate existing services.
 We must reconcile ESRI, Autodesk, OGC, Google Map,
and other technical approaches.
 Must try to take advantage of Google, ESRI, etc rather than
compete.
 We must have good performance and interactivity.
 Servers must respond quickly--launching queries to 20 different
map servers is very inefficient.
 Clients should have simplicity and interactivity of Google Maps
and similar AJAX style applications.
Caching and Tiling Maps
 Federation through caching:
 WMS and WFS resources are queried and results are stored on the
cache servers.
 WMS images are stored as tiles.
 These can be assembled into new images on demand (c. f. Google
Maps).
 Projections and styling can be reconciled.
 We can store multiple layers this way.
 We build adapters that can work with ESRI and OGC products; tailor to
specific counties.
 Serving images as tiles
 Client programs obtain images directly from our tile server.
 That is, don’t go back to the original WMS for every request.
 Similar approaches can be used to mediate WFS requests.
 This works with Google Map-based clients.
 The tile server can re-cache and tile on demand if tile sections are
missing.
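The caching idea above can be sketched as a tile lookup that only goes back to the source server on a miss. The tile indexing uses the common Web Mercator scheme used by Google Maps-style clients, which is my assumption about the layout; `fake_fetch` is a hypothetical placeholder for an adapter that would query a county WMS, and the bounding box is an illustrative area in central Indiana.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Web Mercator tile indices (x, y) containing the given point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tiles_for_bbox(min_lon, min_lat, max_lon, max_lat, zoom):
    """List the (zoom, x, y) keys of all tiles touching the bounding box."""
    x_min, y_max = lonlat_to_tile(min_lon, min_lat, zoom)  # south-west corner
    x_max, y_min = lonlat_to_tile(max_lon, max_lat, zoom)  # north-east corner
    return [(zoom, x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]


class TileCache:
    """Serve tiles from memory; call the fetcher only when a tile is missing."""

    def __init__(self, fetch_tile):
        self._fetch_tile = fetch_tile
        self._tiles = {}
        self.misses = 0

    def get(self, key):
        if key not in self._tiles:
            self.misses += 1
            self._tiles[key] = self._fetch_tile(key)
        return self._tiles[key]


def fake_fetch(key):
    # Hypothetical stand-in for an adapter call to an upstream map server.
    zoom, x, y = key
    return f"tile-{zoom}-{x}-{y}".encode()


cache = TileCache(fake_fetch)
tiles = tiles_for_bbox(-86.33, 39.63, -85.94, 39.93, zoom=10)
first_pass = [cache.get(key) for key in tiles]
second_pass = [cache.get(key) for key in tiles]   # served entirely from cache
print(len(tiles), cache.misses)
```

The second pass triggers no new fetches, which is the behaviour that keeps clients from going back to the original WMS for every request.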
[Figure: tile caching architecture. Diagram labels: Marion County Map Server (ESRI ArcIMS); Hamilton County Map Server (AutoDesk); Cass County Map Server (OGC Web Map Server); Adapter (three); Tile Server; Cache Server; Google Maps Server; Browser + Google Map API.]
 Must provide adapters for each Map Server type.
 Tile Server requests map tiles at all zoom levels with all layers.
These are converted to uniform projection, indexed, and stored.
Overlapping images are combined.
 Browser client fetches image tiles for the bounding box using
Google Map API.
 The cache server fulfills Google map calls with cached tiles
that fill the requested bounding box.
Map Server Example
Marion and Hancock
county parcel plots and
IDs are overlaid on IU
aerial photographic
images that are
accessed by this
mashup using Google
Map APIs.
We cache and tile all
the images from several
different map servers.
(Marion and Hancock
actually use different
commercial software.)
Final Thoughts
It’s the Data, Stupid
 Grids have been distracted by complicated security
issues.
 Accounts, allocations, authentication, etc on
supercomputers.
 It assumes a lot of people actually want to do this.
 But arguably most people really want access to data
and results, not computers.
 Ex: PubChem has properties on 12 million drug-like
molecules online, can be browsed for free.
 The Grid security model is equivalent to actually giving you a
key to the lab.
 My suggestion: leave the Grid to the experts and try
to think of as many online data services that can be
created using results from TeraGrid resources.
 Challenge: use all of the TeraGrid, NASA, Open
Science Grid, China National Grid, etc, etc to
Multiple Grid Job Execution
Web 2.0?
 QuakeSim and many similar science gateways
have a generally correct approach...
Web Services, online components.
 ...but arguably the details need to be changed.
 We have been following the Enterprise model
(IBM, HP, MS, Sun).
 JSR 168, WSRP, WSDL, SOAP, WS-*
 Maybe time to switch to the Internet model
 Google desktop, Netvibes startpage
 Programmable Web, mash ups, AJAX, REST, etc.
More Information
[email protected]
www.crisisgrid.org
www.quakesim.org (being updated)
The End
http://www.tryscience.org/grid/master/master.html
[Figure: Web map service architecture. Diagram labels: Web Map Client; Stubs; Aggregating WMS; HTTP; SOAP; “REST”; WSDL; WFS + Seismic Rec.; WFS + State Bounds; WMS + OnEarth Or Google Maps.]
Tying It All Together: HPSearch
 HPSearch is an engine for orchestrating distributed Web Service
interactions
 It uses an event system and supports both file transfers and data
streams.
 Legacy name
 HPSearch flows can be scripted with JavaScript
 HPSearch engine binds the flow to a particular set of remote
services and executes the script.
 HPSearch engines are Web Services, can be distributed and
interoperate for load balancing.
 Boss/Worker model
 ProxyWebService: a wrapper class that adds notification and
streaming support to a Web Service.
 More info: http://www.hpsearch.org
SensorGrid Architecture
 Major components:
 Real-Time filters
 Publish-Subscribe System
 Information Service
 Filters can be run as Web
Services to create workflows.
 Filter Chains can be deployed
for complex processing.
 Streaming messaging provides
high-performance transfer
options.