
QuakeSim Project: Portals and Web Services for Geophysics
Marlon Pierce
Indiana University
[email protected]
QuakeSim Project Summary
• Goal is to provide a distributed environment for connecting scientific computing and data resources with Web-based user interfaces.
• QuakeSim's IT development includes
  - Portals for user interfaces.
  - Web Services for running remote applications and accessing databases.
  - Databases for semantic fault models (USC).
• This talk reviews major revisions that we have undertaken since 2006.
  - Almost a complete rewrite of the 2004-2006 system.
[Figure: my "octopus" diagram, from the archives. Labels: Browser Interface; HTTP(S); Portlets + Client Stubs; SOAP/HTTP; WSDL interfaces on each service; DB Service (JDBC, DB); Job Sub/Mon and File Services (Operating and Queuing Systems); Visualization or Map Service (DB); Host 1 (Quaketables); Host 2 (Grid); Host 3 (G Maps).]
Some Design Choices
• Build portals out of portlets (Java standard).
  - Reuse capabilities from our Open Grid Computing Environments (OGCE) project, the REASoN GPS Explorer project, and many TeraGrid Science Gateways.
  - Decorate with Google Maps, Yahoo UI gadgets, etc.
• Use JavaServer Faces to build individual component portlets.
  - Build standalone tools, then convert to portlets at the very end.
• Use simple Web Services for accessing codes and data.
  - Keep It Stateless …
• Use Globus job and file management services for interacting with high performance computers.
• Favor Google Maps and Google Earth for their simplicity, interactivity, and open APIs.
  - Generate KML and GeoRSS.
• Use an Apache Maven-based build and compile system.
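As an illustration of the "Generate KML" choice, here is a minimal sketch of a stateless function that turns a list of points into a KML document. It is written in Python for brevity rather than the project's Java, and the function name, station name, and coordinates are hypothetical and not taken from the QuakeSim code.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def placemarks_to_kml(points):
    """Build a KML document string from (name, lon, lat) tuples."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lon, lat in points:
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude[,altitude].
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

# Hypothetical station name and location, for illustration only.
kml_text = placemarks_to_kml([("STATION_A", -117.25, 32.87)])
```

Because the function keeps no state between calls, the same input always yields the same document, which is what makes such a service easy to call from a portlet, a script, or a workflow engine.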
Some QuakeSim Applications and Their Data
• Disloc, Simplex
  - Fault models are used to calculate surface displacements (Disloc) using the Okada method.
  - Simplex is the inverse.
• GeoFEST (JPL/CalTech)
  - Finite element code for detailed modeling of fault stresses and seismic displacements; uses fault models as input.
  - Coupled to mesh generation tools.
• Regularized Deterministic Annealing Hidden Markov Model (RDAHMM) (JPL)
  - Time series analysis code; can be applied to GPS and seismic archives.
  - Identifies signal components (possibly associated with underlying physical causes) with no fixed parameters.
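To make "identifies signal components" concrete, the sketch below decodes the most likely state sequence of a time series with a plain Gaussian hidden Markov model (Viterbi algorithm). This is not RDAHMM: RDAHMM estimates the model from the data, whereas this sketch assumes the state means, noise level, and transition probability are already known. All values are made up for illustration.

```python
import math

def viterbi_gaussian(obs, means, sigma=1.0, stay=0.9):
    """Most likely state path for a Gaussian-emission HMM (Viterbi decode)."""
    n = len(means)  # requires at least two states
    switch = (1.0 - stay) / (n - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(n)]
                 for i in range(n)]

    def log_emit(x, k):
        # Gaussian log-density up to a constant shared by all states.
        return -0.5 * ((x - means[k]) / sigma) ** 2

    score = [math.log(1.0 / n) + log_emit(obs[0], k) for k in range(n)]
    back = []
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][j] + log_emit(x, j))
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Synthetic series with an offset after the third sample (illustrative data).
series = [0.1, -0.2, 0.0, 5.1, 4.9, 5.2]
states = viterbi_gaussian(series, means=[0.0, 5.0])
```

The change in the decoded state between the third and fourth samples is the kind of event a state-change detector would flag for follow-up.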
| QuakeSim, Version 1 | Reason to Revise | QuakeSim, Version 2 |
|---|---|---|
| Application Web Service for wrapping a.out executables. Execution management service built with Apache Ant. | Services too coupled to portal; no simple WSDL programming interface; could not be used in workflow engines; not self-contained. | Give each code a proper service interface. Retain Apache Ant core but extend. Keep WSDL message structure simple (Strings, ints, doubles, URLs), wrapped as Java Beans. |
| File Management Service | Unnecessary, too coupled to Apache Axis 1.0. | HTTP GET, URLs |
| Context Management Service manages persistent portal sessions using recursive XML structure. | Too slow (file system); didn't scale; XML databases didn't mature; Object-Relational Mappings (ORM) not efficient. | Using db4o; all services communicate with easily XML-serializable JavaBeans. |
| OGC-compatible map and data services | Too complicated; ORM is a big overhead. | Google Maps, KML generating services |
| Serial job submission | NSF TeraGrid and Open Science Grid run full-time production Grids for HPC. | Condor-G/Birdbath-based job management extensions to GeoFEST service. |
Daily RDAHMM Updates
TeraGrid Supercomputing Resources (GPIR)
Queue Prediction Service (QBETS)
Forecasts the time you will wait in the queue on various TeraGrid supercomputers. Inherited from the OGCE project.
GeoFEST Finite Element Modeling portlet and plotting tools
Disloc output converted to KML and plotted.
Web 2.0 for Science Gateways
| Enterprise Approach | Web 2.0 Approach |
|---|---|
| JSR 168 Portlets | Gadgets, Widgets |
| Server-side integration and processing | AJAX, client-side integration and processing, JavaScript |
| SOAP | RSS, Atom, JSON |
| WSDL | REST (GET, PUT, DELETE, POST) |
| Portlet Containers | OpenSocial Containers (Orkut, LinkedIn, Shindig); Facebook; StartPages |
| User Centric Gateways | Social Networking Portals |
| Workflow managers (Taverna, Kepler, etc.) | Mash-ups |
| Grid computing: Globus, Condor, etc. | Cloud computing: Amazon WS Suite, Xen Virtualization |
More Information
• Email: [email protected]
• QuakeSim Web Site: www.quakesim.org
• Portal URL: http://gf7.ucs.indiana.edu:8080/gridsphere
• Portal SourceForge Page: https://sourceforge.net/projects/crisisgrid
• Code SVN: http://crisisgrid.svn.sourceforge.net/viewvc/crisisgrid/
Acknowledgments
• QuakeSim work is funded by NASA AIST (A. Donnellan, PI) and ACCESS (Y. Bock, PI) programs.
• Indiana University developers: Galip Aydin, Xiaoming Gao, Zhigang Qi
• Robert Granat (JPL), Jay Parker (JPL), Maggi Glasscoe (JPL), John Rundle (UC-Davis), Harout Nazerian (JPL), Rami Al-Ghanmi (USC), Dennis McLeod (USC), Paul Jamason (Scripps), Ruey-Juin Chang (Scripps), Gerry Simila (CSUN)
Grid Job Submission
• Globus provides a universal queuing system interface.
  - PBS, LoadLeveler, Sun Grid Engine, LSF
• We chose Condor-G as our job management software for submitting jobs to HPC queuing systems.
  - University of Wisconsin
  - Works with Globus, Matlab DCE, Unicore, etc.
• We co-locate Condor-G with our GeoFEST Web Service.
  - Communication is through Birdbath, Condor's Web Service interface.
  - So the GeoFEST service API is more or less the same, just now Grid enabled.
  - We also plan to release a general version of this service.
• Condor command line and Birdbath have different names for job description parameters.
  - Big Easter Egg hunt to find this, but now we know.
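For readers unfamiliar with Condor-G, the sketch below renders a command-line style submit description that routes a job through a Globus GT2 gatekeeper to a PBS queue. The host name, executable, and file names are hypothetical, and this covers only the command-line parameter names. The Birdbath names, which the slide notes are different, are not shown.

```python
def condor_g_submit(executable, arguments, gatekeeper, jobmanager="jobmanager-pbs"):
    """Render a Condor-G submit description for a Globus GT2 gatekeeper."""
    lines = [
        "universe = grid",
        f"grid_resource = gt2 {gatekeeper}/{jobmanager}",
        f"executable = {executable}",
        f"arguments = {arguments}",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"

# Hypothetical executable, input file, and gatekeeper host.
submit_text = condor_g_submit("geofest", "input.dat", "hpc.example.org")
```

The resulting text would be saved to a file and passed to `condor_submit`. Swapping `jobmanager-pbs` for another job manager targets a different queuing system without changing the rest of the description.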
Portlet Summary
| Portlet | Description |
|---|---|
| RDAHMM | Set up and run RDAHMM, query Scripps GRWS GPS Service, maintain persistent user sessions. |
| ST_Filter | Similar to RDAHMM portlet; ST_Filter has much more input. |
| Station Monitor | Shows GPS stations on a Google Map, displays last 10 minutes of data. |
| Real Time RDAHMM | Displays RDAHMM results of last 10 minutes of GPS data in a Google Map. |
| Daily RDAHMM | Calculates, updates RDAHMM event classifications with daily updated GPS data from SOPAC's GRWS service (14 day delay, but uses all the data). |
| GeoFEST | Create input geometries, generate FE meshes, run parallel FEM solvers. |
| Disloc, Simplex | Calculate surface displacements from fault models. |
Security Concerns
They’ll see the Big Board!
QuakeSimDistributed Environment for Modeling Observations
Managing Real Time GPS Data
Slides from Galip Aydin
California Real Time Network
Continuous GPS stations (CGPS) are depicted as triangles, while the real-time stations are represented as circles. The image is from SOPAC GPS Explorer at http://sopac.ucsd.edu/projects/realtime
Message Format
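Network Data Rates

| Network | Time | RYO | ASCII | GML |
|---|---|---|---|---|
| CRTN GPS Site Positions (9 stations) | 1 second | 1.5KB | 4.03KB | 48.7KB |
| CRTN GPS Site Positions (9 stations) | 1 hour | 5.31MB | 14.18MB | 171.31MB |
| CRTN GPS Site Positions (9 stations) | 1 day | 127.44MB | 340.38MB | 4.01GB |
| CRTN GPS Site Positions (9 stations) | 1 month | 3.8GB | 9.97GB | 123.3GB |
| CRTN GPS Site Positions (9 stations) | 1 year | 45.8GB | 119.67GB | 1.41TB |
| Entire SCIGN Network (250 stations) | 1 year | 1.23TB | 16.18TB | 160TB |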
How does one manage all the data generated by the
85 stations? How can you get just the data you want?
Note this is fundamentally different from traditional
request/response style Web Services.
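The growth in the data-rate table is simple scaling of a constant per-second rate. A minimal sketch of that arithmetic follows, assuming binary (1024-based) units. The results differ slightly from the table (for example 5.27MB rather than 5.31MB per hour for RYO), presumably because the table's per-second entries are rounded.

```python
def scale_rate(kb_per_second, seconds):
    """Return total volume in KB for a constant rate over a duration."""
    return kb_per_second * seconds

def human(kb):
    """Format a KB count using binary (1024-based) units."""
    units = ["KB", "MB", "GB", "TB"]
    value = float(kb)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f}{unit}"
        value /= 1024

HOUR, DAY = 3600, 86400
# RYO rate for the 9-station CRTN network, as listed in the table.
ryo_hour = human(scale_rate(1.5, HOUR))
ryo_day = human(scale_rate(1.5, DAY))
```

The same two functions applied to the ASCII and GML per-second rates show why the choice of message format matters once streams run continuously.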
Processing Real-Time GPS Streams
[Figure: real-time GPS stream processing path. Labels: Scripps RTD Server; Raw Data; RYO Ports 7010, 7011, 7012; ryo2nb; NB Server; GPS Networks; filters ryo2ascii, ascii2pos, ascii2gml, Single Station Filter, Displacement Filter, Station Health Filter, RDAHMM Filter; topics /SOPAC/GPS/CRTN01/RYO, /SOPAC/GPS/CRTN01/ASCII, /SOPAC/GPS/CRTN01/POS, /SOPAC/GPS/CRTN01/DSME.]
A Complete Sensor Message Processing Path, including a data analysis application.
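The processing path can be read as a chain of filters, each subscribing to one topic and republishing on the next. The sketch below shows that pattern with a tiny in-memory broker. It is a stand-in for illustration, not the NaradaBrokering API, and the two conversion functions are hypothetical placeholders (real RYO messages are binary and need a proper decoder).

```python
from collections import defaultdict

class Broker:
    """Tiny in-memory topic broker (a stand-in, not the NaradaBrokering API)."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in list(self.subscribers[topic]):
            callback(message)

def add_filter(broker, in_topic, out_topic, transform):
    """Subscribe to in_topic, transform each message, republish on out_topic."""
    broker.subscribe(in_topic, lambda msg: broker.publish(out_topic, transform(msg)))

BASE = "/SOPAC/GPS/CRTN01"
broker = Broker()
# Hypothetical conversions standing in for ryo2ascii and ascii2pos.
add_filter(broker, f"{BASE}/RYO", f"{BASE}/ASCII", lambda raw: raw.decode("ascii"))
add_filter(broker, f"{BASE}/ASCII", f"{BASE}/POS",
           lambda text: tuple(float(v) for v in text.split()))

received = []
broker.subscribe(f"{BASE}/POS", received.append)
broker.publish(f"{BASE}/RYO", b"1.0 2.0 3.0")
```

A consumer that only wants positions subscribes to the `/POS` topic and never sees the raw stream, which is how the topic hierarchy lets each application get just the data it wants.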
Application Integration with Real-Time Filters
• RDAHMM Filter records real-time positions for 10 minutes and invokes the RDAHMM application, which determines state changes in the XYZ signal.
  - Graph Plotter Application creates a visual representation of the RDAHMM output.
• Station Monitor Filter records real-time positions for 10 minutes and calculates position changes.
  - Graph Plotter Application creates a visual representation of the positions.
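A filter that "records real-time positions for 10 minutes" is essentially a sliding time window. The sketch below keeps the last 600 seconds of samples and reports the net position change across the window. The class name and the definition of "position change" (newest minus oldest) are assumptions for illustration, not the project's implementation.

```python
from collections import deque

class StationMonitorFilter:
    """Keep the last `window_s` seconds of positions and report the net change."""
    def __init__(self, window_s=600):
        self.window_s = window_s
        self.samples = deque()  # (timestamp_s, (x, y, z))

    def add(self, timestamp_s, xyz):
        self.samples.append((timestamp_s, xyz))
        # Drop samples older than the window relative to the newest one.
        while timestamp_s - self.samples[0][0] > self.window_s:
            self.samples.popleft()

    def position_change(self):
        """Difference between newest and oldest position in the window."""
        first, last = self.samples[0][1], self.samples[-1][1]
        return tuple(b - a for a, b in zip(first, last))

monitor = StationMonitorFilter(window_s=600)
# Synthetic samples once a minute with a steady drift in x (illustrative data).
for t in range(0, 1200, 60):
    monitor.add(t, (t * 0.001, 0.0, 0.0))
change = monitor.position_change()
```

The buffered samples could equally be handed to an analysis code at the end of each window, which matches the pattern described for the RDAHMM filter.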
2 – Multiple Publishers Test
[Figure: test setup. Labels: RYO Publisher 1, RYO Publisher 2, ..., RYO Publisher n; NB Server; Topic 1A, Topic 1B, Topic 2, ..., Topic n; RYO To ASCII Converter; Simple Filter.]
[Plot: Multiple Publishers Test. Series: Transfer Time, Standard Deviation. Vertical axis: Time (ms), 0 to 6. Horizontal axis: Time Of The Day, 0:00 to 22:30.]
• We add more GPS networks by running more publishers.
• The results show that 1000 publishers can be supported with no performance loss. This is an operating system
4 – Multiple Brokers Test
[Figure: test setup. Labels: RYO Publisher; RYO To ASCII Converter; Topic 1A; Topic 1B; NB Server 1; NB Server 2; Simple Filters 1, 2, ..., 750; Simple Filters 751, 752, ..., 1500.]
• NaradaBrokering allows creation of broker networks.
• We create a two-broker network.
• Messages published to the first broker can be received from the second broker.
• We take timings on each broker.
• We connect 750 clients to each broker and run for 24 hours. We chose 750 clients to stay well below the saturation limit.
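The timings reported for these tests are transfer times and their standard deviation. A minimal sketch of that summary follows, assuming each message carries a send timestamp and the receiver records an arrival timestamp on a comparable clock. The timestamps below are made up, not measurements from the tests.

```python
import statistics

def transfer_stats(send_ms, receive_ms):
    """Mean and sample standard deviation of per-message transfer times (ms)."""
    times = [r - s for s, r in zip(send_ms, receive_ms)]
    return statistics.mean(times), statistics.stdev(times)

# Illustrative timestamps in milliseconds.
mean_ms, std_ms = transfer_stats([0, 1000, 2000, 3000], [4, 1006, 2004, 3006])
```

Computing the same pair of numbers separately for clients on each broker gives a direct comparison of the first and second hop.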