Cyberinfrastructure to integrate simulation, data and sensors for collaborative eScience in CReSIS
CERSER and CReSIS http://nia.ecsu.edu/
Elizabeth City State University
October 19, 2006
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
[email protected]
http://www.infomall.org
1
Abstract
Cyberinfrastructure supports eScience, or collaborative science with distributed scientists, computers, data repositories and sensors.
We describe the emerging Grid software for eScience and the underlying Cyberinfrastructure such as the TeraGrid.
We give one example in detail: iSERVO, the International Solid Earth Research Virtual Organization supporting earthquake science.
This illustrates Computing Grids, Geographical Information System Grids and Sensor Grids.
We suggest implications for CReSIS, the Center for Remote Sensing of Ice Sheets.
2
Why Cyberinfrastructure Is Useful
Supports distributed science: data, people, computers.
Exploits Internet technology (Web 2.0), adding management, security, supercomputers etc.
It has two aspects: parallel, with low latency (microseconds) between nodes, and distributed, with higher latency (milliseconds) between nodes.
The parallel aspect is needed to get high performance on individual 3D simulations, data analysis etc.; one must decompose the problem.
The distributed aspect integrates already distinct components.
Cyberinfrastructure is in general a distributed collection of parallel systems.
Grids are made of services that are "just" programs or data sources packaged for distributed access.
3
e-moreorlessanything and the Grid
'e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it' – John Taylor, originator of the term, Director General of Research Councils UK, Office of Science and Technology.
e-Science is about developing tools and technologies that allow scientists to do 'faster, better or different' research.
Similarly, e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
• The growing use of outsourcing is one example
The Grid provides the information technology e-infrastructure for e-moreorlessanything.
A deluge of data of unprecedented and inevitable size must be managed and understood.
People, computers, data and instruments must be linked.
On-demand assignment of experts, computers, networks and storage resources must be supported.
4
TeraGrid: Integrating NSF Cyberinfrastructure
[Map: TeraGrid sites – Buffalo, Wisc, UC/ANL, Utah, Cornell, Iowa, PU, NCAR, IU, NCSA, Caltech, PSC, ORNL, USC-ISI, UNC-RENCI, SDSC, TACC]
TeraGrid is a facility that integrates computational, information, and analysis resources at the
San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of
Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications,
Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh
Supercomputing Center, and the National Center for Atmospheric Research.
Today 100 teraflops; tomorrow a petaflop; Indiana has 20 teraflops today.
Virtual Observatory Astronomy Grid
Integrate Experiments
[Figure: sky images across wavebands – Radio, Far-Infrared, Visible, Dust Map, Visible + X-ray – and a Galaxy Density Map]
6
Grid Capabilities for Science
Open technologies for any large-scale distributed system, adopted by industry, many sciences and many countries (including UK, EU, USA, Asia)
• Security, reliability, management and state standards
Service and messaging specifications
User interfaces via portals and portlets, virtualizing to desktops, email, PDAs etc.
• ~20 TeraGrid Science Gateways (their name for portals)
• OGCE Portal technology effort led by Indiana
Uniform approach to accessing distributed (super)computers, supporting single (large) jobs and spawning many related jobs
Data and metadata architecture supporting real-time data and archives as well as federation
• Links to the Semantic Web and annotation
Grid (Web service) workflow with standards and several successful instantiations (such as Taverna and MyLead)
Many Earth science grids including ESG (DoE), GEON, LEAD, SCEC, SERVO; LTER and NEON for the environment
• http://www.nsf.gov/od/oci/ci-v7.pdf
7
APEC Cooperation for Earthquake Simulation
ACES is a seven-year-long collaboration among scientists interested in earthquake and tsunami prediction
• iSERVO is infrastructure to support the work of ACES
• SERVOGrid is the (completed) US Grid that is a prototype of iSERVO
• http://www.quakes.uq.edu.au/ACES/
Chartered under APEC, the Asia Pacific Economic Cooperation of 21 economies
8
Grid of Grids: Research Grid and Education Grid
[Diagram: SERVOGrid as a Grid of Grids. A Research Grid combines a Sensor Grid (sensors, streaming data, field trip data), a Database Grid (repositories, federated databases), a Compute Grid (research simulations), GIS and Discovery Grid services, and data filter services. Customization services carry results from Research to Education, where an Education Grid provides analysis and visualization through a portal backed by a computer farm.]
9
SERVOGrid and Cyberinfrastructure
Grids are the technology, based on Web services, that implements Cyberinfrastructure, i.e. supports eScience or science as a team sport
• Internet-scale managed services that link computers, data repositories, sensors, instruments and people
There is a portal and services in SERVOGrid for
• Applications such as GeoFEST, RDAHMM, Pattern Informatics, Virtual California (VC), Simplex, mesh generating programs ...
• Job management and monitoring web services for running the above codes
• File management web services for moving files between various machines
• Geographical Information System services
• QuakeTables earthquake-specific database
• Sensors as well as databases
• Context (dynamic metadata) and UDDI (long-term metadata) services
• Services supporting streaming real-time data
10
[Figure: solid Earth and ice observations – site-specific irregular scalar measurements (ice sheets: Greenland; volcanoes: Long Valley, CA; earthquakes: Northridge and Hector Mine, CA, with 1 km topography and stress change maps) and constellations for plate boundary-scale vector measurements (PBO)]
11
Some Grid Concepts I
Services are "just" (distributed) programs sending and receiving messages with well-defined syntax
Interfaces (input-output) must be open; innards can be open source (allowing you to modify) or proprietary
• Services can be written in any language – Fortran, shell scripts, C, C#, C++, Java, Python, Perl – your choice!!
• Web Services are supported by all vendors (IBM, Microsoft ...)
Service overhead will be just a few milliseconds (more now), which is less than a typical network transit time
• Any program that is distributed can be a Web service
• Any program taking execution time ≥ 20 ms can be an efficient Web service
12
Web services
[Diagram: a Web Service – humans, devices, databases, programs and other computational resources interact with service logic (BPEL, Java, .NET) through a message-processing layer]
Web Services build loosely coupled, distributed applications (wrapping existing codes and databases) based on SOA (service-oriented architecture) principles
Web Services interact by exchanging messages in SOAP format
The contracts for the message exchanges that implement those interactions are described via WSDL interfaces
Example SOAP message:
<env:Envelope>
  <env:Header>
    ...
  </env:Header>
  <env:Body>
    ...
  </env:Body>
</env:Envelope>
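The same envelope can also be built programmatically. Below is a minimal sketch using Python's standard ElementTree; the "echo" payload is purely illustrative, and the SOAP 1.2 envelope namespace is shown only to make the structure concrete.

# Sketch: building a SOAP-style envelope with Python's ElementTree.
# The <echo> body payload is illustrative; real services carry
# application-specific payloads defined by their WSDL interfaces.
import xml.etree.ElementTree as ET

ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
ET.register_namespace("env", ENV)

envelope = ET.Element(f"{{{ENV}}}Envelope")
ET.SubElement(envelope, f"{{{ENV}}}Header")
body = ET.SubElement(envelope, f"{{{ENV}}}Body")
ET.SubElement(body, "echo").text = "hello"

print(ET.tostring(envelope, encoding="unicode"))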
13
A typical Web Service
In principle, services can be written in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI messages, CGI Web invocations, or even totally compiled away (inlining)
The simplest implementations involve XML messages (SOAP) and programs written in net-friendly languages like Java and Python
[Diagram: a typical e-commerce Web Service – a Portal and a Security service call Payment (credit card), Catalog, Warehouse and Shipping-control Web Services through WSDL interfaces]
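As a concrete illustration of exposing one function as a service, here is a minimal sketch. SERVOGrid's actual services use SOAP/WSDL (Apache Axis); this stand-in uses Python's built-in XML-RPC, a simpler XML-over-HTTP protocol, and the catalog service and its data are hypothetical.

# Minimal sketch of a service exposing one function over XML messages.
# Real SERVO services use SOAP/WSDL (e.g. Apache Axis); this uses
# Python's built-in XML-RPC purely to illustrate the request/response
# exchange. Names and data are made up.
from xmlrpc.server import SimpleXMLRPCServer

CATALOG = {"widget": 9.95, "gadget": 24.50}  # toy data standing in for a real catalog

def lookup_price(item: str) -> float:
    """Return the price of a catalog item, or raise a fault if unknown."""
    if item not in CATALOG:
        raise ValueError(f"unknown item: {item}")
    return CATALOG[item]

server = SimpleXMLRPCServer(("localhost", 8080), allow_none=True)
server.register_function(lookup_price, "lookup_price")
print("Catalog service listening on http://localhost:8080 ...")
server.serve_forever()

A client then calls the service with one line, for example ServerProxy("http://localhost:8080").lookup_price("widget") using Python's xmlrpc.client; the toolkit handles serializing the call into an XML message and parsing the XML response.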
14
Some Grid Concepts II
Systems are built from contributions from many different groups – you do not need one "vendor" for all components, as Web services allow interoperability between components
• One reason DoD likes Grids (called Net-Centric computing)
Grids are distributed in services and data, allowing anybody to store their data and to produce "their" view
• Some think that the University Library of the future will curate/store the data of their faculty
"2-level programming model": classic programming of the services themselves, with services composed using workflow consistent with industry standards (BPEL); a minimal composition sketch follows this slide
Grid of Grids (System of Systems): realistically, Grid-like systems will be built using multiple technologies and "standards" – integrate separate Grids for sensors, GIS, visualization, computing etc. with an OGSA (Open Grid Services Architecture from OGF) system Grid (security, registry) into a single Grid
Existing codes UNCHANGED; wrap them as services with metadata
15
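As a minimal illustration of the two-level model above (and not of BPEL itself), the sketch below composes two hypothetical services in Python: the output of a mesh-generation service is fed to a simulation service. The endpoints and method names are invented for the example.

# Sketch of the "2-level programming model": the services are ordinary
# programs; the workflow layer only wires their inputs and outputs.
# Real SERVOGrid workflows use engines such as HPSearch or BPEL; the two
# endpoints and their method names below are hypothetical.
from xmlrpc.client import ServerProxy

mesh_service = ServerProxy("http://example.org:8081")  # hypothetical mesh generator service
sim_service = ServerProxy("http://example.org:8082")   # hypothetical simulation service

def run_workflow(fault_model_url: str) -> str:
    # Step 1: generate a mesh from the fault model.
    mesh_url = mesh_service.generate_mesh(fault_model_url)
    # Step 2: feed the mesh into the (unchanged, wrapped) simulation code.
    return sim_service.run_simulation(mesh_url)

if __name__ == "__main__":
    print(run_workflow("http://example.org/data/fault_model.xml"))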
TeraGrid User Portal
16
LEAD Gateway Portal
NSF Large ITR and TeraGrid Gateway
- Adaptive response to mesoscale weather events
- Supports data exploration and Grid workflow
Grid Workflow Data Assimilation in Earth Science
Grid services, triggered by abnormal events and controlled by workflow, process real-time data from radar and high-resolution simulations for tornado forecasts
Uses a portlet-based user portal to access and control services and workflow
18
SERVOGrid has a portal
The Portal is built from portlets, which provide user interface fragments for each service that are composed into the full interface; it uses OGCE technology, as does the planetary science VLAB portal with the University of Minnesota
19
GIS and Sensor Grids
OGC has defined a suite of data structures and services to support Geographical Information Systems and sensors
GML (Geography Markup Language) defines the specification of geo-referenced data
SensorML and O&M (Observations and Measurements) define metadata and data structures for sensors
Services like the Web Map Service, Web Feature Service and Sensor Collection Service define service interfaces for accessing GIS and sensor information
Grid workflow links services that are designed to support streaming input and output messages
We built Grid (Web) service implementations of these specifications for NASA's SERVOGrid
Use Google Maps as a front end to WMS and WFS
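As an example of the service interfaces just listed, the sketch below builds an OGC WMS GetMap request with the Python standard library. The endpoint URL and layer name are hypothetical; the query parameters are the standard WMS 1.1.1 GetMap parameters such a service accepts.

# Sketch: fetching a map image from a Web Map Service with a standard
# WMS 1.1.1 GetMap request. Endpoint and layer name are hypothetical.
from urllib.parse import urlencode
from urllib.request import urlretrieve

WMS_ENDPOINT = "http://example.org/servo/wms"  # hypothetical WMS endpoint

params = {
    "SERVICE": "WMS",
    "VERSION": "1.1.1",
    "REQUEST": "GetMap",
    "LAYERS": "California_Faults",   # hypothetical layer name
    "STYLES": "",
    "SRS": "EPSG:4326",              # longitude/latitude coordinates
    "BBOX": "-125,32,-114,42",       # minx,miny,maxx,maxy
    "WIDTH": "800",
    "HEIGHT": "600",
    "FORMAT": "image/png",
}

urlretrieve(WMS_ENDPOINT + "?" + urlencode(params), "map.png")
print("Saved map.png")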
20
Grid Workflow Datamining in Earth Science
Work with the Scripps Institute on NASA GPS data
Grid services controlled by workflow process real-time data from ~70 GPS sensors in Southern California
[Diagram: streaming GPS data passes through streaming-data support, transformations and data checking into Hidden Markov Model datamining (JPL) for earthquake signals, with display via GIS]
21
Earth/Atmosphere Grids built as Grids of (library) Grids
[Diagram: application Grids – Earthquake SERVOGrid (earthquake data, filters and simulation services), a Tornado Grid and an Ice Sheet PolarGrid (ice sheet sensors, SAR, filters, EM, glacier simulations) – are assembled from library Grids (Collaboration, Sensor, GIS, Visualization, Compute) and core Grid services (registry, portals, data access/storage, metadata, security, notification, workflow, messaging) over the physical network]
22
CReSIS PolarGrid
Important CReSIS-specific Cyberinfrastructure components include
• Managed data from sensors and satellites
• Data analysis such as SAR processing, possibly with parallel algorithms
• Electromagnetic simulations (currently commercial codes) to design instrument antennas
• 3D simulations of ice sheets (glaciers) with non-uniform meshes
• GIS (Geographical Information Systems)
Also needed are capabilities present in many Grids
• Portal, i.e. Science Gateway
• Submitting multiple sequential or parallel jobs
23
What should we do?
Identify existing programs that should be wrapped as Grid services
• One can do this even for commercial codes, as one keeps the existing code (Fortran, C++) unchanged and constructs a "metadata" wrapper defining where the program and its data are located and how to invoke it (a descriptor sketch follows this list)
Identify where parallel versions are needed and whether help is needed in creating them
• Parallel codes can be Grid services
• Electromagnetic codes are commercial – in principle parallel
• Ice sheet models can be parallelized for high-resolution simulations
Scope out the system: computational needs (identify the value of TeraGrid), data storage needs, network requirements
Examine the data model and produce a data Grid architecture
• Use databases? Distributed? Metadata? Files? What are the key performance issues?
Examine integration of GIS with Grid services
Design and implement a Science Gateway
Are there important visualization requirements outside GIS?
Are there key issues from security?
Bring up core services such as registries
Need infrastructure to run services (Linux PCs)
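To make the metadata wrapper idea from the first bullet concrete, here is a small sketch, assuming a hypothetical descriptor for a commercial electromagnetic code: the descriptor records where the program and its data live and how to invoke it, and the wrapper launches the unchanged executable.

# Sketch of a "metadata wrapper" around an unchanged legacy or commercial
# code: the descriptor records where the program and its data are located
# and how to invoke it. All names, paths and arguments are hypothetical.
import subprocess

DESCRIPTOR = {
    "name": "em_antenna_sim",
    "executable": "/opt/emsolver/bin/emsolver",   # existing code, left unchanged
    "work_dir": "/scratch/polargrid/job001",
    "input_files": ["antenna_design.in"],
    "output_files": ["radiation_pattern.out"],
    "arguments": ["-deck", "antenna_design.in"],
}

def invoke(descriptor: dict) -> int:
    """Run the wrapped application exactly as its descriptor specifies."""
    cmd = [descriptor["executable"]] + descriptor["arguments"]
    completed = subprocess.run(cmd, cwd=descriptor["work_dir"])
    return completed.returncode

if __name__ == "__main__":
    status = invoke(DESCRIPTOR)
    print(f"{DESCRIPTOR['name']} finished with exit code {status}")

A job-management service would expose invoke() (or a queue of such invocations) through its web service interface, so the descriptor, not the code itself, is what the Grid sees.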
24
Benefits of CReSIS PolarGrid
Shared resources support collaboration among CReSIS scientists
Integration of polar-related data with appropriate compute resources, enabling research on specific topics and studies across topics
A Polar Science Gateway accessing common services (programs) and data, and their integration as workflow
Access to TeraGrid with the same interface for large-scale simulations
Can share common capabilities (SAR analysis, GIS) with related Grids such as SERVOGrid, GEON, LEAD etc.
Modular Grid services allow new capabilities to be exchanged while preserving the rest of the system
• e.g. change the EM simulation service
Management of dynamic heterogeneous data
25
SERVO/QuakeSim Services Eye Chart
Job Management: SERVO wraps Apache Ant as a web service and uses it to launch jobs. For a particular application, we design a build.xml template. The interface is simply a string array of build properties called for by the template. We have also built a simple generic "template engine" version of this (a sketch of the idea follows this table).
Specific applications (Virtual California, GeoFEST, Park, RDAHMM ..): These can all be launched by a single Job Management service or by custom instances of it with metadata preset for a particular application.
Context Data Service: We store information gathered from users' interactions with the portal interface in a generic, recursively defined XML data structure. Typically we store input parameters and choices made by the user so that we can recover and reload them later. We also use this for monitoring remote workflows. We have devoted considerable effort to developing WS-Context to support the generalization of this initially simple service.
Application and Host Metadata Service: We have an Application Descriptor and a Host Descriptor service based on XML schema descriptors. Portlet interfaces allow code administrators to make applications available through the browser.
File Services: We built a file web service that can do uploads, downloads, and crossloads between different services. This supports specific operations such as file browsing, creation, deletion and copying.
Portal: We use an OGCE-based portal built on the portlet architecture.
Authentication and Authorization: These use capabilities built into the portal. Note that simulations are typically performed on machines where the user has accounts, while data services are shared for read access.
Information Service: We have built data model extensions to UDDI to support XPath queries over Geographical Information System capability.xml files. This is designed to replace the OGC (Open Geospatial Consortium) Web Registry Service.
Web Map Service: We built a Web Service version of this Open Geospatial Consortium specification. The WMS constructs images out of abstract feature descriptions.
Web Feature Service: We built a Web Service version of this OGC standard and have extended it to support data streaming for increased performance.
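The "template engine" mentioned in the Job Management row might look like the following sketch: a build template with placeholders is filled from a dictionary of properties supplied through the service interface. The template text and property names here are illustrative, not the actual SERVO template.

# Sketch of the generic "template engine" idea: fill an Ant-style build
# template from a dictionary of properties passed to the job service.
# The template text, property names and paths are illustrative only.
from string import Template

BUILD_TEMPLATE = Template("""
<project name="run_app" default="run">
  <target name="run">
    <exec executable="$executable" dir="$work_dir">
      <arg value="$input_file"/>
    </exec>
  </target>
</project>
""")

properties = {
    "executable": "/usr/local/bin/geofest",  # hypothetical install path
    "work_dir": "/tmp/job42",
    "input_file": "fault_model.inp",
}

with open("build.xml", "w") as f:
    f.write(BUILD_TEMPLATE.substitute(properties))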
Service Eye Chart Continued
Workflow/Monitoring/Management Services: The HPSearch project uses HPSearch Web Services to execute JavaScript workflow descriptions. It has more recently been revised to support WS-Management and to support both workflow (where there are many alternatives) and system management (where there is less work). Management functions include the life cycle of services and QoS for inter-service links.
Sensor Grid Services: We are developing infrastructure to support streaming GPS signals and their successive filtering into different formats. This is built over NaradaBrokering (see the Messaging Service). It does not use Web Services as such at present, but the filters can be controlled by HPSearch services.
Messaging Service: This is used to stream data in workflows fed by real-time sources. It is based on NaradaBrokering, which can also be used in cases involving only archival data.
Notification Service: This supplies alerts to users when filters (data mining) detect features of interest.
QuakeTables Database Services: The USC QuakeTables fault database project includes a web service that allows you to search for earthquake faults.
Scientific Plotting Services: We are developing Dislin-based scientific plotting services as a variation of our Web Map Service: for a given input service, we can generate a raster image (such as a contour plot) which can be integrated with other scientific and GIS map images.
Data Tables Web Service: We are developing a Web Service based on the National Virtual Observatory's VOTables XML format for tabular data. We see this as a useful general format for ASCII data produced by various application codes in SERVO and other projects.
Key interfaces/standards/software used: GML, WFS, WMS; WSDL; XML Schema with the XPP pull parser; SOAP with Axis 1.x; UDDI and WS-Context; JSR-168, JDBC, servlets; WS-Management; VOTables (in research).
Key interfaces/standards/software NOT used (often just for historical reasons, as the project predated the standard): WS-Security, JSDL, WSRF, BPEL, OGSA-DAI.
Key GIS and Related Services
HPSearch: Support for streaming data between services; supports scriptable workflows, so not limited to DAGs; an implementation of WS-Distributed Management.
WS-Context: Contexts can hold arbitrary content (XML, URIs, name-value pairs); they can support distributed session state as well as persistent data; we are currently researching scalability.
Web Feature Service: Supports both streaming and non-streaming returns of query results.
Web Map Service: Supports integration of local and remote map services; treats Google Maps as an OGC-compliant map server.
Sensor Grid: A publish/subscribe system allows data streams to be reorganized using topics.
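A toy, in-process illustration of that publish/subscribe reorganization is sketched below; the real Sensor Grid uses NaradaBrokering as the broker, and the topic names and GPS record here are invented.

# Toy publish/subscribe sketch: sensor streams are republished onto topics
# so consumers can pick the grouping they need. The real Sensor Grid uses
# NaradaBrokering; topic names and the GPS record are hypothetical.
from collections import defaultdict
from typing import Callable, Dict, List

class Broker:
    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self.subscribers[topic]:
            handler(message)

broker = Broker()

# A filter interested in one GPS network subscribes to that network's topic.
broker.subscribe("gps/southern_california/raw",
                 lambda msg: print("SoCal sample:", msg))

# A sensor (or an upstream filter) publishes each reading onto a topic that
# encodes network and processing level, reorganizing the raw streams.
broker.publish("gps/southern_california/raw",
               {"station": "ABCD", "east_mm": 1.2, "north_mm": -0.4})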
28