Building Science Gateways Marlon Pierce Community Grids Laboratory Indiana University Tutorial Overview Type Title Presenter Talk Gateways overview Marlon Talks OGCE overview Marlon Talk TeraGrid: Resources Overview Simms Break Demo LEAD Portal and workflows Suresh Demo GridChem Workflow Suresh Demo OGCE and TGUP Portals Marlon Lunch.


Building Science Gateways
Marlon Pierce
Community Grids Laboratory
Indiana University
Tutorial Overview
Type | Title | Presenter
Talk | Gateways overview | Marlon
Talk | OGCE overview | Marlon
Talk | TeraGrid: Resources Overview | Simms
Break
Demo | LEAD Portal and workflows | Suresh
Demo | GridChem Workflow | Suresh
Demo | OGCE and TGUP Portals | Marlon
Lunch
There’s More
Type | Title | Presenter
Hands On | OGCE, LEAD, and TGUP portals and workflows | Marlon, Suresh
Talk/HO | Building the OGCE Portal | Marlon
Talk/HO | Building gadgets with GTLAB | Marlon
Break (2:00-2:30)
Talk | Web 2.0 for Science Gateways (Optional) | Marlon
HO | Continue hands on work | Suresh, Marlon
Slides and Demo Site
• Tutorial slides are available from http://www.collab-ogce.org/ogce/index.php/Tutorials
• We run a permanent demo portal at
https://community.ucs.indiana.edu:8443/gridsphere/
– Also aliased as https://ogceportal.iu.teragrid.org:8443/gridsphere
• Portal accounts train01-train30 have been created for the
workshop. Password is the same as the account name.
– Also train31-train49 from TG08 workshop.
• We also have TeraGrid training accounts with names train01-train30 that can be used to retrieve TG proxy credentials.
– These should be active all week.
– You can also log into the TeraGrid User Portal with this account and the secret password.
Concept #1: Web Portal
• Web container that aggregates content from multiple sources into a single display.
– “Start Pages”
• Typically consume RSS/Atom news feeds.
• More powerful versions these days support Flickr, calendars, games, etc.
– Gadgets, widgets
• Examples: iGoogle, Netvibes, My Yahoo!
[Figure: example start page, with callouts labeled "Gadget" and "RSS Feeds"]
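To make the "aggregate content from multiple sources" idea concrete, here is a minimal sketch of a start page that merges RSS feeds into one display. The two feed documents, their titles, and the example.org links are invented sample data. A real portal would fetch feeds over HTTP and render HTML rather than plain text.

```python
import xml.etree.ElementTree as ET

# Two tiny RSS 2.0 documents standing in for remote feeds (invented sample data).
FEED_A = """<rss version="2.0"><channel><title>Gateway News</title>
<item><title>Portal updated</title><link>http://example.org/a1</link></item>
<item><title>Workshop announced</title><link>http://example.org/a2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Grid Status</title>
<item><title>Cluster maintenance</title><link>http://example.org/b1</link></item>
</channel></rss>"""


def parse_feed(xml_text):
    """Return (feed title, [(item title, link), ...]) for one RSS 2.0 document."""
    channel = ET.fromstring(xml_text).find("channel")
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return channel.findtext("title"), items


def render_start_page(feeds):
    """Aggregate several feeds into one plain-text 'start page'."""
    lines = []
    for xml_text in feeds:
        title, items = parse_feed(xml_text)
        lines.append(title)
        lines.extend(f"  - {headline} <{link}>" for headline, link in items)
    return "\n".join(lines)


page = render_start_page([FEED_A, FEED_B])
print(page)
```

Each feed becomes one block of the page, which is roughly the role a gadget plays in iGoogle or Netvibes.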
Concept #2: Grid Computing
• Grid computing software is designed to integrate large supercomputing facilities.
– TeraGrid, Open Science Grid, EGEE, etc.
– This is done via network services
– Software providers in the US include Globus and Condor
• Key Service Components (and example services)
– Authentication and authorization framework (MyProxy)
– Remote process access and control (GRAM, Condor)
– Remote file, I/O access (GridFTP, SRB, RFT)
• Additional Services
– Information services, replica management, database federation, storage management, schedulers, etc.
• Example Grid Software Stacks: CTSS and VDT
– For TeraGrid and Open Science Grid, respectively
• Being pushed by Cloud Computing (Amazon, Google, Microsoft, others)
Science Portals and Gateways
• Science Gateways adapt Web portal technology
to build user interfaces to the Grid.
• Science portals resemble standard portals, but
must also
– Support access to computing and storage resources.
– Allow users remote, direct access to these resources.
• You often want to run applications and access data that you
own directly.
– Provide access to science applications and data sets.
• And we must provide value added services as
well as user interfaces.
Example Science Gateways
• Many listed here:
– http://www.teragrid.org/programs/sci_gateways/
• Cover many different scientific fields:
– Atmospheric science, geophysics, computational
chemistry, bioinformatics, etc
• See also GCE08 workshop at SC08 and earlier
proceedings
– http://www.collab-ogce.org/gce08/index.php/Main_Page
– GCE05-07 also linked.
TeraGrid Science Gateways Program
Slides courtesy of Nancy Wilkins-Diehr
TeraGrid Area Director for Science Gateways
[email protected]
Today, there are approximately 29
gateways using the TeraGrid
Does a gateway have to use TeraGrid to be a gateway?
• No, but the TeraGrid does fund the development and support of these gateways
– Using high end resources is more work and is not recommended unless it serves a demonstrated need
• Gateways are an excellent way to extend the impact of high-end resources
• Are they all funded by TeraGrid?
– Can TeraGrid claim success for all gateways?
• No, we don’t make the gateways you use, we make the gateways you use better
– TeraGrid does fund a small number of developers to provide advanced support.
• More later.
Why are gateways worth the effort?
• Increasing range of expertise needed to tackle the most challenging scientific problems
– How many details do you want each individual scientist to need to know?
• PBS, RSL, Condor
• Coupling multi-scale codes
• Assembling data from multiple sources
• Collaboration frameworks

PBS batch script:

#! /bin/sh
#PBS -q dque
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:02:00
#PBS -o pbs.out
#PBS -e pbs.err
#PBS -V
cd /users/wilkinsn/tutorial/exercise_3
../bin/mcell nmj_recon.main.mdl

Condor-G submit file:

# Full path to executable
executable=/users/wilkinsn/tutorial/bin/mcell
# Working directory, where Condor-G will write
# its output and error files on the local machine.
initialdir=/users/wilkinsn/tutorial/exercise_3
# To set the working directory of the remote job, we
# specify it in this globus RSL, which will be appended
# to the RSL that Condor-G generates
globusrsl=(directory='/users/wilkinsn/tutorial/exercise_3')
# Arguments to pass to executable.
arguments=nmj_recon.main.mdl
# Condor-G can stage the executable
transfer_executable=false
# Specify the globus resource to execute the job
globusscheduler=tg-login1.sdsc.teragrid.org/jobmanager-pbs
# Condor has multiple universes, but Condor-G always uses globus
universe=globus
# Files to receive stdout and stderr.
output=condor.out
error=condor.err
# Specify the number of copies of the job to submit to the condor queue.
queue 1

Globus RSL:

+(
&(resourceManagerContact="tglogin1.sdsc.teragrid.org/jobmanager-pbs")
(executable="/users/birnbaum/tutorial/bin/mcell")
(arguments=nmj_recon.main.mdl)
(count=128)
(hostCount=10)
(maxtime=2)
(directory="/users/birnbaum/tutorial/exercise_3")
(stdout="/users/birnbaum/tutorial/exercise_3/globus.out")
(stderr="/users/birnbaum/tutorial/exercise_3/globus.err")
)
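One way a gateway can hide these details is to generate the batch script from a few values the scientist supplies in a form. The sketch below is an illustration only: the function name make_pbs_script and its parameters are hypothetical, and the defaults simply reuse the PBS directives and paths from this slide.

```python
def make_pbs_script(workdir, executable, arguments,
                    queue="dque", nodes=1, ppn=2, walltime="00:02:00"):
    """Build a PBS batch script from a handful of user-supplied values."""
    lines = [
        "#!/bin/sh",
        f"#PBS -q {queue}",                  # target queue
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # wall-clock limit
        "#PBS -o pbs.out",                   # standard output file
        "#PBS -e pbs.err",                   # standard error file
        "#PBS -V",                           # export the submitting environment
        f"cd {workdir}",
        f"{executable} {' '.join(arguments)}",
    ]
    return "\n".join(lines) + "\n"


script = make_pbs_script(
    workdir="/users/wilkinsn/tutorial/exercise_3",
    executable="../bin/mcell",
    arguments=["nmj_recon.main.mdl"],
)
print(script)
```

The scientist only chooses the input file and run size. The scheduler syntax stays inside the gateway, where it can be changed without retraining users.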
Not just ease of use
What can scientists do that they
couldn’t do previously?
• LEAD - access to radar data
• NVO – access to sky surveys
• OOI – access to sensor data
• PolarGrid – access to polar ice sheet data
• SIDGrid – analysis tools
• GridChem – developing multiscale coupling
• How would this have been done before gateways?
Gateways Greatly Expand Access
• Almost anyone can investigate scientific questions using high end resources
– Not just those in the research groups of those who request allocations
– Gateways allow anyone with a web browser to explore
• Opportunities can be uncovered via Google
– Nancy’s 11-year-old son discovered nanoHUB.org himself while his class was studying buckyballs
• Fosters new ideas, cross-disciplinary approaches
• Encourages students to experiment
• But used in production too
– Significant number of papers resulting from gateways including GridChem, nanoHUB
– Scientists can focus on challenging science problems rather than challenging infrastructure problems
TeraGrid Pathways Activities
• Program funding to involve MSI communities
• 2 Gateway components
– Adapt gateways for educational use by underrepresented communities
• GEON – SDSC, Navajo Tech
– Teach participants from underrepresented communities how to build gateways
• PolarGrid – IU, ECSU
Navajo Technical College and gateways
• Incorporating the use of gateways in their curricula
• GEON, GISolve areas of initial interest
PolarGrid
• Cyberinfrastructure Center for Polar Science (CICPS)
– Experts in polar science, remote sensing and cyberinfrastructure
– Indiana, ECSU, CReSIS
• Satellite observations show disintegration of ice shelves in West Antarctica and speed-up of several glaciers in southern Greenland
– Most existing ice sheet models, including those used by IPCC, cannot explain the rapid changes
http://www.polargrid.org/polargrid/images/4/42/C0050polargrid-big.m4v
Source: Geoffrey Fox
• Components of PolarGrid
– Expedition grid consisting of ruggedized laptops in a field grid linked
to a low power multi-core base camp cluster
– Prototype and two production expedition grids feed into a 17
Teraflops "lower 48" system at Indiana University and Elizabeth City
State (ECSU) split between research, education and training.
– Gives ECSU a top-ranked 5 Teraflop MSI high performance
computing system
• Access to expensive data
• High-end resources for analysis
• MSI student involvement
Source: Geoffrey Fox
Recent Gateways using TeraGrid Significantly
• SCEC
• SIDGrid
• CIG
SCEC using gateway to produce hazard map
• PSHA hazard map for California using newly released Earthquake Rupture Forecast (UCERF2.0) calculated using SCEC Science Gateway
• Warm colors indicate regions with a high probability of experiencing strong ground motion in the next 50 years.
• High resolution map, significant CPU use
Social Informatics Data Grid
• Heavy use of “multimodal” data.
– Subject might be viewing a video, while a researcher collects heart rate and eye movement data.
• Events must be synchronized for analysis, large datasets result
• Extensive analysis capabilities are not something that each researcher should have to create for themselves.
http://www.ci.uchicago.edu/research/files/sidgrid.mov
• Social scientists have traditionally worked in isolated labs without the
capability to share data or insights with others.
• SIDGrid enables a number of capabilities.
– Data that is expensive to collect can now be shared with others, increasing
the potential for scientific impact.
– Geographically distant researchers can collaborate on the analysis of the
same data set.
– Complex analysis tools and workflows are now available for all to use, rather
than having each lab duplicate efforts.
– All researchers now have access to the highest quality computational
resources
• SIDGrid uses TeraGrid resources for computationally intensive tasks such as media transcoding, algorithms for pitch analysis of audio tracks, and fMRI image analysis
• SIDGrid is unique among social science data archive projects
– Focused on streaming data which change over time
– Provides the ability to investigate multiple datasets, collected at different
time scales, simultaneously
• Active users of the SIDGrid system include a human neuroscience group
and linguistic research groups from the University of Chicago and the
University of Nottingham, UK
• 40 institutional members
– 9 foreign affiliates
• Researchers request
synthetic seismograms for
any given earthquake
– Allows scientists to
understand the ground motion
associated with any given
earthquake
• Requested and received
advanced support from
TeraGrid
Talks at E-Science
• See the PSE Workshop:
http://escience2008.iu.edu/workshops/innovative/index.shtml
– Friday, 10:00 am-4:30 pm
• Nancy Wilkins-Diehr will have more to say about
some of these gateways.
• See also Rich Wolski’s keynote on cloud computing.
Next generation gateways will (need to) support cloud computing and virtual machine-based backends.
– Purdue’s nanoHUB and HUBzero software have done this for some time.
Getting Started Building a
Gateway
Should you? And how can you get
help?
When might a gateway be appropriate?
• Researchers using defined sets of tools in different ways
– Same executables, different input
• GridChem, CHARMM
– Creating multi-scale or complex workflows
– Datasets
• Common data formats
– National Virtual Observatory
– Earth System Grid
– Some groups have invested significant efforts here
• caBIG, extensive discussions to develop common terminology and formats
• BIRN, extensive data sharing agreements
• Difficult to access data/advanced workflows
– Sensor/radar input
• LEAD, GEON
Advanced support for OCI resources
Including gateway integration
• Same peer review process used to request
resources
– 30,000 CPUs
– + 6 months of Nancy
Or someone really talented
• Reviews based on appropriate use of resources,
science is not reviewed if already funded
• Petascale
• Multisite workflows
• Gateways
• Domain expertise
Support is Very Targeted
• Start with well-defined objectives
– Focus on efficient or novel use of OCI resources
• Access to minimum 0.25 FTE for months to a year
– Enough investment to really understand and help solve complex
problems
• Must have commitment from PIs
– Want to make sure work is incorporated into production codes and
gateways
• Good candidates for targeted support include:
– Large, high impact projects
– Ability to influence new communities
• Lessons learned move into training and documentation
GATEWAYS UNDER THE HOOD
My 2002 “octopus” SOA diagram, from the archives.
[Figure: a Browser Interface talks over HTTP(S) to Portlets + Client Stubs, which call WSDL-described services over SOAP/HTTP. The services shown are a DB Service (JDBC to a DB), Job Sub/Mon and File Services (in front of Operating and Queuing Systems), and a Visualization Service, spread across Host 1, Host 2, and Host 3.]
Terminology
• Portlet: a standard Java component that generates HTML and can also act as a client to a remote service.
– Lives in a portal container.
– I will also use this term generically.
• Web Service: a remotely invocable function on the Internet.
– SOAP: the XML message envelope for carrying commands over HTTP.
– WSDL: describes the service’s API in XML.
– REST: a variation of this approach that uses plain HTTP operations on URL-addressed resources instead of SOAP envelopes.
• Lots more info: http://grids.ucs.indiana.edu/ptliupages/presentations/I590WebService.ppt
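As a rough illustration of what "an XML message envelope for carrying commands" looks like, the sketch below builds a SOAP 1.1 request with Python's standard library. The operation name submitJob, its parameters, and the example.org service namespace are hypothetical. A real service's WSDL would define the actual names.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/jobservice"           # hypothetical service namespace


def build_soap_request(operation, params):
    """Wrap one operation call and its parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_soap_request("submitJob", {"executable": "mcell", "count": 2})
print(request_xml)
```

A client stub generated from WSDL does essentially this on the caller's behalf, then sends the envelope in the body of an HTTP POST.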
But Why?
• Three-tiered Service Oriented Architecture is the network equivalent of the famous Model-View-Controller design pattern.
– View: the user interface components.
– Controller: Web service middleware
– Model: the backend resources.
• Independence of tiers gives flexibility
– Services can be reused with alternative user interfaces
• Workflow composers like Taverna, XBaya, Kepler
– User interfaces can work with different service implementations.
• Drawback: reliability and robustness are issues, since each tier adds a network hop that can fail.
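The tier independence described above can be sketched in a few lines. All class and function names here (QueueModel, JobService, html_view, text_view) are hypothetical, and everything runs in one process. In a real gateway the controller would be a remote Web service and the model a real queuing system.

```python
class QueueModel:
    """Model: stands in for a backend resource such as a batch queue."""

    def __init__(self):
        self._jobs = {}
        self._next_id = 1

    def add(self, name):
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = {"name": name, "state": "QUEUED"}
        return job_id

    def get(self, job_id):
        return dict(self._jobs[job_id])


class JobService:
    """Controller: the middle-tier service that every user interface calls."""

    def __init__(self, model):
        self._model = model

    def submit(self, name):
        return self._model.add(name)

    def status(self, job_id):
        return self._model.get(job_id)


def html_view(service, job_id):
    """View 1: a portlet-like fragment of HTML."""
    job = service.status(job_id)
    return f"<p>{job['name']}: {job['state']}</p>"


def text_view(service, job_id):
    """View 2: a command-line style rendering of the same service call."""
    job = service.status(job_id)
    return f"{job_id}\t{job['name']}\t{job['state']}"


service = JobService(QueueModel())
job_id = service.submit("mcell-run")
print(html_view(service, job_id))
print(text_view(service, job_id))
```

Both views depend only on the service interface, so either can be replaced (for example by a workflow composer) without touching the backend.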
Two Approaches to the Middle Tier
[Figure: two stacks compared.
Fat Client: Portal Comp. with embedded Grid Client -> Grid Protocol (SOAP) -> Grid Service -> Backend Resource.
Thin Client: Portal Comp. -> HTTP + SOAP -> Web Service with Grid Client -> Grid Protocol (SOAP) -> Grid Service -> Backend Resource.]
Managing Scientific Workflows
A Preview for Suresh’s Talks and Demos
Scientific Workflows
• Portal interfaces encode scientific use cases.
• If you have a rich set of services, it is a lot of work to make portlets for all possible use cases.
• And power users will always want something more.
• Example: our CICC project has dozens of chemical informatics Web services.
– http://www.chembiogrid.org.wiki
• Workflow composers can simplify this.
– Allow users to encode and execute their own use cases.
Web Services and Workflows
• Perform a similarity search on the NIH DTP Human Tumor data.
• Filter the results based on pharmacokinetic properties (FILTER)
• Convert to 3D (OMEGA)
• Dock into a predefined protein (FRED)
• Visualize (JMOL).
Taverna workflow connects remote services.
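The steps above form a linear pipeline in which each service consumes the previous service's output. The sketch below shows only that composition pattern. Every function is a local stub with invented data, not a call to the real FILTER, OMEGA, or FRED services, and it is not how Taverna itself represents workflows.

```python
def similarity_search(query):
    """Stub for the similarity search service; returns invented records."""
    return [
        {"id": "cmpd-1", "passes_filter": True},
        {"id": "cmpd-2", "passes_filter": False},
        {"id": "cmpd-3", "passes_filter": True},
    ]


def property_filter(compounds):
    """Stub for the property filter step: keep only flagged compounds."""
    return [c for c in compounds if c["passes_filter"]]


def convert_to_3d(compounds):
    """Stub for the 2D-to-3D conversion step."""
    return [dict(c, has_3d=True) for c in compounds]


def dock(compounds):
    """Stub for the docking step."""
    return [dict(c, docked=True) for c in compounds]


def run_workflow(data, steps):
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data


# The workflow is just data: an ordered list of steps a user could rearrange.
WORKFLOW = [similarity_search, property_filter, convert_to_3d, dock]
results = run_workflow("query-structure", WORKFLOW)
print([r["id"] for r in results])
```

Because the workflow is an ordered list rather than hard-coded portlet logic, a power user can insert, drop, or reorder steps without any new user interface being written.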
OGCE’s XBaya Workflow Composer
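Updating the Octopus
[Figure: the same layered diagram with Web 2.0 components. A Browser Interface talks over HTTP(S) to Social Gadgets + AJAX, which call services over RSS and JSON/HTTP. The service interfaces are now mostly REST, with one still WSDL. The services shown are a DB Service (JDBC to a DB), Job Sub/Mon and File Services (in front of Operating and Queuing Systems), and a Visualization Service, spread across Host 1, Host 2, and Host 3.]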
Enterprise Approach | Web 2.0 Approach
JSR 168 Portlets | Gadgets, Widgets
Server-side integration and processing | AJAX, client-side integration and processing, JavaScript
SOAP | RSS, Atom, JSON
WSDL | REST (GET, PUT, DELETE, POST)
Portlet Containers | Open Social Containers (Orkut, LinkedIn, Shindig); Facebook; StartPages
User Centric Gateways | Social Networking Portals
Workflow managers (Taverna, Kepler, etc) | Mash-ups
Grid computing: Globus, Condor, etc | Cloud computing: Amazon WS Suite, Xen Virtualization
Semantic Web: RDF, OWL, ontologies | Microformats, folksonomies
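To illustrate the REST and JSON side of this comparison, here is a minimal in-memory sketch of how the four HTTP verbs might map onto a job resource. The JobResource class, the /jobs path, and the status codes chosen are assumptions for illustration. No network server is involved, and no particular gateway's API is implied.

```python
import json


class JobResource:
    """In-memory stand-in for a REST resource collection at /jobs."""

    def __init__(self):
        self._jobs = {}
        self._next_id = 1

    def handle(self, method, path, body=None):
        """Dispatch one request; returns (status code, JSON text)."""
        parts = [p for p in path.split("/") if p]
        if not parts or parts[0] != "jobs" or len(parts) > 2:
            return 404, json.dumps({"error": "not found"})
        if len(parts) == 1:
            if method == "POST":                  # create a new job
                job_id = str(self._next_id)
                self._next_id += 1
                self._jobs[job_id] = json.loads(body)
                return 201, json.dumps({"id": job_id})
            return 405, json.dumps({"error": "method not allowed"})
        job_id = parts[1]
        if method == "PUT":                       # create or replace by id
            self._jobs[job_id] = json.loads(body)
            return 200, json.dumps(self._jobs[job_id])
        if job_id not in self._jobs:
            return 404, json.dumps({"error": "not found"})
        if method == "GET":                       # read
            return 200, json.dumps(self._jobs[job_id])
        if method == "DELETE":                    # remove
            del self._jobs[job_id]
            return 204, ""
        return 405, json.dumps({"error": "method not allowed"})


api = JobResource()
created = api.handle("POST", "/jobs", json.dumps({"name": "mcell-run"}))
fetched = api.handle("GET", "/jobs/1")
updated = api.handle("PUT", "/jobs/1", json.dumps({"name": "mcell-run", "state": "DONE"}))
deleted = api.handle("DELETE", "/jobs/1")
missing = api.handle("GET", "/jobs/1")
print(created, fetched, updated, deleted, missing)
```

The contrast with the enterprise column is that the operation is carried by the HTTP verb and URL rather than by an operation name inside a SOAP body, and the payload is JSON that browser-side JavaScript can consume directly.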
Sample Grid Gadgets in iGoogle
Microformats, KML, and GeoRSS feeds used to deliver SAR data to multiple clients.
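As a small illustration of delivering geolocated data in a format that many clients can read, the sketch below serializes a list of records as KML placemarks. The record names and coordinates are invented sample data, and the helper name placemarks_to_kml is hypothetical. It does not reflect how the SAR feeds on this slide were actually produced.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # OGC KML 2.2 namespace


def placemarks_to_kml(records):
    """Serialize records with 'name', 'lon', 'lat' keys as a KML document."""
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for rec in records:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = rec["name"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude
        coords = ET.SubElement(point, f"{{{KML_NS}}}coordinates")
        coords.text = f"{rec['lon']},{rec['lat']}"
    return ET.tostring(kml, encoding="unicode")


# Invented sample records standing in for data set footprints.
SCENES = [
    {"name": "scene-001", "lon": -118.25, "lat": 34.05},
    {"name": "scene-002", "lon": -122.42, "lat": 37.77},
]

kml_text = placemarks_to_kml(SCENES)
print(kml_text)
```

Any KML-aware client, such as a map gadget or a desktop globe viewer, can load the same document, which is the point of publishing data in a standard feed format.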
More Information
• Contact me: [email protected]
• See what I’m up to:
http://communitygrids.blogspot.com/
• OGCE software: http://collab-ogce.org/
• Lots of people worked on all of these.
Tremendous Opportunities Using the Largest Shared Resources -
Challenges too!
• What’s different when the resource doesn’t belong just to me?
– Resource discovery
– Accounting
– Security
– Proposal-based requests for resources (peer-reviewed access)
• Code scaling and performance numbers
• Justification of resources
• Gateway citations
• Tremendous benefits at the high end, but even more work
for the developers
• Potential impact on science is huge
– Small number of developers can impact thousands of scientists
– But need a way to train and fund those developers and provide them
with appropriate tools
Gateways can further investments in
other projects
• Increase access
– To instruments
• Increase capabilities
– To analyze data
• Improve workforce development
– For underserved populations
• Increase outreach
• Increase public awareness
– Public sees value in investments in large facilities