Building Science Gateways
Marlon Pierce
Community Grids Laboratory, Indiana University

Tutorial Overview
Type | Title | Presenter
Talk | Gateways overview | Marlon
Talk | OGCE overview | Marlon
Talk | TeraGrid: Resources Overview | Simms
Break
Demo | LEAD Portal and workflows | Suresh
Demo | GridChem Workflow | Suresh
Demo | OGCE and TGUP Portals | Marlon
Lunch
There’s More
Type | Title | Presenter
Hands On | OGCE, LEAD, and TGUP portals and workflows | Marlon, Suresh
Talk/HO | Building the OGCE Portal | Marlon
Talk/HO | Building gadgets with GTLAB | Marlon
Break (2:00-2:30)
Talk | Web 2.0 for Science Gateways (Optional) | Marlon
HO | Continue hands on work | Suresh, Marlon

Slides and Demo Site
• Tutorial slides are available from http://www.collab-ogce.org/ogce/index.php/Tutorials
• We run a permanent demo portal at https://community.ucs.indiana.edu:8443/gridsphere/
  – Also aliased as https://ogceportal.iu.teragrid.org:8443/gridsphere
• Portal accounts train01-train30 have been created for the workshop. Password is the same as the account name.
  – Also train31-train49 from TG08 workshop.
• We also have TeraGrid training accounts with names train01-train30 that can be used to retrieve TG proxy credentials. These should be active all week. You can also log into the TeraGrid User Portal with this account and the secret password.

Concept #1: Web Portal
• Web container that aggregates content from multiple sources into a single display.
  – “Start Pages”
• Typically consume RSS/Atom news feeds.
• More powerful versions these days support Flickr, calendars, games, etc.
  – Gadgets, widgets
• Examples: iGoogle, Netvibes, My Yahoo!
[Figure labels: Gadget; RSS Feeds]

Concept #2: Grid Computing
• Grid computing software is designed to integrate large supercomputing facilities.
  – TeraGrid, Open Science Grid, EGEE, etc.
• This is done via network services
  – Software providers in the US include Globus and Condor
• Key Service Components (and example services)
  – Authentication and authorization framework (MyProxy)
  – Remote process access and control (GRAM, Condor)
  – Remote file, I/O access (GridFTP, SRB, RFT)
• Additional Services
  – Information services, replica management, database federation, storage management, schedulers, etc.
• Example Grid Software Stacks: CTSS and VDT
  – For TeraGrid and Open Science Grid, respectively
• Being pushed by Cloud Computing (Amazon, Google, Microsoft, others)

Science Portals and Gateways
• Science Gateways adapt Web portal technology to build user interfaces to the Grid.
• Science portals resemble standard portals, but must also
  – Support access to computing and storage resources.
  – Allow users remote, direct access to these resources.
    • You often want to run applications and access data that you own directly.
  – Provide access to science applications and data sets.
• And we must provide value added services as well as user interfaces.

Example Science Gateways
• Many listed here:
  – http://www.teragrid.org/programs/sci_gateways/
• Cover many different scientific fields:
  – Atmospheric science, geophysics, computational chemistry, bioinformatics, etc.
• See also GCE08 workshop at SC08 and earlier proceedings
  – http://www.collab-ogce.org/gce08/index.php/Main_Page
  – GCE05-07 also linked.

TeraGrid Science Gateways Program
Slides courtesy of Nancy Wilkins-Diehr
TeraGrid Area Director for Science Gateways
[email protected]

Today, there are approximately 29 gateways using the TeraGrid

Does a gateway have to use TeraGrid to be a gateway?
• No, but the TeraGrid does fund the development and support of these gateways
  – Using high end resources is more work and is not recommended unless it serves a demonstrated need
    • Gateways are an excellent way to extend the impact of high-end resources
• Are they all funded by TeraGrid?
  – Can TeraGrid claim success for all gateways?
    • No, we don’t make the gateways you use, we make the gateways you use better
  – TeraGrid does fund a small number of developers to provide advanced support.
    • More later.

Why are gateways worth the effort?
• Increasing range of expertise needed to tackle the most challenging scientific problems
  – How many details do you want each individual scientist to need to know?
    • PBS, RSL, Condor
    • Coupling multi-scale codes
    • Assembling data from multiple sources
    • Collaboration frameworks

PBS batch script:

    #!/bin/sh
    #PBS -q dque
    #PBS -l nodes=1:ppn=2
    #PBS -l walltime=00:02:00
    #PBS -o pbs.out
    #PBS -e pbs.err
    #PBS -V
    cd /users/wilkinsn/tutorial/exercise_3
    ../bin/mcell nmj_recon.main.mdl

Globus RSL:

    +(
      &(resourceManagerContact="tg-login1.sdsc.teragrid.org/jobmanager-pbs")
       (executable="/users/birnbaum/tutorial/bin/mcell")
       (arguments=nmj_recon.main.mdl)
       (count=128)
       (hostCount=10)
       (maxtime=2)
       (directory="/users/birnbaum/tutorial/exercise_3")
       (stdout="/users/birnbaum/tutorial/exercise_3/globus.out")
       (stderr="/users/birnbaum/tutorial/exercise_3/globus.err")
    )

Condor-G submit file:

    # Full path to executable
    executable=/users/wilkinsn/tutorial/bin/mcell
    # Working directory, where Condor-G will write
    # its output and error files on the local machine.
    initialdir=/users/wilkinsn/tutorial/exercise_3
    # To set the working directory of the remote job, we
    # specify it in this globus RSL, which will be appended
    # to the RSL that Condor-G generates
    globusrsl=(directory='/users/wilkinsn/tutorial/exercise_3')
    # Arguments to pass to executable.
    arguments=nmj_recon.main.mdl
    # Condor-G can stage the executable
    transfer_executable=false
    # Specify the globus resource to execute the job
    globusscheduler=tg-login1.sdsc.teragrid.org/jobmanager-pbs
    # Condor has multiple universes, but Condor-G always uses globus
    universe=globus
    # Files to receive stdout and stderr.
    output=condor.out
    error=condor.err
    # Specify the number of copies of the job to submit to the condor queue.
    queue 1

Not just ease of use
What can scientists do that they couldn’t do previously?
• LEAD – access to radar data
• NVO – access to sky surveys
• OOI – access to sensor data
• PolarGrid – access to polar ice sheet data
• SIDGrid – analysis tools
• GridChem – developing multiscale coupling
• How would this have been done before gateways?

Gateways Greatly Expand Access
• Almost anyone can investigate scientific questions using high end resources
  – Not just those in the research groups of those who request allocations
  – Gateways allow anyone with a web browser to explore
    • Opportunities can be uncovered via Google
      – Nancy’s 11-year-old son discovered nanoHUB.org himself while his class was studying Bucky Balls
• Fosters new ideas, cross-disciplinary approaches
• Encourages students to experiment
• But used in production too
  – Significant number of papers resulting from gateways including GridChem, nanoHUB
  – Scientists can focus on challenging science problems rather than challenging infrastructure problems

TeraGrid Pathways Activities
• Program funding to involve MSI communities
• 2 Gateway components
  – Adapt gateways for educational use by underrepresented communities
    • GEON – SDSC, Navajo Tech
  – Teach participants from underrepresented communities how to build gateways
    • PolarGrid – IU, ECSU

Navajo Technical College and gateways
• Incorporating the use of gateways in their curricula
• GEON, GISolve areas of initial interest

PolarGrid
• Cyberinfrastructure Center for Polar Science (CICPS)
  – Experts in polar science, remote sensing and cyberinfrastructure
  – Indiana, ECSU, CReSIS
• Satellite observations show disintegration of ice shelves in West Antarctica and speed-up of several glaciers in southern Greenland
  – Most existing ice sheet models, including those used by IPCC, cannot explain the rapid changes
http://www.polargrid.org/polargrid/images/4/42/C0050polargrid-big.m4v
Source: Geoffrey Fox

• Components of PolarGrid
  – Expedition grid consisting of ruggedized laptops in a field grid linked to a low power multi-core base camp cluster
  – Prototype and two production expedition grids feed into a 17 Teraflops "lower 48" system at Indiana University and Elizabeth City State (ECSU) split between research, education and training.
  – Gives ECSU a top-ranked 5 Teraflop MSI high performance computing system
• Access to expensive data
• High-end resources for analysis
• MSI student involvement
Source: Geoffrey Fox

Recent Gateways using TeraGrid Significantly
• SCEC
• SIDGrid
• CIG

SCEC using gateway to produce hazard map
• PSHA hazard map for California using newly released Earthquake Rupture Forecast (UCERF2.0) calculated using SCEC Science Gateway
• Warm colors indicate regions with a high probability of experiencing strong ground motion in the next 50 years.
• High resolution map, significant CPU use

Social Informatics Data Grid
• Heavy use of “multimodal” data.
  – Subject might be viewing a video, while a researcher collects heart rate and eye movement data.
• Events must be synchronized for analysis, large datasets result
• Extensive analysis capabilities are not something that each researcher should have to create for themselves.
http://www.ci.uchicago.edu/research/files/sidgrid.mov

• Social scientists have traditionally worked in isolated labs without the capability to share data or insights with others.
• SIDGrid enables a number of capabilities.
  – Data that is expensive to collect can now be shared with others, increasing the potential for scientific impact.
  – Geographically distant researchers can collaborate on the analysis of the same data set.
  – Complex analysis tools and workflows are now available for all to use, rather than having each lab duplicate efforts.
  – All researchers now have access to the highest quality computational resources
    • SIDGrid uses TeraGrid resources for computationally-intensive tasks such as media transcoding, algorithms for pitch analysis of audio tracks, and fMRI image analysis
• SIDGrid is unique among social science data archive projects
  – Focused on streaming data which change over time
  – Provides the ability to investigate multiple datasets, collected at different time scales, simultaneously
• Active users of the SIDGrid system include a human neuroscience group and linguistic research groups from the University of Chicago and the University of Nottingham, UK

CIG
• 40 institutional members
  – 9 foreign affiliates
• Researchers request synthetic seismograms for any given earthquake
  – Allows scientists to understand the ground motion associated with any given earthquake
• Requested and received advanced support from TeraGrid

Talks at E-Science
• See the PSE Workshop: http://escience2008.iu.edu/workshops/innovative/index.shtml
  – Friday, 10:00 am-4:30 pm
• Nancy Wilkins-Diehr will have more to say about some of these gateways.
• See also Rich Wolski’s keynote on cloud computing. Next generation gateways will (need to) support cloud computing and virtual machine-based backends.
  – Purdue’s nanoHUB and HUB0 software have done this for some time.

Getting Started Building a Gateway
Should you? And how can you get help?

When might a gateway be appropriate?
• Researchers using defined sets of tools in different ways
  – Same executables, different input
    • GridChem, CHARMM
  – Creating multi-scale or complex workflows
  – Datasets
• Common data formats
  – National Virtual Observatory
  – Earth System Grid
  – Some groups have invested significant efforts here
    • caBIG, extensive discussions to develop common terminology and formats
    • BIRN, extensive data sharing agreements
• Difficult to access data/advanced workflows
  – Sensor/radar input
    • LEAD, GEON

Advanced support for OCI resources
Including gateway integration
• Same peer review process used to request resources
  – 30,000 CPUs
  – + 6 months of Nancy (or someone really talented)
• Reviews based on appropriate use of resources, science is not reviewed if already funded
• Petascale
• Multisite workflows
• Gateways
• Domain expertise

Support is Very Targeted
• Start with well-defined objectives
  – Focus on efficient or novel use of OCI resources
• Access to minimum 0.25 FTE for months to a year
  – Enough investment to really understand and help solve complex problems
• Must have commitment from PIs
  – Want to make sure work is incorporated into production codes and gateways
• Good candidates for targeted support include:
  – Large, high impact projects
  – Ability to influence new communities
• Lessons learned move into training and documentation

GATEWAYS UNDER THE HOOD

My 2002 “octopus” SOA diagram, from the archives.
[Diagram labels: Browser Interface; HTTP(S); Portlets + Client Stubs; SOAP/HTTP; WSDL service interfaces (DB Service, Job Sub/Mon and File Services, Visualization Service); JDBC; DB; Operating and Queuing Systems; Host 1, Host 2, Host 3]

Terminology
• Portlet: this is a standard Java component that generates HTML and can also act as a client to a remote service.
  – Lives in a portal container.
  – I will also use this term generically.
• Web Service: a remotely invocable function on the Internet.
  – SOAP: the XML message envelope for carrying commands over HTTP.
  – WSDL: describes the service’s API in XML.
  – REST: A variation of this approach.
• Lots more info: http://grids.ucs.indiana.edu/ptliupages/presentations/I590WebService.ppt

But Why?
• Three-tiered Service Oriented Architecture is the network equivalent of the famous Model-View-Controller design pattern.
  – View: the user interface components.
  – Controller: Web service middleware
  – Model: the backend resources.
• Independence of tiers gives flexibility
  – Services can be reused with alternative user interfaces
    • Workflow composers like Taverna, XBaya, Kepler
  – User interfaces can work with different service implementations.
• Drawback: reliability and robustness are issues.

Two Approaches to the Middle Tier
[Diagram: Fat Client: Portal Component with Grid Client, to Grid Protocol (SOAP), to Grid Service, to Backend Resource. Thin Client: Portal Component, to HTTP + SOAP, to Web Service with Grid Client, to Grid Protocol (SOAP), to Grid Service, to Backend Resource.]

Managing Scientific Workflows
A Preview for Suresh’s Talks and Demos

Scientific Workflows
• Portal interfaces encode scientific use cases.
• If you have a rich set of services, it is a lot of work to make portlets for all possible use cases.
• And power users will always want something more.
• Example: our CICC project has dozens of chemical informatics Web services.
  – http://www.chembiogrid.org/wiki
• Workflow composers can simplify this.
  – Allow users to encode and execute their own use cases.

Web Services and Workflows
• Perform a similarity search on the NIH DTP Human Tumor data.
• Filter the results based on Pharmacokinetic properties (FILTER)
• Convert to 3D (OMEGA)
• Docking into a predefined protein (FRED)
• Visualize (JMOL).
Taverna workflow connects remote services.
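The steps listed above form a linear pipeline in which the output of one service becomes the input of the next. The Python sketch below illustrates only that chaining idea; it is not how Taverna or XBaya are implemented. Every function, field name, threshold, and data value in it is a hypothetical local stand-in for a remote Web service call, and the visualization step is left out.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Service = Callable[[List[Record]], List[Record]]


def similarity_search(records: List[Record]) -> List[Record]:
    # Hypothetical stand-in for the similarity search service:
    # keep compounds whose (made-up) similarity is at least 0.7.
    return [r for r in records if r["similarity"] >= 0.7]


def property_filter(records: List[Record]) -> List[Record]:
    # Hypothetical stand-in for the FILTER step:
    # drop compounds above an assumed molecular-weight cutoff.
    return [r for r in records if r["mol_weight"] <= 500]


def to_3d(records: List[Record]) -> List[Record]:
    # Hypothetical stand-in for the OMEGA step:
    # mark each record as having 3D coordinates.
    return [{**r, "has_3d": True} for r in records]


def dock(records: List[Record]) -> List[Record]:
    # Hypothetical stand-in for the FRED step:
    # attach an invented docking score derived from similarity.
    return [{**r, "dock_score": round(-10 * r["similarity"], 2)} for r in records]


def run_workflow(steps: List[Tuple[str, Service]], records: List[Record]) -> List[Record]:
    """Run each named step in order, feeding its output to the next step."""
    for name, service in steps:
        records = service(records)
        print(f"{name}: {len(records)} record(s)")
    return records


# The user-defined use case: an ordered list of services.
STEPS: List[Tuple[str, Service]] = [
    ("similarity search", similarity_search),
    ("filter", property_filter),
    ("convert to 3D", to_3d),
    ("docking", dock),
]

# Invented sample data, for illustration only.
COMPOUNDS: List[Record] = [
    {"id": "c1", "similarity": 0.9, "mol_weight": 320.0},
    {"id": "c2", "similarity": 0.5, "mol_weight": 280.0},
    {"id": "c3", "similarity": 0.8, "mol_weight": 610.0},
    {"id": "c4", "similarity": 0.75, "mol_weight": 450.0},
]

results = run_workflow(STEPS, COMPOUNDS)
```

Because the use case is just the ordered `STEPS` list, a user could reorder or swap steps without anyone writing a new portlet, which is the point the workflow slides make. A real composer would replace each function body with a call to the corresponding remote service.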
OGCE’s XBaya Workflow Composer

Updating the Octopus
[Diagram labels: Browser Interface; HTTP(S); Social Gadgets + AJAX; RSS, JSON/HTTP; REST service interfaces, with one WSDL interface remaining (DB Service, Job Sub/Mon and File Services, Visualization Service); JDBC; DB; Operating and Queuing Systems; Host 1, Host 2, Host 3]

Enterprise Approach | Web 2.0 Approach
JSR 168 Portlets | Gadgets, Widgets
Server-side integration and processing | AJAX, client-side integration and processing, JavaScript
SOAP | RSS, Atom, JSON
WSDL | REST (GET, PUT, DELETE, POST)
Portlet Containers | Open Social Containers (Orkut, LinkedIn, Shindig); Facebook; StartPages
User Centric Gateways | Social Networking Portals
Workflow managers (Taverna, Kepler, etc.) | Mash-ups
Grid computing: Globus, Condor, etc. | Cloud computing: Amazon WS Suite, Xen Virtualization
Semantic Web: RDF, OWL, ontologies | Microformats, folksonomies

Sample Grid Gadgets in iGoogle

Microformats, KML, and GeoRSS feeds used to deliver SAR data to multiple clients.

More Information
• Contact me: [email protected]
• See what I’m up to: http://communitygrids.blogspot.com/
• OGCE software: http://collab-ogce.org/
• Lots of people worked on all of these.

Tremendous Opportunities Using the Largest Shared Resources - Challenges too!
• What’s different when the resource doesn’t belong just to me?
  – Resource discovery
  – Accounting
  – Security
  – Proposal-based requests for resources (peer-reviewed access)
    • Code scaling and performance numbers
    • Justification of resources
    • Gateway citations
• Tremendous benefits at the high end, but even more work for the developers
• Potential impact on science is huge
  – Small number of developers can impact thousands of scientists
  – But need a way to train and fund those developers and provide them with appropriate tools

Gateways can further investments in other projects
• Increase access
  – To instruments
• Increase capabilities
  – To analyze data
• Improve workforce development
  – For underserved populations
• Increase outreach
• Increase public awareness
  – Public sees value in investments in large facilities