Portal Web Services: Support of
DOE SciDAC Collaboratories
Texas Advanced Computing Center
Community Grids Laboratory, Indiana University
San Diego Supercomputer Center
General Atomics
http://www.doeportals.org/
PI: Mary Thomas
Presenter: Geoffrey Fox (co-I)
SciDAC Portal Team Members
• Texas Advanced Computing Center
– Mary Thomas, Jay Boisseau, Eric Roberts, Akhil Seth
• Community Grids Lab
– Geoffrey Fox, Marlon Pierce, Wenjun Wu, Shrideep
Pallickara with help from Dennis Gannon, Beth Plale
• San Diego Supercomputer Center
– Reagan W. Moore, Wayne Schroeder, Arcot Rajasekar
• General Atomics
– David Schissel, Qian Peng, Gheni Abla
DOE Web Portal Services Project
Mission
• The Portals Web Service group develops general
purpose portal containers, components, and Grid
services that can be used in DOE SciDAC projects.
– End-to-end: from the user interface through back-end resources, re-using
existing Grid Services and building new ones as needed for
particular applications
• We also develop portal features in support of specific
SciDAC projects
– Currently working with the Fusion Grid
– Serve this specific DOE project team and specific DOE users.
– Dave Schissel, Fusion Grid PI is Co-I on Portal Services
Project
Project Fiscal Information
• Project dates: 9/1/02-8/30/05. 3 years
• Subcontracts signed:
– IU, SDSC: January 2003
– GA: April 2003
• Total budget, Year 1: $718,000
• Budget breakdown for Year 1
– TACC: $260,000
– GA: $101,000
– IU: $200,000 (4 students and faculty)
– SDSC: $157,000
– Total: about 3.5 FTE and 4 IU students
http://danube.ucs.indiana.edu:4080/jetspeed (just for the day)
Collaborations and Partnerships
• Indiana University: Dennis Gannon, Beth Plale
• NCSA: Jay Alameda, Al Rossi, Shawn Hampton
• ANL: Gregor von Laszewski
• Michigan: Charles Severance, Joseph Hardin
• Global Grid Forum: Grid Computing Environments Research Group: co-leaders – Geoffrey Fox, Dennis Gannon, Mary Thomas
– Best Practice Information document and journal publication of
28 projects
– Continuing workshops and summaries for portlets and
workflow
• European portal/portlet activities: Potsdam group with
GridSphere and recent e-Science meeting in Edinburgh
External Portal Efforts
• Alliance Portal
– NCSA/NSF funded portal project led by IU and including NCSA and CoGkits
from Argonne
– Building several Grid portlets.
– Several portlets reusable in Fusion Grid portal.
• CHEF (Michigan)
– Jetspeed-based portal system with collaboration tools
– NEESgrid (NSF) and CMCS (DOE) portals built with CHEF
• NSF NMI Grid Portals Group includes TACC, IU, NCSA, Michigan,
and UC/ANL
– Will ensure compatibility between SciDAC, NCSA Alliance, and CHEF portal
efforts
• DoD Online Knowledge Center (OKC) for HPCMPO users
– Original Jetspeed Portal built by IU(CGL) in 2001/2002
– Production version run by ERDC Vicksburg https://okc.erdc.hpc.mil/index.jsp
• NASA iSERVO (International Solid Earth Research Virtual
Observatory) Grid for Earthquake Simulation
– Prototype http://complexity.ucs.indiana.edu:8282/jetspeed/index.jsp
– Australia Japan USA - ACES Asia Pacific Consortium
Project Scope
• Portal Web Services architecture falls into six major areas
– Clients: browsers, shells, GUIs
– Aggregation portals: Jetspeed
– Portlet types/classes: WebFormPortlet
– Portlet Integration: ProxyManagers, etc.
– Client-building libraries/toolkits: GridPort, Java COG, Gateway, etc.
– Services: SRB, GPIR, IDL services, etc.
• Work divides into two categories
– General purpose components with FG Interface (Year 1 Focus)
• Built on top of standard grid services.
• May be inherited from other projects (if we have the right architecture)
• May be reused in other SciDAC collaboratories
– Specific Fusion Grid services (Started but Year 2-3 Focus)
• TRANSP and other application submissions
• IDL services and clients
• SRB/MDSplus integration
• Logbook 2.0 and Monitor 2.0; Web Services and Portlets
Portal Architecture
[Architecture diagram] The stack is a hierarchical arrangement of clients, portlets, libraries, services, and resources. Clients (pure HTML, Java applets, etc.) sit above the Jetspeed aggregation and rendering layer with its internal services; portlets are hosted as local portlets or as remote/proxy portlets and come in several classes (WebFormPortlet, IFramePortlet, JspPortlet, VelocityPortlet). Client-building libraries include Gateway (IU), GridPort (TACC), and the Java COG Kit. Below these, Web/Grid services front computing, data stores, and instruments. Much of the stack is taken from other projects; the project's emphasis is on the portlet classes and their integration.
Portal Development Categories
[Table: development work by category (Clients, Portlets, Libraries or Services/Toolkits) and partner (TACC, IU, SDSC, GA)] Entries include the GCE Shell, AV applets, and PDA clients; WebForm and login-service portlets and other integrated portal capabilities; the GridPort (TACC) and Gateway (IU) toolkits (see later tables); SRB Web Service clients and SRB with an SVG container (SDSC); and CGI, MDSplus, and IDL applications (GA).
Fusion Grid Overview and
Requirements
Set the stage: who are we working
with and why
VISION FOR THE FUSION GRID
• Data, Codes, Analysis Routines, Visualization Tools should be
thought of as network accessible services
• Shared security infrastructure
• Collaborative nature of research requires shared visualization
applications and widely deployed collaboration technologies
– Integrate geographically diverse groups
• Not focused on CPU cycle scavenging or “distributed”
supercomputing (typical Grid justifications)
– Optimize the most expensive resource - people’s time
• Access is stressed rather than portability
• Users are shielded from implementation details
• Transparency and ease–of–use are crucial elements
• Shared toolset enables collaboration between sites
and across sub–disciplines
• Knowledge of relevant physics is still required of course
THE NFC PROJECT HAS A DIVERSE TEAM
• ANL: Distributed Systems Lab
— Kate Keahey, Ian Foster, Sam Lang, Sam Meder, Von Welch
• ANL: Futures Lab
— Mike Papka, Justin Binns, Ti Leggett, Rick Stevens
• General Atomics: DIII-D Fusion Lab
— David Schissel, Gheni Abla, Justin Burruss, Sean Flanagan, Qian Peng
• LBNL: Distributed Systems
— Mary Thompson, Abdeliah Essiari
• MIT: C–Mod Fusion Lab
— Martin Greenwald, Tom Fredian, Josh Stillerman
• Princeton Computer Science
— Adam Finkelstein, Kai Li, Grant Wallace
• Princeton Plasma Physics Lab: NSTX Fusion Lab
— Doug McCune, Eliot Feibush, Tina Ludescher, Scott Klasky, Lew Randerson
• U. of Utah: Scientific Computing and Imaging
— Allen Sanderson, Chris Johnson
We are working (indirectly) with many people!
NFC’S TOOLS AND TECHNOLOGIES
• Secure MDSplus using Globus GSI available
— Authentication and Authorization using DOE CA
• TRANSP available for worldwide usage on FusionGrid
— Beowulf cluster, client application, complete job monitoring
— Secure access by Globus GSI, Akenti, DOE Grids CA
• Personal Access Grid software and specifications available
— Installed at MIT and GA; PPPL has large AG node
• SCIRun for 3D visualization including MDSplus stored Fusion data
• Toolkits for sharing visualization wall to wall and on AG
— Tiled walls at GA and PPPL
• Fusion Grid Electronic Logbook
FUSION GRID MONITOR: A FAST AND EFFICIENT
MONITORING SYSTEM FOR THE GRID ENVIRONMENT
• Derivative from DIII–D between-pulse data analysis monitoring system
• Users track and monitor the state of applications on FusionGrid
— Output dynamically via HTML; detailed log run files accessible
• Code maintenance notification
— Users notified, queuing turned off, code rebuilt, queue restarted
• Built as a Java Servlet (using JDK2.1); a minimal illustrative sketch follows below
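To make the servlet-based design concrete, here is a minimal, hypothetical sketch of a monitor-style servlet. It is not the actual Fusion Grid Monitor source: the class name, URL parameters, and in-memory job table are invented for illustration, and only the standard javax.servlet API is assumed.

```java
// Hypothetical sketch of a servlet-style job monitor page (not the actual FGM source).
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class JobMonitorServlet extends HttpServlet {
    // In-memory job table; a real monitor would be fed by the analysis system.
    private final Map<String, String> jobStates = new ConcurrentHashMap<>();

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Applications report state changes, e.g. POST ?job=transp-1234&state=RUNNING
        String job = req.getParameter("job");
        String state = req.getParameter("state");
        if (job != null && state != null) {
            jobStates.put(job, state);
        }
        resp.setStatus(HttpServletResponse.SC_NO_CONTENT);
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Users see the current state of all jobs rendered dynamically as HTML.
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        out.println("<html><body><h2>FusionGrid job status</h2><table>");
        for (Map.Entry<String, String> e : jobStates.entrySet()) {
            out.println("<tr><td>" + e.getKey() + "</td><td>" + e.getValue() + "</td></tr>");
        }
        out.println("</table></body></html>");
    }
}
```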
ReviewPlus: GENERAL DATA VISUALIZATION TOOL
DEVELOPED IN THE FUSION COMMUNITY
• IDL based
• Data combinations
• Overplotting
• Any Y versus any X
• Math functions
• 2D and 3D coupling
• Automatic updating
Portal Service Requirements for the Fusion
Grid
• Support ubiquitous access for users
– Provide alternatives, supplements to IDL-based tools currently available.
– Enable researchers to work from anywhere, so they are not tied to Grid-enabled hosts with IDL licenses.
• Provide easy access to FG resources for users behind firewalls, NATs,
etc.
– Joint European Torus (JET) users are behind a NAT and can't use FG resources
• Provide a framework that allows us to easily combine FG and third
party web tools with our own.
– Example: FG web logbook
– We should further provide best practices, examples, and tool APIs that will
allow the FG team to work independently.
• Support small and new user groups
– Shield groups with small infrastructure support from installing and maintaining
Grid tools.
– Remove the licensing barrier to using the Fusion Grid tools
• Make using the Fusion Grid as simple as possible.
Portal Service Requirements II
• Views of MDSplus data should be content driven rather than site driven.
– Current MDSplus server installations are located at three primary sites.
– Users should be provided with a unified view of the data in these installations.
• Provide users with sophisticated, web accessible data services
– Combine features of MDSplus and SRB.
• Provide well defined ways for accessing information content as well
as computational services through the browser
– Web content like documentation, users’ guides, FAQs, etc, are provided by other
groups.
– The portal services group should support the simple integration of this material
into the portal.
• Support remote collaboration
• Provide additional information management and sharing tools
– Document archives, search, retrieval, message boards, announcements
– Also support FG and third party tools.
Portal Architecture and Service
Review
Describe how we can meet FG requirements through a general portal-component-service model
Meeting Fusion Grid and SciDAC
Collaboratory Requirements
• Portals and portal services are well tested solutions to
several of the FG requirements.
– Ubiquitous access
– Thin clients (browsers)
– Client groups “outsource” grid infrastructure maintenance to the
portal providers.
– Firewall friendly
• Problems exist in some current portal architectures
– No standards for common portal services —authentication,
content access control, layout management.
– No well defined way to do distributed development
• How do TACC, IU, and SDSC combine forces together and with FG?
– No well defined way to reuse and share portal services
• How do we reuse FG capabilities in other portals?
Portlets: Reusing and Sharing Portal
Components
• Basing the Fusion Grid Portal on a portlet-container-service model
• Portlets allow simple sharing of web tools.
– Distributed groups can work semi-independently.
– Can quickly assemble a portal from everyone’s contributions.
– Proven approach in previous work: Alliance Portal, latest
HotPage, OKC, QuakeSim (iSERVO)
• Baseline Grid technologies allow us to build general
purpose components
– We have components to work with GT2, GT3, and SRB.
– Java COG kit and GridPort
– FG currently based on GT2.
• Jetspeed is not a superstar technology, but with care we should be able to move portlets to next-generation aggregation portals (a minimal portlet sketch follows below)
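To make the portlet model concrete, the sketch below shows roughly what a minimal Jetspeed 1 portlet can look like. It is illustrative only: the class name and status text are made up, and it assumes the AbstractPortlet/getContent(RunData) convention and the ECS element classes that Jetspeed 1 portlets return; the exact names should be checked against the Jetspeed release in use.

```java
// Illustrative Jetspeed 1 portlet; assumes the AbstractPortlet/ECS conventions
// used by the WebForm, JSP, and Velocity portlet classes mentioned above.
import org.apache.ecs.ConcreteElement;
import org.apache.ecs.StringElement;
import org.apache.jetspeed.portal.portlets.AbstractPortlet;
import org.apache.turbine.util.RunData;

public class FusionGridStatusPortlet extends AbstractPortlet {
    // The container calls getContent() when it aggregates and renders the page;
    // each portlet only produces its own HTML fragment.
    public ConcreteElement getContent(RunData runData) {
        // A real portlet would query a Grid service here (e.g. via the Java CoG Kit).
        return new StringElement("<b>Fusion Grid status:</b> all services reachable.");
    }
}
```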
Portal Architecture
• Portlets are Java components managed by a portal container
– The container provides interfaces to login, management, and other services.
• Portlets can have "out of band" client-service connections
– Act as clients to the Grid through the CoG or implement proxies
– Generate HTML and other displays
[Architecture diagram] Local portlets call the Java CoG API directly, while proxy portlets reach remote interfaces over HTTP and SOAP. The Java CoG Kit speaks Grid protocols to the Grid services (GRAM, MDS/LDAP, MyProxy); Grid service stubs use SOAP to reach other services; Jetspeed internal services support the portal.
Portal Service Components
[Component diagram] The portlet container hosts login, proxy manager, TRANSP submit manager, file browser, LDAP, GPIR, and SRB client portlets. These call the GridPort and Java COG libraries or WSDL interfaces, which in turn reach Grid authentication, GRAM, GridFTP, GRIS/GIIS, SRB, MDSplus, GPIR, IDL services and clients, and the TRANSP and other FG applications running on FG hardware.
Third Party Baseline Technologies
• Fusion Grid uses Globus Toolkit 2.4
• GridPort currently supports Globus 2.2/2.4
– GridPort 3.0 will support GT 3
• Indiana Grid Portlets built on top of Java CoG
– Support GT2.2 and GT3.0 through CoG
• IU exploring GT3 portlets in other projects
– OGSI portlets
– OGSA-DAI portlets
Portal Service Component List:
Plugging Capability into the Portal
• GPIR: information services
• GridPort: access to Globus and SRB
• SRB: Storage Resource Broker
• Grid portlets: MyProxy login, GridFTP, LDAP browsing, Job Launching
• Communication: topic groups and reference
management
• Collaboration: Access Grid and other A/V portlets
• Firewall Tunneling: NaradaBrokering enabled Portlets
Portal Security
• Fusion Grid portal supports server side
key storage (Gridport) and MyProxy
storage (Alliance Portal+COG).
– Need to integrate
• Fusion Grid CA issues browser and
server certs, so we can also
authenticate Web browsers and servers.
• We have Kerberos “web-kinit” portlets
as well, but not integrated into FG.
• Future Portlets can use Web service
security
– Developed prototypical SAML+SOAP
system with Kerberos context
encryption.
• Abstract Jetspeed security/user profiles
as a Web (Grid) service
[Security diagram] The browser connects to the portal server over HTTPS with certificates; the portal server retrieves user proxies from MyProxy and uses GSI+SSL to reach Fusion Grid resources and SRB. (A hedged MyProxy retrieval sketch follows below.)
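As an illustration of the MyProxy path above, this sketch shows roughly how a portal-side component might retrieve a user's delegated credential with the Java CoG Kit's MyProxy client. The host name, port, and lifetime are placeholders, and the class/method names should be checked against the CoG release actually deployed.

```java
// Hedged sketch: fetch a delegated proxy credential from a MyProxy server
// on behalf of a portal user (Java CoG Kit MyProxy client assumed).
import org.globus.myproxy.MyProxy;
import org.ietf.jgss.GSSCredential;

public class MyProxyLogin {
    public static GSSCredential login(String username, String passphrase) throws Exception {
        // Placeholder MyProxy server; a Fusion Grid deployment would use its own host.
        MyProxy myproxy = new MyProxy("myproxy.example.org", 7512);
        int lifetimeSeconds = 2 * 60 * 60; // request a 2-hour proxy for the portal session
        // The returned credential can then be handed to CoG GRAM/GridFTP clients.
        return myproxy.get(username, passphrase, lifetimeSeconds);
    }
}
```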
Portal Capabilities
Capability / Provider / Description / Year, Status / Next Steps:
• Grid portal login (TACC): Get Grid proxy cert when logging into Jetspeed. [1/Prototype Available] Next: Integrate with IU portlet-based MyProxy.
• Grid Proxy Certificate Manager (IU): Get MyProxy certs after logging in. [1/Prototype Available] Next: Integrate with TACC authentication.
• GPIR Portlets (TACC): View, interact with GPIR data. [1/Prototype Available] Next: Point to FG resources.
• MDS/LDAP Browsers (IU, ANL): Basic Globus MDS browsing and navigating. [1/Prototype Available] Next: Integrate with Fusion Grid GRIS/GIIS.
• SRB Browser proxy (SDSC, TACC): Web interface to SRB. [1,2/Prototype Available] Next: More closely integrate Web interface into portal container; display MDSplus information.
• Newsgroups and citation portlets (IU): Provide discussion forums, feature requests, etc. Uses NaradaBroker, JMS/RSS/Semantic Grid services. [1/Prototype Available] Next: More closely integrate with portal container (Jetspeed).
• TRANSP Job Submission (TACC): Support simple job launching and monitoring of TRANSP. [1,2/In progress] Next: Support input file authoring; integrate with MDSplus and IDL services.
• File Transfer (IU): Support access to GridFTP. [1/Prototype Available] Next: Integrate with FG.
• Access Grid Portlets (IU): Display and manage Access Grid clients (Vic); integrates with XGSP H323/SIP Web Service. [1,2/Demo] Next: Integrate with improved client libraries and Autonomic messaging.
• OGSA-DAI Portlets (IU): User interface to OGSA/DAI services. [1,2,3/External Demo] Next: Consider bridges to SRB/MDSplus.
• SVG Portlets (IU): User interface to 2D Vector Graphics (SVG) services. [2,3/Planning] Next: Examine use of SVG for analysis clients (IDL), White Board.
Services
Service / Provider / Description / Year, Current Status / Next Steps:
• GPIR Ingesters and Query services (TACC): XML and Web service-based system for describing HPC resources: nodes, loads, jobs, etc. [1/Demo on TACC resources] Next: Install/port services to Fusion Grid resources.
• SRB as Web Service (SDSC): Develop WSDL interface to SRB. [1/Available] Next: Integrate with MDSplus.
• SRB–GA MDSplus Integration (SDSC): Provide SRB integration layer to MDSplus to leverage SRB capabilities. [1,2/In progress] Next: Complete initial integration for SC2003; plan deeper integration.
• IDL Services (IU, TACC): Develop services for generating IDL scripts, running IDL remotely. [2,3/Planning] Next: Begin initial development.
• XGSP AV Services (IU): Integrate/bridge H.323 and AG protocols. [1,2/External demo] Next: Address multicast and firewall issues.
• XML Metadata Messaging (IU): Provide discussion forums, feature requests, etc. [1/Available] Next: More closely integrate with portal container (Jetspeed).
• Distributed event/message delivery (IU): Reliable delivery of events and messages. [1,2,3/Externally developed] Next: Integrate with Web services for events, reliability.
TACC Highlights I: GridPort
• GridPort Grid Portal Toolkit http://gridport.npaci.edu
– Joint project with SDSC
– NPACI, PACI, TACC HotPages
– Telescience / BIRN, GAMESS portals
– Perl CoG – Perl Commodity Grid Toolkit
– CogUtil – external utilities that the other modules use
– NWS – standalone NWS wrapper modules
– SRB – standalone SRB functionality
– GSI authentication and proxy management
– File transfer
– Job Submission extended to support Fusion Grid code TRANSP
TACC Highlights II: TRANSP Portal
• Prototype running on Fusion Grid Jetspeed Portal
http://danube.ucs.indiana.edu:4080/jetspeed
• TRANSP as a Grid (Web) service enables PPPL to maintain and support a single production code on a certified platform
• Job submission portlet enables submission and monitoring of
a TRANSP job without IDL-based tools
– Submits runs that are pre-loaded into the run management
database and MDSplus
• Future plans for TRANSP Grid service
– Allow web-based pre-processing of inputs
– Allow a user to upload data files to the portal
– Interact with the run management database to facilitate
easier run creation
– Use MDSplus (integrated with SRB) to store run data
– Some tricky security issues remain; the best approach is under study
TACC Highlights III: GPIR
• GridPort Information Repository GPIR
http://www.tacc.utexas.edu/grid/gpir
• GPIR is a web service enabled information service evolved
from various HotPage, GridPort, TACC and GCE-RG
information and web services projects (IAWS)
• Scalable with relational database backend and extensible to
general XML schema
• Built as two services: 1: Ingester WS
– Accepts XML documents containing updates to Grid status
and ingests them into a database
• 2: Query WS
– Provides XML containing query specific information
• The current TACC-specific GPIR will be ported to the Fusion Grid once security issues (see TRANSP) are addressed, and will support static and dynamic Fusion Grid machine data
SDSC: SRB MDSplus Integration
• Provide support for large data sets
– Support migration of data to archives
• Provide access to data stored in archives and support
replication
• Support distribution of MDSplus data across storage systems
– Support dynamic addition of storage resources
• Provide WSDL interface between Portals and data handling
system
[SRB architecture diagram] Access APIs (C/C++ libraries, Unix shell and I/O, Java, Windows DLL/Linux, web browsers, Python, GridFTP, OAI, WSDL) sit above a consistency management / authorization-authentication layer that provides a logical name space, latency management, and data and metadata transport. A catalog abstraction maps onto databases (DB2, Oracle, Sybase, SQLServer, Informix), and a storage abstraction maps onto archives (HPSS, ADSM, UniTree, DMF, HRM), file systems (Unix, NT, Mac OS X), and databases (DB2, Oracle, Postgres), served by a prime server and SRB servers.
SRB MDSplus Integration
• SRB provides many supplemental capabilities to MDSplus,
making it possible to:
– create single logical name space for data stored in MDSplus
– replicate data objects to secondary resources
– store data into archival storage systems such as HPSS
– easily integrate new disk and other storage resources
– replicate data to tape subsystems
– secure data via encryption
– maintain precise access control and an audit trail
• Integration efforts will
– Maintain MDSplus API to clients
– Make this available at administrator level, transparent to
users
– Allow for evolving usage over time
– Implement SRB as a Data Grid underneath MDSplus
• Prototype before SC03
Storage Resource Broker at SDSC
[Table: SRB-managed collections at SDSC] Project instances include NPACI, Digsky, DigEmbryo, HyperLter, Hayden, Portal, SLAC, NARA/Collection, NSDL/SIO Exp, TRA, DTF, BIRN, AfCS, UCSDLib, and NSDL/CI. Totals grew from 8,081.87 GB in 3,679,692 files (as of 12/22/2000, about 8 TB and 3.7 million files) to 18,128.47 GB in 5,934,704 files (1/9/2002, about 18 TB and 6 million files) to 50,991.47 GB in 8,636,166 files (5/02/2003, roughly 51 TB and 8.64 million files): 50 Terabytes and Counting.
http://www.npaci.edu/DICE/
Multicast Multistream AG Portlet
• Java applet supports
multicast AG with multiple
streams
• In Jetspeed, it is easiest to use a fixed portlet size, but this does not fit well with the natural range of 1-20 separate streams
• Unicast uni-stream Vic applet
running
• Prototype multicast multistream Vic/Rat applet under
test and integration with
NaradaBrokering
• Add RealMedia, WMF,
Polycom portlet clients
• See http://www.infomall.org
[Screenshot] Polycom, vic (AG), and RealVideo views of multiple streams using the XGSP Web Service integrating SIP and H.323.
[Screenshot] Collaborative SVG chess game in the Batik browser, with players and observers; PDA, cell phone, and PC collaborative clients share Scalable Vector Graphics (SVG) via a "Shared Web Service".
Summary
• Identified Initial Fusion Grid Requirements
• “Current best practice” portlet and Grid service
based approach seems to be applicable
• Initiated activities to give FG useful portlets and
services
– Security architecture
– MDSplus with SRB
– TRANSP as a Grid Service
– Logbook and Monitor portlets
– GPIR status
– A/V portlets
– Other general services such as file manipulation
• Planning new visualization and IDL portlets/services
Further Details on
GridPort
Grid Portal Toolkit
Texas Advanced Computing Center
What Is GridPort?
• Provides remote access to compute, storage and
visualization resources through various Perl
modules
• Create science portals on computational grids
• Web interface to applications
• Simple Perl and HTML programming
• GridPort is used on these projects:
– NPACI, PACI HotPages
– TACC HotPage
– Telescience / BIRN portal
– GAMESS portal
GridPort Features
• GSI authentication and session tracking
• A web portal accesses GridPort for
functionality
• GridPort modules:
– Perl CoG – Perl Commodity Grid Toolkit
– CogUtil – external utilities that the other modules use
– NWS – standalone NWS wrapper modules
– SRB – standalone SRB functionality
– GridPort – portal API to GridPort, SRB and NWS
GridPort Capabilities
• Authentication
– Handles portal authentication, GSI login, MyProxy
login, logout, session information and proxy
management
• Proxy forward
– Handles forwarding a proxy to a remote system.
• File transfer
– Handles transferring files to and from the portal to a
remote system and third party transfer
GridPort Capabilities
• Job
– Job submission and remote command execution
• SRB
– SRB functionality through the portal
• NWS
– Access to NWS data through GridPort subroutines
• GridPort Website
– http://gridport.npaci.edu
Further Details on
GPIR
GridPort Information Repository
Texas Advanced Computing Center
Origins
• HotPage Informational Data
– Load, MOTD, Node Map, etc.
– Obtained from customized data gathering scripts
– MDS 2.0 where available
– "Static" VO configuration data
• Identified interest in recording historical grid data in
support of
– Workflow / Decision-making
– Job schedulers / Brokers
– Histograms
• Sought to move towards a web services model using
– XML schema
– Removes the need to write customized implementations for
each new resource
GridPort Information Repository
(GPIR)
• Implementation of web service enabled information service
• Evolved from various HotPage, GridPort, TACC and GCE-RG
information and web services projects (IAWS)
• Concept demonstrated at SC 02 for TeraGrid, PACI
(NPACI/Alliance) resources
– Called Information Archival Web Service (IAWS)
– Based on XML documents stored on a file server
– Thin clients (Java / Perl) pushed data into repository
– Contained XML documents for current grid status as well as archived historical data (HotPage information and other)
– The IAWS was conceptualized in collaboration with SDSC and NCSA
Design Philosophy
• “Aggressive Practicality”
– Works today with what’s available today
– Comprehensive Portal-centric data set
– Intended to support the GridPort GCE framework and its data requirements.
• As web service, can be repurposed to any grid data needs.
• Follow Standards
– OGSI (Grid Services)
– Emerging Data Schema (GLUE?)
• Scalable
– Relational Database back-end
• Extensible
– Easy to add new XML Queries, format as needed
GPIR Architecture
[Architecture diagram] Information providers (Perl and Java clients, MDS, web scraping, other sources) push data over SOAP-XML/HTTP to the Ingester WS; the edu.tacc.GPIR services store it via JDBC in a MySQL or PostgreSQL database; portals, portlets, and other middleware clients retrieve it through the Query WS, with an OGSA interface planned for the future.
GPIR Web Services
• Ingester WS
– Accepts XML documents containing updates to
Grid status and ingests them into a database
• Query WS
– Provides XML containing query specific
information
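As a hedged illustration of the ingest path, the sketch below pushes a made-up resource-status XML document to a GPIR-style ingester endpoint. The element names and the endpoint URL are invented for this example (the real GPIR schema and bindings are defined by the GPIR release); only the standard java.net HTTP classes are assumed.

```java
// Illustrative push of a (made-up) resource-status XML document to a GPIR-style
// ingester endpoint over plain HTTP; real GPIR clients use its published WSDL/schema.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class GpirIngestSketch {
    public static void main(String[] args) throws Exception {
        String statusXml =
            "<resourceStatus>" +
            "  <resource name='longhorn.tacc.utexas.edu'>" +
            "    <load>0.42</load>" +
            "    <jobsRunning>17</jobsRunning>" +
            "  </resource>" +
            "</resourceStatus>";

        // Hypothetical ingester URL; the deployed address would come from GPIR configuration.
        URL url = new URL("http://gpir.example.org/ingester");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(statusXml.getBytes("UTF-8"));
        }
        System.out.println("Ingester responded with HTTP " + conn.getResponseCode());
    }
}
```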
Future Directions
• Intend to implement GPIR as a grid service
– OGSA/OGSI Compliance
• Integration into GridPort 3.0
– J2EE Implementation
– Treat GPIR Entities as real objects rather than table rows
• Significant expansion to the data being gathered
• Administration Client
• Use data for reporting and decision making
• Code available at: http://www.tacc.utexas.edu/grid/gpir
GPIR Integration with the Fusion
Grid
• GPIR database and web services already installed on Indiana dev machine, danube
• Write GPIR data harvester scripts to harvest dynamic data about the machines periodically and publish it to public web space.
• Write GPIR ingester scripts to scrape the data from web space and "ingest" it into the GPIR DB.
– Currently there is a problem logging into the PPPL machines because of the firewall authentication process (see the TRANSP section); it is not yet clear how to automate this process.
• Enter static machine information (name, system type, # cpus, etc.) into GPIR database on danube.
• Change GPIR portlets in portal to display Fusion VO machines instead of TACC machines once machines are in DB.
Further Details on
TRANSP
Job Submission Portal
and Grid Service
Texas Advanced Computing Center
TRANSP SERVICE
• Advantages of Grid implementation
— Remote sites avoid costly installation and code maintenance
— PPPL maintains and supports a single production version of code on well characterized platform
— Trouble-shooting occurs at central site where TRANSP experts reside
— Benefits to users and service providers alike
• Production system: since October 1, 2002
— 16 processor Linux cluster
— Dedicated PBS queue
— Tools for job submission, cancellation, monitoring
— 21 Grid certificates issued, 7 institutions (1 in Europe)
• Four European machines interested
— JET, MAST, ASDEX–U, TEXTOR
— Concerns: understanding software, firewalls, grid security
TRANSP Job Submission Portlet Overview
• Enables user to submit a TRANSP job
through a web-based interface.
• Eliminates the need for IDL-based tools to
do TRANSP job submission
• Emulates General Atomics PreTRANSP
IDL-based application
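Since the Fusion Grid is based on GT2, the portlet's submission path can be pictured with the Java CoG Kit's GRAM client. The sketch below is not the portlet's actual code: the gatekeeper contact, RSL string, and run id are placeholders, and the credential is assumed to have been obtained earlier (e.g. via MyProxy).

```java
// Hedged sketch of a GT2/GRAM-style job submission such as a TRANSP portlet
// might perform through the Java CoG Kit; contact string and RSL are placeholders.
import org.globus.gram.GramJob;
import org.ietf.jgss.GSSCredential;

public class TranspSubmitSketch {
    public static void submit(GSSCredential userProxy) throws Exception {
        // Hypothetical RSL naming a pre-loaded TRANSP run; the real run id comes
        // from the run management database / MDSplus.
        String rsl = "&(executable=/usr/local/transp/bin/run_transp)(arguments=RUN12345)";
        GramJob job = new GramJob(rsl);
        job.setCredentials(userProxy);          // delegated proxy for the portal user
        job.request("transpgrid.pppl.gov");     // placeholder gatekeeper contact
        System.out.println("Submitted TRANSP job: " + job.getIDAsString());
    }
}
```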
TRANSP Job Submission Portlet –
Status
• Current
– Early demonstration; proof of concept
– Implements secure submission of a TRANSP job through a web
interface
– Submits runs that are pre-loaded into the run management database
and MDSplus
• Future
– Allow web-based pre-processing of inputs
– Allow a user to upload data files to the portal
– Interact with the run management database to facilitate easier run
creation
– Use MDSplus (integrated with SRB) to store run data
• Cluster account set up at PPPL
• Must solve security problem for portal server at IU to access
– PPPL hosts require additional authentication with security cards
– Applies to all transpgrid.pppl.gov machines
– You can authenticate through a web interface
• prompted for username
• prompted for password (4 digit pin, followed by 6-digit secure id token, concatenated together)
• then you can proceed to ssh via a terminal session
– Access also available through telnet terminals
– This problem could be solved by making an exception in the firewall to allow dedicated resources to log in and collect information.
TRANSP Portlet Next Steps
• Allow web-based pre-processing of inputs
• Allow a user to upload data files to the portal
• Will interact with the run management database to facilitate easier run creation
• Will use MDSplus (integrated with SRB) to store run data
• Will dispatch messages to the Fusion Grid Monitor during TRANSP job submission phase
• Allow a user to abort a submitted TRANSP run
• Complete by October, begin testing with trial users
– Supercomputing 2003
Further Details on
Porting MDS+ onto the
Storage Resource Broker Data Grid
Wayne Schroeder
Reagan W. Moore
Arcot Rajasekar
San Diego Supercomputer Center
{schroede, moore, sekar}@sdsc.edu
http://www.npaci.edu/DICE/
Objectives
• Provide support for large data sets
– Support migration of data to archives
• Provide access to data stored in archives
– Support replication
• Support distribution of MDS+ data across
storage systems
– Support dynamic addition of storage resources
• Provide WSDL interface between Portals
and data handling system
Storage Resource Broker
• Data management technology that is used to
implement
– Data grids - data sharing
– Digital libraries - data publication
– Persistent archives - data preservation
• Interoperates with grid technology
– Grid Security Infrastructure
– GridFTP transport version to be released in Dec 2003
• Provides standard interfaces
– C library, shell command, WSDL, web, Java, Perl, OAI
Storage Resource Broker
• Supports access to distributed data
– Archives - HPSS
– File systems - Unix, Windows, Mac OS X, Linux
– Databases - Oracle, DB2, PostgreSQL, Informix
• Organizes data as a logical collection
– Logical name space (location independent naming)
– Descriptive metadata attributes
• Supports bulk operations
– Register, Load, Unload
SDSC Storage Resource Broker & Meta-data Catalog
[SRB architecture diagram] Applications use access APIs (C/C++ libraries, Linux I/O, Unix shell, Java, Windows DLL, web browsers, Python, GridFTP, OAI, WSDL) above a consistency management / authorization-authentication layer providing a logical name space, latency management, and data and metadata transport. A catalog abstraction maps onto databases (DB2, Oracle, Sybase, SQLServer, Informix); a storage abstraction maps onto archives (HPSS, ADSM, UniTree, DMF, HRM), file systems (Unix, NT, Mac OS X), and databases (DB2, Oracle, Postgres), served by a prime server and SRB servers.
SRB and MDSplus: Data
Management
• Address data management needs of the
fusion research community as
– Experimental data collections continue to grow
– Simulation data collections exceed the disk size
– Integrated access to archives is needed
– Requirements grow for extensive distributed data management
SRB/MDSplus Integration
• SRB provides many supplemental capabilities to MDSplus, making it
possible to:
– create single logical name space for data stored in MDSplus
– replicate data objects to secondary resources
– store data into archival storage systems such as HPSS
– easily integrate new disk and other storage resources
– replicate data to tape subsystems
– secure data via encryption
– maintain precise access control
– maintain an audit trail
• Integration efforts will
– Maintain MDSplus API to clients
– Make this available at administrator level, transparent to users
– Allow for evolving usage over time
SRB/MDSplus Integration Status
• Integration alternatives considered
• A draft plan developed for MDSplus on top
of SRB
• Discussions proceeding between SRB and
MDSplus developers
• Working prototype expected by late October
Integrate Storage Resource Broker as
a Data Grid Beneath the MDS+
Management System and APIs
• Investigating three integration methods
– Use srbio.h interface, file system call translator
– Replace MDS+ file system calls with SRB C
Library calls
– Use same WSDL interface as used for Portal.
This requires agreement on WSDL interfaces.
• Pick implementation that provides the
desired performance
Deploy Infrastructure
• Establish data grid linking General Atomics,
U Texas, LBNL, U Indiana
• Status
– Storage Resource Broker version 2.1.1 is
installed at
• U Texas, SDSC
– Installation is proposed for
• Indiana, LBL, GA
WSDL Interface - Matrix
• Change permission - Change access permission on a datagrid collection or a data
set.
• Copy - Copy contents of datagrid collection or a dataset to new resource
• Create - Create a new container or a collection
• Ingest data set - Insert a data set presented as an attachment to the datagrid
request
• Download data set - Download a dataset as an attachment to a datagrid response
• Delete - Delete a datagrid collection or a dataset
• List - List the contents of collection or a container
• Prepare ticket - Prepare a new Grid Ticket
• Rename - Rename a collection or a data set
• Replicate - Replicate the contents of a collection or a dataset
• SeekN'Read - Seek to a point in a data set, send specified bytes as an attachment
• SeekN'Write - Seek to a point in a data set, write the bytes present in the
attachment
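The operation matrix above can be pictured as a single service interface. The Java interface below is purely illustrative: the method names and signatures are invented to mirror the matrix, not taken from the published WSDL.

```java
// Illustrative interface mirroring the WSDL operation matrix above;
// names and signatures are hypothetical, not the actual SRB WSDL.
public interface DataGridService {
    void changePermission(String path, String user, String permission); // collection or data set
    void copy(String sourcePath, String targetResource);
    void create(String path);                       // new container or collection
    void ingestDataSet(String path, byte[] attachment);
    byte[] downloadDataSet(String path);            // returned as an attachment
    void delete(String path);
    String[] list(String path);                     // contents of a collection or container
    String prepareTicket(String path);              // new Grid Ticket
    void rename(String oldPath, String newPath);
    void replicate(String path, String targetResource);
    byte[] seekAndRead(String path, long offset, int length);
    void seekAndWrite(String path, long offset, byte[] attachment);
}
```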
Community Grids Laboratory
Futures
Fusion Grid Overview and
Requirements
Further Detail
FUSION COLLABORATORY'S GOAL IS TO ADVANCE
SCIENTIFIC UNDERSTANDING & INNOVATION
• Enable more efficient use of existing experimental facilities through more powerful between-pulse data analysis, resulting in a greater number of experiments at less cost
• Allow more transparent access to analysis and simulation codes, data, and visualization tools, resulting in more researchers having access to more resources
• Enable more effective integration of experiment, theory, & modeling
• Facilitate multi–institution collaborations
• Create a standard tool set for data access, security, and visualization, allowing more researchers to build these services into their tools
NFC PROJECT IS CREATING & DEPLOYING
COLLABORATIVE SOFTWARE TOOLS FOR THE FUSION
COMMUNITY
• Create transparent & secure access to local/remote computation, visualization, and data servers
• Develop collaborative visualization that allows interactive sharing of graphical images among control room display devices, meeting room displays, and with offices over a wide area network
— 3 large fusion machines, ~$1B replacement value
— ~40 research sites in U.S., ~1500 scientists
• Enable real–time access to high–power remote computational services allowing such capabilities as between-pulse analysis of experimental data and advanced scientific simulations
— Experiments pulsed every ~20 minutes, time critical analysis
— Can we do between pulses what today we do the next day?
POTENTIAL NEW CUSTOMERS BEYOND OUR PRESENT
SET
INTERNATIONAL THERMONUCLEAR EXPERIMENTAL
REACTOR
Next Generation of Fusion Experiments
• ~$5B class device, over 20 countries
— Thousands of scientists, US rejoining
• Pulsed experiment with simulations
— ~TBs of data in 30 minutes
• International collaboration
— Productive, engaging work environment for off–site personnel
• Successful operation requires
— Large simulations, shared vis, decisions back to the control room
— Remote Collaboration!
Summary of the Fusion Grid
• The FG provides the hardware, software, and
grid infrastructure for nationally distributed
facilities and research teams.
• Would like to involve a larger fusion research
community.
– National and international user community
• Need to be able to support next generation of
fusion experiments.
– Thousands of users across the globe
– Terabytes of data
Portal Screen Shots
[Screenshots]
• Anonymous page with GPIR summary
• Anonymous view of second tab
• Anonymous view: machine summary
• Jobs running on Longhorn
• GPIR: node map
• Mpierce login screen
• File management portlets
• SRB and BIRN portals
• FG Logbook
• Fusion Grid Monitor
• TRANSP submission portlet
• Alliance portlets
• Alliance GridFTP
• Some QuakeSim portlets
• Newsgroup
• Some GridPort portals