zhengc_pragam_grid_usm.ppt

The PRAGMA Testbed
Building a Multi-Application International Grid
Cindy Zheng
Pacific Rim Application and Grid Middleware Assembly
University of California, San Diego
San Diego Supercomputer Center
http://www.pragma-grid.net
Cindy Zheng, Pragma Grid, 5/30/2006
Overview
• PRAGMA
• PRAGMA Grid testbed
• Routine-basis experiments
– Applications
– Grid middleware
– Grid infrastructure software
• Grid interoperation
• Lessons Learned
PRAGMA
• PRAGMA (2002 - )
– Open international organization
– Grid applications, practical issues
– Build international scientific collaborations
• Characteristics
– No central funding, but mutual interests
– Friendship, trust, help among people
– Doers
• Working groups
– Bio, telescience, data, GEON, resources
• Meetings
Resources working group
• Improve
– Middleware interoperability
– Global grid usability and productivity
– Grid interoperability
• How to make a global grid easy to use?
– For applications; let applications drive
– More organized testbed operation
– Full-scale and integrated testing/research
– Long daily application runs
– Find problems; develop, research, and test solutions
Routine-basis Experiments
http://goc.pragma-grid.net
• Run applications while building testbed
– Started 2004
– Grass-roots; PRAGMA membership not necessary
– Voluntary contribution of resources and work
– Long-term, persistent
– General grid
• Coordinator
• Site supporters
• Application drivers
• Developers
How We Operate
• Heterogeneity
– funding, policies, environments
• Motivation
– learn, develop, test, interop
• Communication
– email, VTC, Skype, workshops; time zones and language barriers
• Create operation
procedures
– joining testbed
– running applications
• http://goc.pragma-grid.net
– resources, contacts,
requirements, instructions,
monitoring, status, tools,
etc.
How We Operate
http://goc.pragma-grid.net/pragma-grid-status/work.htm
PRAGMA Grid Testbed
UZurich, Switzerland
KISTI, Korea
JLU, China
TITECH, Japan
CNIC, China
AIST, Japan
SDSC, USA
KU, Thailand
UoHyd, India
NCSA, USA
OSAKAU, Japan
NCHC, Taiwan
USM, Malaysia
MIMOS, Malaysia
BII, Singapore
NGO, Singapore
CICESE, Mexico
ASCC, Taiwan
UNAM, Mexico
IOIT-HCM, Vietnam
QUT, Australia
MU, Australia
UChile, Chile
PRAGMA Grid resources
http://goc.pragma-grid.net/pragma-doc/resources.html
Software Layers
• Globus 2, 3, 4
• GT4 pre-WS: 9 sites
• GT4 WS: 1 site
• Moving requirements
Trust
• Trust all site CAs
– tarball
• Experimental -> production
• Set up PRAGMA CA
– GAMA/Naregi-CA
• APGrid PMA, IGTF (5 accredited)
Applications
http://goc.pragma-grid.net
• Real science, multiple applications
– Resource sharing
• MPICH-G2
• Reservation and meta-scheduling
– TDDFT: quantum chemistry, AIST, Japan
– Savannah: climate model, MU, Australia
– QM-MD: quantum mechanics, AIST, Japan
– iGAP: bioinformatics, UCSD, USA
– GAMESS-APBS: organic chemistry, UZurich, Switzerland
– Siesta: molecular simulation, UZurich, Switzerland
– Amber: molecular simulation, USM, Malaysia
– FMO: quantum mechanics, AIST, Japan
– HPM: genomics, IOIT-HCM, Vietnam
– (GEON, Sensor, … <data, sensor>)
Middleware
• Application middleware
– enables applications to run on a grid
– Ninf-G
• AIST, Japan
• TDDFT, QM/MD, FMO
– Nimrod/G
• MU, Australia
• Savannah, Siesta,
Gamess
– MPICH-GX
• KISTI, Korea
• MM5, CICESE, Mexico
• Infrastructure
middleware
– provides grid services
– Gfarm
• AIST, Japan
• iGAP, testbed, 6 sites
– SCMSWeb
• KU, Thailand
• Testbed, 20 sites
– MOGAS
• NTU, Singapore
• Testbed, 14 sites
GridRPC: A Programming Model based on RPC
GridRPC API is a proposed recommendation at the GGF
Three components
Information Manager - Manages and provides interface info
Client Component - Manages remote executables via function handles
Remote Executables - Dynamically generated on remote servers
Built on top of Globus Toolkit (MDS, GRAM, GSI)
Simple and easy-to-use programming interface
Hiding complicated mechanism of the grid
Providing RPC semantics
[Diagram: GridRPC architecture. The Client Component holds function handles, consults the Info. Manager, and invokes Remote Executables on servers.]
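The handle-based calling pattern described above (bind a client-side handle to a remote executable, then invoke it with ordinary call syntax) can be sketched with Python's standard-library XML-RPC standing in for Ninf-G/Globus; the server, port choice, and `add` function are illustrative only, not part of the GridRPC API.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# "Remote executable": a function registered with an RPC server
# (stands in for a dynamically generated remote executable).
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
port = server.server_address[1]  # ephemeral port chosen by the OS
threading.Thread(target=server.serve_forever, daemon=True).start()

# "Function handle": a client-side proxy bound to the remote server;
# the call looks local but executes remotely (RPC semantics).
handle = ServerProxy(f"http://127.0.0.1:{port}")
result = handle.add(2, 3)  # remote invocation through the handle
print(result)
```

The point of the pattern is that the complicated grid plumbing (discovery, job submission, security) hides behind the handle, exactly as the slide says Ninf-G hides Globus.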
Nimrod Development Cycle
[Diagram: Prepare jobs using portal → jobs sent to available machines → jobs scheduled and executed dynamically → results displayed and interpreted.]
Application Middleware
• Ninf-G <http://ninf.apgrid.org>
– Supports the GridRPC model, which will be a GGF standard
– Integrated into NMI release 8 (the first non-US software in NMI)
– A Ninf roll for Rocks 4.x is also available
– On the PRAGMA testbed, the TDDFT and QM/MD applications achieved long executions (1-week to 50-day runs)
• Nimrod <http://www.csse.monash.edu.au/~davida/nimrod>
– Supports large scale parameter sweeps on Grid infrastructure
• Study the behaviour of some of the output variables against a range
of different input scenarios.
• Compute parameters that optimize model output
• Computations are uncoupled (file transfer)
• Allows robust analysis and more realistic simulations
• Very wide range of applications from quantum chemistry to public
health policy
– Climate experiment ran some 90 different scenarios of 6 weeks
each
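The parameter-sweep pattern Nimrod/G automates (independent, uncoupled runs of one model over a grid of input scenarios) can be sketched in plain Python; the model function and the parameter values below are invented for illustration, not the Savannah climate model.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def climate_model(temp_k, co2_ppm):
    # Stand-in for one uncoupled simulation run; a real sweep would
    # launch an executable and collect its output files.
    return 0.5 * (temp_k - 273.0) + 0.01 * (co2_ppm - 350.0)

# Cartesian product of input scenarios, like a Nimrod plan file's
# parameter ranges (2 temperatures x 3 CO2 levels = 6 scenarios)
scenarios = list(itertools.product([280.0, 300.0], [350.0, 400.0, 450.0]))

# Runs are uncoupled, so they can be farmed out concurrently
with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = list(pool.map(lambda p: climate_model(*p), scenarios))

# Study output against input scenarios, e.g. find the maximizer
best = max(zip(scenarios, outputs), key=lambda t: t[1])
print(best)
```

Because each run is independent, failures or slow machines only delay individual scenarios, which is what makes the sweep robust on a heterogeneous grid.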
Fault-Tolerance Enhanced
• Ninf-G monitors each RPC call
– Return error code for failures
• Explicit Faults : Server down, Disconnection of network
• Implicit Faults : Jobs not activated, unknown faults
• Timeout - grpc_wait*()
– Retry/restart
• Nimrod/G monitors remote services and restarts failed jobs
– Long jobs are split into many sequentially dependent jobs which can
be restarted
• using sequential parameters called seqameters
• Improvement in the routine-basis experiment
– developers test code on a heterogeneous global grid
– results guide developers to improve fault detection and handling
Application Setup and Resource Management
• Heterogeneous platforms
– Manual build, deploy applications, manage resources
• Labor-intensive, time-consuming, tedious
• Middleware solutions
– For deployment
• Automatic distribution of executables using staging functions
– For resource management
• Ninf-G client configuration allows description of server attributes
– Port number of the Globus gatekeeper
– Local scheduler type
– Queue name for submitting jobs
– Protocol for data transfer
– Library path for dynamic linking
• Nimrod/G portal allows a user to generate a testbed and helps
maintain information about resources, including use of different
certificates.
Gfarm – Grid Virtual File System
http://datafarm.apgrid.org/
– High transfer rate (parallel transfer, localization)
– Scalable
– File replication: user/application setup, fault tolerance
– Supports Linux, Solaris; also scp, GridFTP, SMB
– POSIX compliant
– Gfarm-FUSE
– 6 sites, 3786 GB, 1527 MB/sec (70 I/O nodes)
Application Benefit
• No modification required
– Existing legacy application can access files in Gfarm
file system without any modification
• Easy application deployment
– Install Application in Gfarm file system, run
everywhere
• It supports binary execution and shared library loading
• Different kinds of binaries can be stored at the same
pathname, which will be automatically selected depending on
client architecture
• Fault tolerance
– Automatic selection of file replicas at access time tolerates disk and network failures
• File sharing – Community Software Area
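The per-architecture binary selection described above (several binaries stored under one logical pathname, the right one resolved by client architecture) can be sketched as a catalog lookup; the paths and the mapping are hypothetical, not Gfarm's actual metadata layout.

```python
import platform

# Hypothetical replica catalog: one logical path, one physical copy
# per client architecture (illustrative paths, not Gfarm metadata).
replica_catalog = {
    "/gfarm/apps/simulate": {
        "x86_64": "/node03/simulate.x86_64",
        "ia64": "/node07/simulate.ia64",
    },
}

def resolve_binary(logical_path, arch=None):
    """Pick the physical copy matching the client's architecture."""
    arch = arch or platform.machine()   # default to local architecture
    copies = replica_catalog[logical_path]
    if arch not in copies:
        raise FileNotFoundError(f"no replica of {logical_path} for {arch}")
    return copies[arch]

chosen = resolve_binary("/gfarm/apps/simulate", arch="ia64")
print(chosen)
```

This is why "install once, run everywhere" works on a mixed IA32/IA64/Solaris testbed: every client opens the same pathname and transparently receives a compatible binary.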
Performance Enhancements
• Performance for small files
– Improve meta-cache management
– Add a metadata cache server
• Directory listing of 16,393 files:
– Original: 44.0
– Improved metadata management: 3.54
– With metadata cache server: 1.69
SCMSWeb
http://www.opensce.org/components/SCMSWeb
• Web-based monitoring system for clusters and grid
– System usage
– Performance metrics
• Reliability
– Grid service monitoring
– Spot problems at a glance
PRAGMA-Driven Development
• Heterogeneity
– Add platform support
• Solaris (CICESE, Mexico)
• IA64 (CNIC, China)
• Software deployment
– NPACI Rocks Roll
• Support ROCKS 3.3.0 – 4.1
– Native Linux RPMs for various Linux platforms
• Enhancement
– Hierarchical monitoring on large-scale grids
– Compressed data exchange between grid sites
• For sites with slow networks
– Better and cleaner graphical user interfaces
• Standardize & more collaboration
– GRMAP (Grid Resource Management & Account Project)
– Collaboration between NTU and TNGC
– GIN (Grid Interoperation Now) Monitoring – standardize data exchange between monitoring systems
Multi-organisation Grid Accounting System
http://ntu-cg.ntu.edu.sg/pragma
MOGAS Web information
Information for grid resource managers/administrators:
– Resource usage based on organization
– Daily, weekly, monthly, yearly records
– Resource usage based on project/individual/organisation
– Individual log of jobs
– Metering and charging tool; can define a pricing system, e.g.
Price = f(hardware specifications, software license, usage measurement)
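One concrete reading of the slide's Price = f(hardware specifications, software license, usage measurement) is sketched below; every rate, fee, and parameter name is invented for illustration and is not MOGAS's actual charging model.

```python
def job_price(cpu_hours, cpu_rate_per_hour,
              license_fee=0.0, storage_gb=0.0, storage_rate_per_gb=0.0):
    # Hardware enters through the per-hour CPU rate, the software
    # license as a flat fee, and usage through metered CPU hours
    # and storage consumed (all hypothetical terms).
    return (cpu_hours * cpu_rate_per_hour
            + license_fee
            + storage_gb * storage_rate_per_gb)

# Example: 12 CPU-hours at $0.50/h, a $2 license fee, 5 GB at $0.10/GB
price = job_price(12.0, 0.50, license_fee=2.0,
                  storage_gb=5.0, storage_rate_per_gb=0.10)
print(f"${price:.2f}")
```

A resource manager could plug site-specific rates into such a function per job record to turn MOGAS's usage logs into charges.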
PRAGMA MOGAS status
(27/3/2006)
KISTI, Korea
NCSA, USA
AIST, Japan
CNIC, China
TITECH, Japan
UoHyd, India
NCHC, Taiwan
SDSC, USA
CICESE, Mexico
ASCC, Taiwan
KU, Thailand
IOIT-HCM
MIMOS
UNAM, Mexico
USM, Malaysia
BII, Singapore
QUT
NGO, Singapore
UChile, Chile
MU, Australia
Cindy Zheng, GGF13, 3/14/05; modified by A/Prof. Bu-Sung Lee
(map legend: GT2 and GT4 sites)
Integrations and Collaborations
• Naregi-CA (AIST, Japan) and
Gama (SDSC, USA) Integration
• Rocks (SDSC, USA) and SCE
(KU, Thailand), Ninf-G (AIST),
Gfarm (AIST), KISTI etc.
• PRAGMA and NLANR
• PRAGMA and GEON
– PRAGMA grid testbed
– UMC, SDSC (USA)
– GSCAS, CNIC (China)
– UoHyd (India)
– AIST (Japan)
• PRAGMA and sensor networks
– PRAGMA grid testbed
– NCHC, Taiwan
– Binghamton University, NY, USA
Grid Interoperation Now (GIN)
• GIN testbed (started Feb. 2006)
– PRAGMA
– TeraGrid
– EGEE
• First application: TDDFT/Ninf-G
– Lead: Yoshio Tanaka, Yusuke Tanimura (AIST)
– Deployed and run
• PRAGMA - AIST, NCSA, SDSC
• TeraGrid – ANL
– Working on deployment to EGEE – LCG
• Middleware interoperability problem
– Assumptions by middleware about local architecture
– Standard protocol
Lessons Learned, Issues and Work (1)
• Authentication
– Users obtain initial access
• Process documented by Cindy Zheng, http://pragmagoc.rocksclusters.org/gin/gin-egee.htm
• Not easy, not simple
• Need documentation to guide users
• Develop software to simplify the process
– DN incompatibility
• Summarized by Oscar Koeroo, http://goc.pragma-grid.net/gin/Cert-probs-GIN.pdf
• Commented by Charles Bacon (Globus),
http://goc.pragma-grid.net/gin/DN_Charles-Bacon.htm
• Need both standard and flexibility
• The VOMS server was modified to handle both styles of DN strings
Lessons Learned, Issues and Work (2)
• Software stack and Community
Software Area (CSA)
– Software stack is different among grids.
Problems with conflicting requirements.
• CSA as a solution for users to deploy their substack and share installed software
– Near term - work on CSA within each grid
• Gfarm-FUSE
– Need focused discussion on solution for
GIN
Lessons Learned, Issues and Work (3)
• Cross-grid monitoring
– Summary by Somsak Sriprayoonsakul,
http://goc.pragma-grid.net/gin/ginmonitor.htm
• Get some monitoring software
together, develop a common schema
– Wiki: http://wiki.pragmagrid.net/index.php?title=GIN_%28Grid_Inter-operation_Now%29_Monitoring
Lessons Summary
http://goc.pragma-grid.net/applications/tddft/Lessons.htm
• Problems and solutions
– Information sharing (pragma-goc)
– Trust and access (Naregi-CA, GAMA, MyProxy)
– Resource requirements (INCA)
– User/application environment (Gfarm)
– Job submission (portal/service/middleware)
– System/job monitoring (SCMSWeb, +)
– Network monitoring (APAN, NLANR)
– Resource/job accounting (SCMSWeb, NTU)
– Fault tolerance (Ninf-G, Nimrod)
• Publications
– Infrastructure, applications, software integration,
organization
Pointers
• PRAGMA: http://www.pragma-grid.net
• PRAGMA Testbed: http://goc.pragma-grid.net
• “PRAGMA: Example of Grass-Roots Grid Promoting Collaborative e-science Teams”, CTWatch, Vol 2, No. 1, Feb 2006
• “The PRAGMA Testbed – Building a Multi-application International Grid”, CCGrid2006
• “Deploying Scientific Applications to the PRAGMA Grid Testbed: Strategies and Lessons”, CCGrid2006
• MOGAS: “Analysis of Job in a Multi-Organizational Grid Test-bed”, CCGrid2006
Thank You