Submitting locally and running globally – The GLOW and OSG Experience


Submitting locally and running globally – The GLOW and OSG Experience
Miron Livny
Computer Sciences Department
University of Wisconsin-Madison
[email protected]
High Throughput Computing is a 24-7-365 activity
FLOPY ≠ (60*60*24*7*52)*FLOPS
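For scale, the arithmetic inside the parentheses (reading the slide as contrasting sustained, year-long throughput with peak speed, which is the usual HTC framing rather than something spelled out here):

\[ 60 \times 60 \times 24 \times 7 \times 52 = 31{,}449{,}600 \ \text{seconds in a 52-week year} \]

so turning FLOPS into FLOPY would require delivering peak performance for every one of those roughly \(3.1 \times 10^{7}\) seconds; downtime, contention and idle cycles are why the two are not simply proportional.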
Leads to a “bottom up” approach to building and operating HTC systems
My jobs should run …
› … on my laptop if it is not connected to the network
› … on my group resources if my certificate expired
› … on my campus resources if the meta-scheduler is down
› … on my national resources if the trans-Atlantic link was cut by a submarine
* Same for “my resources”
“ … We claim that these mechanisms, although
originally developed in the context of a cluster
of workstations, are also applicable to
computational grids. In addition to the required
flexibility of services in these grids, a very
important concern is that the system be robust
enough to run in “production mode” continuously
even in the face of component failures. … “
Miron Livny & Rajesh Raman, "High Throughput Resource
Management", in “The Grid: Blueprint for
a New Computing Infrastructure”.
The Open Science Grid (OSG)
Miron Livny - OSG PI & Facility Coordinator,
Computer Sciences Department
University of Wisconsin-Madison
Supported by the Department of Energy Office of Science SciDAC-2 program from the High Energy Physics, Nuclear
Physics and Advanced Software and Computing Research programs, and the National Science Foundation Math and
Physical Sciences, Office of CyberInfrastructure and Office of International Science and Engineering Directorates.
Three Building Blocks
The OSG organization, management and operation are structured around three components:
• the Consortium
• the Project
• the Facility
The Evolution of the OSG
[Timeline, 1999–2009: GriPhyN (NSF), iVDGL (NSF), PPDG (DOE) and the DOE Science Grid feed, via the Trillium collaboration and Grid3, into the OSG (DOE+NSF) from about 2005 onward. Running alongside: LIGO preparation and operation; LHC construction, preparation and operations; the European Grid + Worldwide LHC Computing Grid; and campus and regional grids.]
The Open Science Grid vision
Transform processing- and data-intensive science through a cross-domain, self-managed, national distributed cyber-infrastructure that brings together campus and community infrastructure and facilitates the needs of Virtual Organizations (VOs) at all scales.
D0 Data Re-Processing
• 12 sites contributed up to 1,000 jobs/day.
[Charts: “Total Events” and OSG CPU-hours per week for weeks 1–23 of 2007 (y-axis 0–160,000), broken down by site: CIT_CMS_T2, FNAL_GPFARM, MIT_CMS, NERSC-PDSF, OU_OSCER_CONDOR, UCSDT2, USCMS-FNAL-WC1-CE, FNAL_DZEROOSG_2, GLOW, MWT2_IU, OSG_LIGO_PSU, Purdue-RCAC, UFlorida-IHEPA, FNAL_FERMIGRID, GRASE-CCR-U2, Nebraska, OU_OSCER_ATLAS, SPRACE, UFlorida-PG.]
• 2M CPU hours, 286M events, 286K jobs on OSG.
• 48 TB input data, 22 TB output data.
OSG Principles
• Characteristics:
  - Provide guaranteed and opportunistic access to shared resources.
  - Operate a heterogeneous environment, both in services available at any site and for any VO, and with multiple implementations behind common interfaces.
  - Interface to campus and regional grids.
  - Federate with other national/international grids.
  - Support multiple software releases at any one time.
• Drivers:
  - Delivery to the schedule, capacity and capability of LHC and LIGO: contributions to/from and collaboration with the US ATLAS, US CMS and LIGO software and computing programs.
  - Support for/collaboration with other physics/non-physics communities.
  - Partnerships with other grids, especially EGEE and TeraGrid.
  - Evolution by deployment of externally developed new services and technologies.
OSG challenges
• Develop the organizational and management structure of a consortium that drives such a Cyber Infrastructure.
• Develop the organizational and management structure for the project that builds, operates and evolves such a Cyber Infrastructure.
• Maintain and evolve a software stack capable of offering powerful and dependable capabilities that meet the science objectives of the NSF and DOE scientific communities.
• Operate and evolve a dependable and well-managed distributed facility.
The OSG Consortium
• > 20 scientific Virtual Organizations: LHC, STAR, LIGO, NanoHub, etc.
• > 25 resource providers: DOE national labs, university facilities, etc.
  - 10 storage-focused resources
• > 10 software providers (including external projects): Condor, Globus, Storage Resource Manager, Internet2, ESNET, CEDPS, Fermilab accounting, etc.
• > 4 partners (ex officio): EGEE, TeraGrid, NWICG, TIGRE, APAC, etc.
The OSG Project
• Co-funded by DOE and NSF at an annual rate of ~$6M for 5 years starting FY-07.
• 16 institutions involved: 4 DOE labs and 12 universities.
• Currently the main stakeholders are from physics: the US LHC experiments, LIGO, the STAR experiment, the Tevatron Run II and astrophysics experiments.
• A mix of DOE-lab and campus resources.
• Active “engagement” effort to add new domains and resource providers to the OSG consortium.
The Project
• The annual Project Plan (including the WBS) gives details of deliverables and the timeline for the year.
  - Deliverables driven by the science stakeholders.
  - Buy-in through “Science Milestones”: deliverables owned by the stakeholders and included in the plan.
  - Area Coordinators responsible for milestones and deliverables under their branch.
  - Well-defined software releases are part of the plan.
• The WBS is updated by Area Coordinators quarterly; missed milestones are subject to discussion.
• Adjust plans based on experience, requests, feedback and problems.
• Change request process for project deliverables.
OSG Facility Organization
• The facility effort is organized in six groups:
  - Engagement: identify and support new communities (now an NSF CI-Team)
  - Integration: transition the OSG software stack to deployment
  - Operation: monitor activities and support VOs and sites
  - Security: define, implement and monitor the security plan of the OSG
  - Software: evolve, package and support the VDT
  - Troubleshooting: work with sites and VOs to resolve “problems” in end-to-end functionality
OSG Middleware Evolution
• Domain science requirements and externally provided middleware (Condor, Globus, Privilege, EGEE, …) feed OSG stakeholder and middleware-developer (joint) projects.
• Test on a “VO-specific grid”.
• Integrate into a VDT release.
• Deploy on the OSG Integration Grid; interoperability testing.
• Provision in an OSG release and deploy on OSG sites.
How much software?
Building the VDT
• We support 19 platforms (OSG is heterogeneous!):
  - Debian 3.1
  - Fedora Core 4 (x86, x86-64)
  - RedHat Enterprise Linux 3 (x86, x86-64, ia64)
  - RedHat Enterprise Linux 4 (x86, x86-64)
  - RedHat Enterprise Linux 5 (x86, x86-64)
  - CentOS 5
  - Scientific Linux 3
  - Scientific Linux 4 (x86, x86-64, ia64)
  - ROCKS Linux 3.3
  - SUSE Linux 9 (x86-64, IA-64)
  - Mac OS 10.4
• We build ~40 components on 11 platforms.
  - We can reuse binaries for some platforms.
  - Using the Metronome system for builds and tests.
The Three Cornerstones
National, Campus and Community need to be harmonized into a well-integrated whole.
Who are you?
• A resource can be accessed by a user via the campus, community or national grid.
• A user can access a resource with a campus, community or national grid identity.
OSG is VO Centric
• OSG brings together many VOs for:
  - opportunistic sharing of resources in a grid environment
  - allowing for more effective use of their collective resources
  - allowing easier use of dedicated/allocated resources that are distributed
• A Virtual Organization (VO) is a collection of people (VO members).
  - A VO's member structure may include groups, subgroups and/or roles into which it divides its members according to their responsibilities and tasks, such that they are accorded appropriate levels of authorization.
  - In order to receive the appropriate authorization at another VO's site, a user's grid job must be able to present an authentication token along with a token indicating the desired computing privileges.
• A Site is a collection of computing/storage resources and services (e.g., databases); the terms “Site,” “Computing Element” (“CE”) and “Storage Element” (“SE”) refer to the resources owned and operated by a VO or other organization.
  - Use of the resources at a site is determined by a combination of the site's local policies and the user VO's policies. VOs are responsible for contracting individually with each other for guaranteed access to resources.
  - Groups that provide software are also known as resource providers.
Condor in the Fermilab Grid Facilities
[FermiGrid site-wide gateway diagram: a VOMRS server stays in periodic synchronization with the VOMS servers; GUMS servers handle identity mapping and SAZ servers site authorization; Gratia provides accounting and ReSS resource selection; a FERMIGRID SE (dCache SRM) and BlueArc provide storage. The clusters (CMS WC1/WC2/WC3, CDF OSG1/OSG2/OSG3/4, D0 CAB1/CAB2, and the GP Grid) send ClassAds to the site-wide gateway. Step 3: the user submits their grid job via globus-job-run, globus-job-submit, or Condor-G.]
• As data volumes increase, demands for increased data processing power have grown beyond the ability to support them in a diverse, disconnected environment.
• FermiGrid is a strategic cross-campus grid supporting the experiments at Fermilab:
  - 11,786 cores amongst the clusters, 7,586 of them managed by Condor
  - access to several petabytes (10^15 bytes) of storage
  - O(1000) users
  - provides a common mechanism for supporting the users while minimizing the expense of service delivery.
• Condor is the underlying batch system on most of the clusters.
• The Site-Wide Gateway is a natural pairing of Condor-G and a hierarchical grid deployment.
• The Gateway uses jobmanager-cemon:
  - based on jobmanager-condor
  - matches jobs against cluster information via a local Condor matchmaking service (ReSS)
  - forwards to the matched sub-cluster
  - enhances scalability
  - fault tolerance under development
• Additional gateways will provide job access to TeraGrid resources.
• Significant scalability work done by the Condor team to support usage patterns.
BoilerGrid
• Purdue Condor Grid (BoilerGrid): comprised of Linux HPC clusters, student labs, machines from academic departments, and Purdue regional campuses.
• 8,900 batch slots today; 14,000 batch slots in a few weeks.
• In 2007, delivered over 10 million CPU-hours of high-throughput science to Purdue and the national community through the Open Science Grid and TeraGrid.
BoilerGrid - Growth
[Chart: BoilerGrid pool size (cores) by year, 2003–2009, growing to roughly 14,000.]
BoilerGrid - Results
[Charts by year, 2003–2008: BoilerGrid jobs completed (axis up to 12,000,000), unique users per year (axis up to 140), and hours delivered (axis up to 12,000,000).]
Clemson Campus Condor Pool
• Machines in 27 different locations on campus
• ~1,700 job slots
• >1.8M hours served in 6 months
• Users from Industrial and Chemical Engineering, and Economics
• Fast ramp-up of usage
• Accessible to the OSG through a gateway
UW Madison Campus Grid
• Condor pools in various departments (more than 4,000 “cores”), made accessible via Condor ‘flocking’.
  - Users submit jobs to their own private or department Condor scheduler.
  - Jobs are dynamically matched to available machines.
• Crosses multiple administrative domains.
  - No common uid-space across campus.
  - No cross-campus NFS for file access.
  - Users rely on Condor remote I/O, file staging, AFS, SRM, GridFTP, etc.
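A hedged sketch of the flocking piece in Condor configuration; the hostnames are placeholders, not the actual UW pool names. With FLOCK_TO in place, a job that cannot be matched in the local pool is automatically offered to the listed pools.

  # On each submit host: pools to flock to when the local pool has no match.
  FLOCK_TO = cm.hep.example.edu, cm.cs.example.edu, cm.glow.example.edu

  # On the pools that accept flocked jobs (central manager and execute
  # nodes): submit hosts that are allowed to flock in.
  FLOCK_FROM = submit.physics.example.edu, submit.stat.example.edu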
Submitting Jobs within UW Campus Grid
[Diagram: a UW HEP user runs condor_submit to a local schedd (job caretaker); the job is matched by the HEP, CS or GLOW matchmaker and executes on a startd (job executor) somewhere in the campus grid.]
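In Condor submit syntax, the local path in the diagram is an ordinary vanilla-universe submission. This minimal sketch reuses the hypothetical “sim” job from the job-routing example later in the deck and would be handed to condor_submit on the user's own schedd.

  # sim.sub -- minimal vanilla-universe job with file transfer,
  # so it can run on machines without a shared filesystem.
  universe                = vanilla
  executable              = sim
  arguments               = seed=345
  output                  = stdout.345
  error                   = stderr.345
  log                     = sim.log
  should_transfer_files   = YES
  when_to_transfer_output = ON_EXIT
  queue

Running condor_submit sim.sub places the job in the local schedd's queue, where it waits until one of the matchmakers finds it an available startd.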
Grid Laboratory of Wisconsin
2003 initiative funded by NSF(MIR)/UW at $1.5M. Second phase funded in 2007 by NSF(MIR)/UW at $1.5M.
Six initial GLOW sites:
• Computational Genomics, Chemistry
• Amanda, Ice-cube, Physics/Space Science
• High Energy Physics/CMS, Physics
• Materials by Design, Chemical Engineering
• Radiation Therapy, Medical Physics
• Computer Science
Diverse users with different deadlines and usage patterns.
GLOW Evolution
• First machines arrived in 01/04.
• First job completed 14 hours after the machines arrived.
• Today we have more than 2,000 “cores”.
• ~50% of these cores funded by non-GLOW money (mainly Capital Exercise and UW funds).
• One group (UW-ATLAS) joined in ’05.
CPU status 05-07
GLOW Usage, 4/04–11/07 (between 2004-01-31 and 2007-11-08)
[Pie chart of usage by group: Atlas 20%, LMCG 18%, ChemE 18%, CMS 17%, IceCube 5%, MedPhysics 4%, CS 2%, MultiScalar 1%, CMPhysics 1%, Plasma Physics 1%, other 13%.]
Over 35M CPU hours served!
Housing the Machines
• Condominium style:
  - centralized computing center
  - space, power, cooling, management
  - standardized packages
• Neighborhood-association style:
  - each group hosts its own machines
  - each contributes to the administrative effort
  - base standards (e.g. Linux & Condor) to make sharing of resources easy
• GLOW has elements of both, but leans towards the neighborhood style.
GLOW Architecture in a Nutshell
• One big Condor pool.
• But a backup central manager runs at each site (Condor HAD service).
• Users submit jobs as members of a group (e.g. “CMS” or “MedPhysics”).
• Computers at each site give highest priority to jobs from the same group (via machine RANK).
• Jobs run preferentially at the “home” site, but may run anywhere when machines are available.
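A hedged illustration, in Condor submit/config syntax, of the machine-RANK mechanism described above; the Group attribute name and the “CMS” value are hypothetical stand-ins for however GLOW actually tags its groups.

  # In a group member's submit file: advertise the job's group
  # (a custom job-ad attribute, hence the leading "+").
  +Group = "CMS"

  # In the startd configuration on CMS-owned machines: rank jobs from the
  # owning group above everything else (a True expression ranks as 1.0),
  # while still accepting any job when the machine would otherwise sit idle.
  RANK = (TARGET.Group =?= "CMS")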
Design highlights of HAD
• Modified version of the Bully algorithm.
  - For more details: H. Garcia-Molina, “Elections in a Distributed Computing System,” IEEE Trans. on Computers, C-31(1):48-59, Jan 1982.
• One HAD leader + many backups.
• HAD as a state machine.
• “I am alive” messages from the leader to the backups:
  - detection of leader failure
  - detection of multiple leaders (split-brain)
• “I am leader” messages from HAD to the replication daemon.
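A hedged sketch of a two-node highly available central manager in Condor configuration, in the spirit of the HAD design above; the hostnames and ports are placeholders, and the exact knob names should be checked against the HAD section of the Condor manual for the version in use.

  # Both candidate central managers, listed in the same order everywhere.
  CONDOR_HOST = cm1.example.edu, cm2.example.edu

  HAD_PORT         = 51450
  REPLICATION_PORT = 41450
  HAD_LIST         = cm1.example.edu:$(HAD_PORT), cm2.example.edu:$(HAD_PORT)
  REPLICATION_LIST = cm1.example.edu:$(REPLICATION_PORT), cm2.example.edu:$(REPLICATION_PORT)

  # Run HAD and the state-replication daemon next to the collector; HAD
  # decides which machine currently runs the negotiator (the "leader").
  DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, HAD, REPLICATION
  HAD_USE_REPLICATION = TRUE
  MASTER_NEGOTIATOR_CONTROLLER = HAD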
The value of the big G
• Our users want to collaborate outside the bounds of the campus (e.g. ATLAS and CMS are international).
• We also don't want to be limited to sharing resources with people who have made identical technological choices.
• The Open Science Grid (OSG) gives us the opportunity to operate at both scales, which is ideal.
Submitting jobs through OSG to UW Campus Grid
[Diagram: an Open Science Grid user's condor_submit goes to their schedd (job caretaker), whose condor gridmanager sends the job through the Globus gatekeeper at UW; there it enters a local schedd, is matched by the HEP, CS or GLOW matchmaker, and runs on a startd (job executor).]
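A hedged sketch of the grid-universe submit file an OSG user might hand to condor_submit for the path in the diagram; the gatekeeper contact string is the one that appears in the job-routing example below, while the executable, file names and proxy path are hypothetical.

  # Condor-G: the grid universe hands the job to the condor gridmanager,
  # which talks to the remote Globus (GT2) gatekeeper.
  universe      = grid
  grid_resource = gt2 cmsgrid01.hep.wisc.edu/jobmanager-condor
  executable    = sim
  arguments     = seed=345
  output        = stdout.345
  error         = stderr.345
  log           = sim.log
  x509userproxy = /tmp/x509up_u1234
  should_transfer_files   = YES
  when_to_transfer_output = ON_EXIT
  queue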
Elevating from GLOW to OSG
[Diagram: a “Schedd On The Side” watches the schedd's job queue (Job 1 … Job 5) and mirrors a selected job (Job 4*) so it can be elevated from the local pool to the grid.]
What is “job routing”?
Original (vanilla) job:
  Universe = “vanilla”
  Executable = “sim”
  Arguments = “seed=345”
  Output = “stdout.345”
  Error = “stderr.345”
  ShouldTransferFiles = True
  WhenToTransferOutput = “ON_EXIT”

The JobRouter consults its routing table (Site 1 …, Site 2 …), submits a routed (grid) job on the user's behalf, and feeds the final status back to the original job:
  Universe = “grid”
  GridType = “gt2”
  GridResource = “cmsgrid01.hep.wisc.edu/jobmanager-condor”
  Executable = “sim”
  Arguments = “seed=345”
  Output = “stdout”
  Error = “stderr”
  ShouldTransferFiles = True
  WhenToTransferOutput = “ON_EXIT”
Routing is just site-level matchmaking
› With feedback from job queue
• number of jobs currently routed to site X
• number of idle jobs routed to site X
• rate of recent success/failure at site X
› And with power to modify job ad
• change attribute values (e.g. Universe)
• insert new attributes (e.g. GridResource)
• add a “portal” grid proxy if desired
Routing Jobs from UW Campus Grid to OSG
[Diagram: condor_submit delivers jobs to the local schedd (job caretaker); the HEP, CS and GLOW matchmakers serve the local path, while the Grid JobRouter transforms selected jobs for the condor gridmanager to forward through a Globus gatekeeper to OSG.]
Combining both worlds:
• simple, feature-rich local mode
• when possible, transform to a grid job for traveling globally
Yes, there is not much we can do without hardware: CPUs/cores, memory and disks, and networks of all kinds (cluster, departmental, campus, state, national and international) hold the key to what we are trying to accomplish. In other words, we need hardware!
“ … Since the early days of mankind the primary motivation for the establishment of communities has been the idea that by being part of an organized group the capabilities of an individual are improved. The great progress in the area of inter-computer communication led to the development of means by which stand-alone processing sub-systems can be integrated into multi-computer ‘communities’. … ”
Miron Livny, “Study of Load Balancing Algorithms for Decentralized Distributed Processing Systems,” Ph.D. thesis, July 1983.
However, what makes or breaks CI are people (sociology). For CI to be a transformative force in scientific discovery we have to learn how to:
• collaborate
• share
• contribute to end-to-end solutions
• agree on policies
• …
From a grid of one to a grid of many