
Environment from the Molecular Level
A NERC eScience testbed project
Leveraging HTC for UK eScience with Very Large Condor Pools: Demand for transforming untapped power into results.
Paul Wilson (1), John Brodholt (1) and Wolfgang Emmerich (2)
(1) Department of Earth Sciences, University College London, Gower Street, London WC1E 6BT, UK
(2) Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
This talk: Part 1
1. The eMinerals problem area
2. The Computational job-types this generates
3. How Condor can help to sort these jobs out
4. What we gain from Condor and where to go next
5. UK Institutional Condor programmes and the road ahead.
This talk: Part 2
1. Condor’s additional features and how we use them.
2. The eMinerals mini grid.
3. Conclusion.
THE PROBLEM AREA.
1. Simulation of pollutants in the environment
Binding of heavy metals and organic molecules in soils.
2. Studies of materials for long-term nuclear waste encapsulation
Radioactive waste leaching through ceramic storage media.
3. Studies of weathering and scaling
Mineral/water interface simulations, e.g. oil well scaling.
Codes relying on empirical descriptions of interatomic forces:
DL-POLY - molecular dynamics simulations
GULP – lattice energy/lattice dynamics simulations
METADISE – interface simulations
Codes using a quantum mechanical description of interactions between atoms:
CRYSTAL – Hartree-Fock implementation.
SIESTA – Density Functional Theory, numerical basis sets to describe electronic wave functions.
ABINIT - DFT, plane wave descriptions of electronic wave functions
WHAT TYPES OF JOB DO THESE PROBLEMS GENERATE?
2 TYPES OF JOB:
1) High to mid performance:
Requiring powerful resources, potential process intercommunication, long
execution times, CPU and memory intensive.
2) Low performance/high throughput:
Requiring access to many hundreds or thousands of PC-level CPUs. No process intercommunication, short execution times, low memory usage.
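As a rough illustration of the two job types, the minimal Python sketch below sorts jobs using the criteria listed above. The cut-offs are hypothetical: they reuse the 5-hour recommended maximum job time and the 256 MB of RAM on the smaller cluster PCs quoted elsewhere in this talk, and the `Job` fields and `job_type` function are illustrative names, not part of Condor.

```python
from dataclasses import dataclass

# Hypothetical cut-offs, borrowed from figures quoted elsewhere in this talk:
# the 5-hour recommended maximum Condor job time and the 256 MB of RAM on the
# smaller UCL cluster PCs. They are illustrative, not eMinerals policy.
MAX_HTC_HOURS = 5.0
MAX_HTC_MEMORY_MB = 256


@dataclass
class Job:
    name: str
    needs_interprocess_comms: bool  # e.g. a tightly coupled parallel run
    est_hours: float                # expected run time of one job
    memory_mb: int                  # peak memory of one job


def job_type(job: Job) -> int:
    """Return 1 for a high/mid-performance (HPC) job, 2 for a high-throughput job."""
    if job.needs_interprocess_comms:
        return 1  # processes must talk to each other: needs an HPC machine
    if job.est_hours > MAX_HTC_HOURS or job.memory_mb > MAX_HTC_MEMORY_MB:
        return 1  # too long or too large for a desktop PC in a student cluster
    return 2      # independent, short and small: fits an idle desktop


if __name__ == "__main__":
    jobs = [
        Job("coupled parallel simulation", True, 48.0, 4096),
        Job("one configuration in a parameter sweep", False, 0.5, 64),
    ]
    for job in jobs:
        print(f"{job.name}: type {job_type(job)}")
```

Any need for intercommunication decides the question on its own; run time and memory are only checked for jobs that are otherwise independent.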
WHERE CAN WE GET THE POWER?
TYPE 1 JOB:
Masses of UK HPC resources around: it seems that UK grid resources are largely HPC!
TYPE 2 JOB:
????????
THERE HAS GOT TO BE A BETTER WAY TO OPTIMISE TYPE 2 JOBS!
…AND THERE IS: WE USE WHAT’S ALREADY THERE:
930 Win2K PCs (1 GHz P3, 256/512 MB RAM, 1 Gbit Ethernet) clustered in 30 student cluster rooms across every department on the UCL campus, with the potential to scale up to ~3000 PCs.
These machines waste 95% of their CPU cycles 24/7:
A MASSIVE UNTAPPED RESOURCE- A COUP FOR eMINERALS!
This is where Condor enters the scene.
THE ONLY AVAILABLE FREE, OFF-THE-SHELF RESOURCE MANAGEMENT AND JOB BROKER FOR WINDOWS:
Install Condor on our clusters, and we harness 95% of the
power of 930+ machines 24 hours a day, without spending
any money.
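The claim above can be turned into a back-of-envelope capacity figure. This minimal Python sketch assumes, as the slide does, that 95% of each machine's cycles are free around the clock and could all be harvested; real throughput will be lower (owners reclaiming their machines, job eviction, network and scheduling overheads), so treat the result as an upper bound. The function name is illustrative.

```python
# Back-of-envelope capacity of the UCL student-cluster pool, using the
# figures quoted on this slide. Assumes every idle cycle could be harvested,
# which is an upper bound rather than a measured result.
PCS = 930              # Win2K PCs in the 30 cluster rooms
IDLE_FRACTION = 0.95   # share of CPU cycles said to go unused
HOURS_PER_DAY = 24


def harvestable_cpu_hours(pcs: int, idle_fraction: float, hours: float) -> float:
    """CPU-hours available from idle cycles over a wall-clock period."""
    return pcs * idle_fraction * hours


if __name__ == "__main__":
    per_day = harvestable_cpu_hours(PCS, IDLE_FRACTION, HOURS_PER_DAY)
    per_year = harvestable_cpu_hours(PCS, IDLE_FRACTION, HOURS_PER_DAY * 365)
    print(f"~{per_day:,.0f} CPU-hours per day")
    print(f"~{per_year / (HOURS_PER_DAY * 365):,.1f} CPU-years per calendar year")
```

This gives roughly 21,204 CPU-hours per day, the equivalent of about 883 dedicated CPUs. Plugging in the ~3000 PCs mentioned as the scale-up potential gives roughly 68,400 CPU-hours per day.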
Is it really this simple?
YES! It has surpassed all expectations, with diverse current use and ever-rising demand.
- 15 smiley happy people (our current group of users, increasing monthly):
eMinerals project, eMaterials project, UCL Computer Science, UCL medical school, University of
Marburg, Universities of Bath and Cambridge, Birkbeck College, The Royal Institution…
- Over 1,000,000 hours of work completed in 6 months (105 CPU-years equivalent and counting)
- Codes migrated to Windows represent a huge variety:
environmental molecular work (all eMinerals codes!), materials polymorph prediction, financial
derivatives research, quantum mechanical codes, climatic research, medical image realisation…
NUMBER 1 METRIC FOR SUCCESS: Users love it.
• simple to use, doesn’t break and they can forget about their jobs.
NUMBER 2 METRIC FOR SUCCESS: UCL admin love it.
• 100% utilisation levels 24/7 on the entire cluster network, with no drop in performance and negligible costs, satisfy our dyed-in-the-wool, naturally paranoid sys admin.
NUMBER 3 METRIC FOR SUCCESS: eMinerals developers love it:
• fast deployment, tweakable, can build on top of it, low admin, integrates with Globus, great metadata, great free support, great workflow capabilities, Condor-G.
NUMBER 4 METRIC FOR SUCCESS: eScience loves it.
• Other institutions are following our example, interest is high.
WHAT IS MOST IMPORTANT?
Condor ENABLES any scientist to do their work
in a way they previously dreamed about:
Beginning to make real the ability to match
unbounded science with unbounded resources.
One million Condor
nodes in a hollowed
out volcano!
Mwahahaha…
Condor has slashed time-to-results from years to weeks. Scientists using our Condor resource have redefined their ability to achieve their goals.
Condor has organised resources at many levels:
• Desktop – June 2002 (2 nodes)
• Cluster – Sept 2002 (18 nodes)
• Department – Jan 2003 (150 nodes)
• Campus – October 16th 2003 (930 nodes)
• WHERE NEXT? (?????? nodes, ???? Pools)…
…Regional and national Condor resources are next…
…Regional and national Condor resources continued.
Many UK institutions have small/medium Condor pools. Some (Southampton, Imperial, Cardiff, Cambridge) have large and expanding pools.
Many UK institutions have resources wasting millions of CPU cycles.
We have proved the usefulness of large Windows Condor resources.
Assurances regarding security, authorisation, authentication, access and reliable job execution are essential to the take-up of Condor on this scale in the UK.
Many potential resources are Windows machines, which complicates matters (for example, the poor GSI port to Windows and the lack of checkpointing on Windows).
With education, awareness, support and a core group to lead the way, UK
institutions can form a national-level Condor infrastructure leveraging HTC
resources for scientists within UK eScience.
It hasn’t all been plain sailing though…
Issues with Very Large Condor Installations.
• Political – the biggest problem.
– resistance to change, ownership.
• Technical – usually surmountable.
– networks, deployment, admin, load.
• Policy – changes to I.S. usage.
– new usage: which is the primary use?
• Security – trust- or certificate-based.
– trust is easy and works; certificates are a pain.
5) The latest from the Condor pool…
[Chart: UCL Condor job time fluctuations. Average job time (hours) by month, October 2003 to April 2004. Dashed line shows the 5-hour recommended maximum job time.]
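The monthly averages in this chart can be recomputed from the monthly totals in the job and user statistics table, assuming (as the numbers suggest) that each month's average job time is total CPU hours divided by jobs completed. This minimal Python sketch does that and flags the months whose average exceeds the 5-hour recommendation; its results agree with the chart's labels to within about 0.01 h. The function names are illustrative.

```python
from __future__ import annotations

# Monthly totals for the UCL Condor pool, transcribed from the job and user
# statistics table: month -> (CPU hours of work, jobs completed).
MONTHLY = {
    "Oct 2003": (6651.20, 989),
    "Nov 2003": (54342.26, 24391),
    "Dec 2003": (361781.10, 23716),
    "Jan 2004": (375058.30, 28252),
    "Feb 2004": (185563.60, 19082),
    "Mar 2004": (10453.23, 2121),
    "Apr 2004": (57515.68, 24573),
}
RECOMMENDED_MAX_HOURS = 5.0  # the dashed line on the chart


def average_job_times(monthly: dict[str, tuple[float, int]]) -> dict[str, float]:
    """Average job length per month: total hours divided by number of jobs."""
    return {month: hours / jobs for month, (hours, jobs) in monthly.items()}


def months_over_limit(averages: dict[str, float],
                      limit: float = RECOMMENDED_MAX_HOURS) -> list[str]:
    """Months whose average job time exceeds the recommended maximum."""
    return [month for month, t in averages.items() if t > limit]


if __name__ == "__main__":
    averages = average_job_times(MONTHLY)
    for month, t in averages.items():
        status = "above" if t > RECOMMENDED_MAX_HOURS else "within"
        print(f"{month}: {t:5.2f} h ({status} the 5 h recommendation)")
```

On these figures, four of the seven months (October, December, January and February) averaged more than the recommended 5 hours per job.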
TABLE OF JOB AND USER STATISTICS FOR THE UCL CONDOR POOL, 16th October 2003 to 27th April 2004.

USERS: jobs submitted per month. % POOL USE is each user's share of all jobs run.

User       Oct 2003  Nov 2003  Dec 2003  Jan 2004  Feb 2004  Mar 2004  Apr 2004   TOTALS  AVERAGES  % POOL USE
Matt              0         0         0         0         0         7         0        7         1        0.01
Vinay             0         0         0         0         0        17         1       18         3       0.015
Paul            130         6         6         0         2         8         0      152        22       0.123
Mark C            0         0         0         0       202         0         0      202        29       0.164
Zhimei           35       122        81        23         5        20        50      336        48       0.273
Arnaud           90       144         4       240        20         0         0      498        71       0.404
JonW              0         0         0       558         0         0         0      558        80       0.453
Andrew          659         0         0         0         0         0         0      659        94       0.535
Charaka           0         0         0         0         0       259       649      908       130       0.737
Maria            72        27       139         0        32      1309       807     2386       341       1.938
Seb               0      3962         0         0         0         0         0     3962       566       3.218
James             0        10         0       442      8692         0         0     9144      1306       7.427
Sam               0     20114     23482     26986     10126       499     23122   104329     14904      84.735

MONTHLY TOTALS:

MONTH      TOTAL HOURS   USERS  TOTAL JOBS  AV. JOB TIME (h)
Oct 2003       6651.20       5         989              6.73
Nov 2003      54342.26       7       24391              2.23
Dec 2003     361781.10       5       23716             15.26
Jan 2004     375058.30       5       28252             13.28
Feb 2004     185563.60       7       19082              9.73
Mar 2004      10453.23       7        2121              4.93
Apr 2004      57515.68       5       24573              2.34
TOTALS      1051365.37       -      123124                 -
AVERAGES     150195.05    5.86       17589              7.78

NOTE #1: Period of study is 16th October 2003 to 27th April 2004 (175 days).
NOTE #2: April 2004 figures are incomplete: 2 new users began job submission on 28th April, and 2 further users are starting in May.
NOTE #3: The period from 27th February 2004 to 17th March 2004 has NO DATA. Approximately 25,000 jobs were run during this time.

av. jobs per month: 17589
av. jobs per day: 578
av. jobs per hour: 24
av. hours per month: 150195
av. hours per day: 4938
av. hours per hour: 206
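The 175-day study period and the daily and hourly averages accompanying the statistics table can be reproduced from its totals, although the table does not say how they were derived. This minimal Python sketch assumes that the 175 days are the calendar span minus the 20-day no-data gap (both ends counted), and that per-day figures were obtained by scaling the monthly averages by 12/365; both assumptions reproduce the published numbers. The helper names are illustrative.

```python
from datetime import date

# Totals from the UCL Condor pool statistics table.
TOTAL_JOBS = 123124
TOTAL_HOURS = 1051365.37
MONTHS = 7                                      # Oct 2003 to Apr 2004
STUDY = (date(2003, 10, 16), date(2004, 4, 27))
NO_DATA = (date(2004, 2, 27), date(2004, 3, 17))


def inclusive_days(start: date, end: date) -> int:
    """Number of calendar days from start to end, counting both ends."""
    return (end - start).days + 1


def monthly_to_daily(per_month: float) -> float:
    # Assumed conversion: 12 months per 365-day year.
    return per_month * 12 / 365


if __name__ == "__main__":
    days = inclusive_days(*STUDY) - inclusive_days(*NO_DATA)
    print(f"days with data: {days}")
    for label, total in (("jobs", TOTAL_JOBS), ("hours", TOTAL_HOURS)):
        per_month = total / MONTHS
        per_day = monthly_to_daily(per_month)
        print(f"av. {label}: {per_month:,.0f}/month, "
              f"{per_day:,.0f}/day, {per_day / 24:,.0f}/hour")
```

The calendar span covers 195 days and the gap 20 days, leaving 175; the scaled averages round to 578 jobs and 4938 hours per day, and 24 jobs and 206 hours per hour.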
2) Latest UK Condor research: FC-UK…
UCL, Cambridge and the Condor team at Wisconsin-Madison:
a one-year project, 50% funded by Microsoft, to develop web-services-based Condor scheduler and administrative interfaces on the eMinerals minigrid using Microsoft .NET.
This may extend into WS-RF (grid standard?) if it materialises.
This is a fully integrated Condor project, and will form part of future
releases.
Who? Me, Clovis, Wolfgang Emmerich (UCL)
Martin Dove, Mark Calleja (Cambridge)
Miron Livny, Todd Tannenbaum and Matt Farrellee (Condor)
and all you prolific users!
3) Where next, given the lack of volcanoes?
UK e-Science to lead in Condor-based HTC. Here’s the idea…
1. UCL to host the UK Condor download mirror (imminent).
2. A UK Condor support network working through the new Grid Operations Centre (discussions with the UK Grid Exec and the GOC are under way).
3. A UK Condor working group to develop a National HTC Condor Service, and formalise long-term Condor integration across the UK.
4. UCL to integrate web-services (W-S) Condor into existing infrastructure: more choice…
5. UCL kicked this all off by proposing and co-leading the inaugural UK Condor Week 2004…
4) UK Condor Week 2004. Jolly exciting it is too.
October 11th to 15th 2004, National eScience Centre, Edinburgh.
Anyone with an interest in Condor, creating HTC resources and the
future of UK eScience: Project members, leaders, scientists,
Institutional I.S leaders and administrators, eScience decision makers
and leaders.
Fully endorsed and encouraged by the Condor team, who will attend along with
Miron Livny (Condor Godfather and a top bloke) and give two days of tutorials,
hands-on sessions, Q & A, demos of new technology.
The remaining 3 days will be discussions, breakout sessions, etc., with the aim of formalising a Condor/HTC roadmap for the UK in the short and near term, and agreeing on a group of people to actually do the work.
See www.nesc.ac.uk for details.
…AND FINALLY. THE MILLION DOLLAR QUESTION?
When was the millionth recorded hour of work completed?
DATE: April 2nd 2004…
HOUR: ~09.03AM…
JOB: 1735.441…
JOB LENGTH: 23hrs 41 minutes…
WHO GETS THE GLORY?
DR SAM FRENCH, e-Materials Project, R.i.
A.K.A: ‘The Poolmeister’
Summary.
Condor has enabled eMinerals scientists and their UK
colleagues to perform their science:
1. in significantly new ways,
2. on previously untapped resources,
3. on previously unutilised operating systems,
4. in weeks rather than years,
5. in an integrated, heterogeneous, grid-enabled environment,
6. easily, painlessly and for no cost,
7. with equal importance given to data handling,
8. using out-of-the-box tools.
Conclusion: THIS MUST CONTINUE!
Condor has an important part to play in the UK eScience
programme:
1. Through meeting the increasing demands from users for large scale, accessible Condor-enabled HTC resources.
2. Through harnessing the significant volumes of existing, underutilised, heterogeneous UK institutional hardware.
3. Through providing functionality to facilitate secure accessibility to heterogeneous compute and data resources.
4. Through engaging with the UK eScience programme within Condor’s grid/web service and standardisation developments.
Elvis from the Molecular Level
A NERC eScience testbed project
Uhhh
thankyouverymuch.
You’re beautiful.
eMinerals project
http://www.eminerals.org