What’s new? - University of Cambridge

The Grid and enabling
applications for it
CCPN/TEMBLOR Workshop, Hinxton, 19th May 2004
Mark Hayes, Technical Director,
Cambridge eScience Centre
In the beginning…
“The collection of people, hardware, and software...
will become a node in a geographically distributed
computer network…. Through the network... all the large
computers can communicate with one another. And through
them, all the members of the community can communicate
with other people, with programs, with data, or with a selected
combination of those resources.”
J.C.R. Licklider, “The Computer as a Communication Device”
Science and Technology, April 1968
The ARPAnet in 1970
International connectivity - 1991
International connectivity - 1997
International bandwidth
From “3D geographic network displays” - Cox et al, ACM Sigmod Record - December 1996
What does the Internet look like?
http://www.cybergeography.org/
The World Wide Web
Invented at CERN by
Tim Berners-Lee in
1989 as a tool for
collaboration and
information sharing in
the particle physics
community.
The Grid - 1998
Editors: Foster & Kesselman
700 pages
22 chapters
40 authors
“A computational Grid is a hardware
and software infrastructure that provides
dependable, consistent, pervasive and
inexpensive access to high-end
computational capabilities.”
Analogy with the electrical
power grid - just plug in.
The Grid - 2003
Editors: Berman, Hey, Fox
1000 pages
43 chapters
116 authors
Applications, data sharing and
virtual communities.
4 types of Grid
• CPU intensive cycle scavenging (SETI@home)
• Data sharing
• Application provision
• Human-human interaction (e.g. Access Grid)
Early distributed computing
1.2 million CPU years so far...
Brute force attempt to crack strong encryption
Protein folding
It’s not just compute cycles...
An exponential growth in data from many areas of science.
The data explosion - some big numbers
• CFD turbulence simulations - 100TB
• BaBar particle physics experiment - 1TB/day
• CERN LHC will generate 1GB/s or 10PB/year
• VLBA radio telescope generates 1GB/s today
• NCBI/EMBL database is “only 0.5TB” but doubling each year
• brain imaging - 4TB/brain at full colour, 10µm resolution
(4PB/brain at 1µm i.e. cellular resolution)
• Pixar - 100TB/movie
FTP and GREP are not adequate (Jim Gray)
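A rough check of two of the numbers above, as a sketch rather than anything from the original slides. A sustained 1GB/s over a full calendar year would give about 31PB, so the quoted 10PB/year is consistent with roughly 10^7 seconds of data taking per year; that live time is an inferred assumption, not a figure from the talk. The second function estimates how long a 0.5TB database doubling each year takes to reach 1PB.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s in a calendar year


def petabytes_per_year(gb_per_second, live_seconds=SECONDS_PER_YEAR):
    """Data volume in PB produced at a sustained rate over the given live time."""
    return gb_per_second * live_seconds / 1e6  # 1 PB = 10**6 GB


def years_to_reach(start_tb, target_tb, doubling_years=1.0):
    """Years for a dataset that doubles every `doubling_years` to grow to target."""
    return doubling_years * math.log2(target_tb / start_tb)


# 1 GB/s running continuously for a whole year
print(petabytes_per_year(1.0))
# 1 GB/s with an assumed 1e7 seconds of live time per year
print(petabytes_per_year(1.0, live_seconds=1e7))
# 0.5 TB doubling each year until it reaches 1 PB (1000 TB)
print(years_to_reach(0.5, 1000.0))
```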
Application provision
• Google - 10K cpus, 2PB database (2 years ago)
• free email services - HotMail, Yahoo! 2-10PB storage
• netsolve - numerical algorithms on demand
with Matlab & Mathematica plugins
• renderfarm.net - graphics rendering on demand
The Access Grid
High end video conferencing
and collaboration technology.
O(100) nodes world wide.
[Photo labels: presenter mic, presenter camera, ambient mic (tabletop), audience camera]
“...one of the most compelling glimpses into the future I’ve seen since I first saw NCSA Mosaic.”
Larry Smarr
The Grid in the UK
Pilot projects in particle physics,
astronomy, medicine, bioinformatics,
environmental sciences...
Contributing to international
Grid software development efforts
10 regional “eScience Centres”
Some UK Grid resources
• Daresbury - loki - 64 proc Alpha cluster
• Manchester - green - 512 proc SGI Origin 3800
• Imperial - saturn - large SMP Sun
• Southampton - iridis - 400 proc Intel Linux cluster
• Rutherford Appleton Lab - hrothgar - 32 proc Intel Linux
• Cambridge - herschel - 32 proc Intel Linux cluster
• ...
• coming soon: 4x >64 CPU JISC clusters, HPC(X)
Applications on the UK Grid
Ion diffusion through radiation damaged crystal structures
(Mark Calleja, Earth Sciences, Cambridge)
• Monte Carlo simulation:
lots of independent runs
• small input & output
• more CPU -> higher
temperatures, better stats
• access to ~100 CPUs
on the UK Grid
• Condor-G client tool
for farming out jobs
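The pattern on this slide (many independent runs, small input and output per run) can be sketched as below. This is a toy stand-in, not the ion diffusion code: each "run" is a one-dimensional random walk, and a local thread pool plays the role that Condor-G plays on the UK Grid. The function names `run_once` and `farm` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_once(task):
    """One independent run: small input (seed, step count), small output (one number)."""
    seed, n_steps = task
    rng = random.Random(seed)  # private generator, so runs do not share state
    x = 0
    for _ in range(n_steps):
        x += rng.choice((-1, 1))
    return x * x  # squared displacement of the walker


def farm(n_runs, n_steps, workers=4):
    """Farm out n_runs independent tasks; the pool stands in for a Condor pool."""
    tasks = [(seed, n_steps) for seed in range(n_runs)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_once, tasks))


results = farm(n_runs=200, n_steps=100)
print("runs:", len(results), "mean squared displacement:", mean(results))
```

Because each run depends only on its own seed, adding workers (or CPUs) changes how fast the results arrive, not what they are; more runs simply give better statistics.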
Applications on the UK Grid
Reality Grid (Stephen Pickles, Robin Pinning - Manchester)
• Fluid dynamics of complex mixtures, e.g.
oil, water and solid particles (mud)
• Used CPU at London, Cambridge
• Remote visualisation using SGI
Onyx in Manchester (from a laptop
in Sheffield)
• Computational steering
Applications on the UK Grid
GENIE - Grid Enabled Integrated Earth system model
(Steven Newhouse, Murtaza Gulamali - Imperial)
• Ocean-atmosphere modelling
• How does moisture transport from the
atmosphere affect ocean circulation?
• ~1000 independent 4000-year runs
(3 days real time!) on ~200 CPUs
• Flocked condor pools at London &
Southampton
• Coupled modelling
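A back-of-envelope reading of the GENIE figures above, assuming the runs were spread evenly over the CPUs and every CPU was busy for the whole period (both are simplifying assumptions, not statements from the talk):

```python
def throughput(n_runs, n_cpus, wall_days):
    """Return (total CPU-days used, average CPU-hours per run) under perfect load balance."""
    cpu_days = n_cpus * wall_days
    hours_per_run = cpu_days * 24 / n_runs
    return cpu_days, hours_per_run


cpu_days, hours_per_run = throughput(n_runs=1000, n_cpus=200, wall_days=3)
print(f"{cpu_days} CPU-days in total, about {hours_per_run:.1f} CPU-hours per run")
```

On these assumptions the same ensemble would take about 600 days on a single CPU, which is the case for farming it out.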
£1 buys...
• 1 day of cpu time
• 4 GB ram for a day
• 1 GB of network bandwidth
• 1 GB of disk storage
• 10 M database accesses
• 10 TB of disk access (sequential)
• 10 TB of LAN bandwidth (bulk)
How do you move a terabyte?
Source: TeraScale SneakerNet, Jim Gray et al
Context      Speed (Mbps)   Rent ($/month)   $/Mbps   $/TB sent   Time/TB
Home phone   0.04           40               1,000    3,086       6 years
Home DSL     0.6            70               117      360         5 months
T1           1.5            1,200            800      2,469       2 months
T3           43             28,000           651      2,010       2 days
OC3          155            49,000           316      976         14 hours
OC 192       9600           1,920,000        200      617         14 minutes
100 Mbps     100            -                -        -           1 day
Gbps         1000           -                -        -           2.2 hours
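The Time/TB and $/TB figures follow from the link speed and monthly rent alone. The sketch below appears to reproduce them, assuming 1TB = 10^12 bytes, a 30-day month and a link that is fully used; these conventions are my assumptions, chosen because they match the table.

```python
BITS_PER_TB = 8e12                  # 1 TB taken as 10**12 bytes
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumed 30-day billing month


def seconds_per_tb(mbps):
    """Time to push 1 TB through a fully used link of the given speed."""
    return BITS_PER_TB / (mbps * 1e6)


def dollars_per_tb(mbps, rent_per_month):
    """Monthly rent divided by the terabytes the link can carry in a month."""
    tb_per_month = mbps * 1e6 * SECONDS_PER_MONTH / BITS_PER_TB
    return rent_per_month / tb_per_month


# (context, speed in Mbps, rent in $/month), taken from the table
LINKS = [
    ("Home phone", 0.04, 40),
    ("Home DSL", 0.6, 70),
    ("T1", 1.5, 1_200),
    ("T3", 43, 28_000),
    ("OC3", 155, 49_000),
    ("OC 192", 9600, 1_920_000),
]

for name, mbps, rent in LINKS:
    hours = seconds_per_tb(mbps) / 3600
    print(f"{name:10s} {hours:10.1f} hours/TB   ${dollars_per_tb(mbps, rent):,.0f}/TB")
```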
Some consequences
Compute cycles are (almost) free...
by comparison with network costs.
The cheapest and fastest way to move 1TB of
data out from CERN is still by FedEx.
This considers only bandwidth;
low-latency networks are even more expensive!
(MPI over WAN doesn’t work well.)
What makes a good Grid application?
A distributed community of users.
Tiny network input & output, huge compute
requirement.
Database access & storage is also expensive,
therefore put the computation near the data.
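One way to turn the "tiny network I/O, huge compute" rule into a number is to price a job with the "£1 buys..." list (1 day of CPU time or 1GB of network bandwidth per pound). The sketch below does that; the 10% threshold and the function names are hypothetical choices for illustration, not part of the talk.

```python
CPU_DAYS_PER_POUND = 1.0    # from the "£1 buys..." list
NETWORK_GB_PER_POUND = 1.0  # from the "£1 buys..." list


def job_cost(cpu_days, gb_transferred):
    """Return (compute cost, network cost) in pounds for one job."""
    compute = cpu_days / CPU_DAYS_PER_POUND
    network = gb_transferred / NETWORK_GB_PER_POUND
    return compute, network


def is_grid_friendly(cpu_days, gb_transferred, max_network_share=0.1):
    """True if moving the data is a small part of the bill (threshold is an assumption)."""
    compute, network = job_cost(cpu_days, gb_transferred)
    return network <= max_network_share * (compute + network)


# A Monte Carlo style job: 10 CPU-days of work, about 10 MB moved
print(is_grid_friendly(cpu_days=10, gb_transferred=0.01))
# A data-heavy job: 1 TB shipped for a quarter of an hour of CPU
print(is_grid_friendly(cpu_days=0.01, gb_transferred=1000))
```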
Web services
A web service is a network-accessible application
• identified by a URI
e.g.
http://terraservice.net/TerraService.asmx?op=GetTile
• with an interface defined in terms of XML based messages
• these messages transported by internet protocols (usually
HTTP)
• The application & its interface definition should be
‘discoverable’ by other applications
• independent of OS platform & programming language.
W3C standards body: http://www.w3c.org/
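To make "XML based messages" concrete, here is a minimal sketch that builds a SOAP 1.1 request envelope using only the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace and the parameter names `x` and `y` are hypothetical placeholders and are not the real TerraService interface. The message is only constructed here, not sent; in practice it would be POSTed to the service URI over HTTP.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/tiles"                 # hypothetical service namespace


def build_request(operation, params):
    """Wrap an operation call and its parameters in a SOAP envelope (as a string)."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(call, f"{{{SERVICE_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical parameters, for illustration only
message = build_request("GetTile", {"x": 1, "y": 2})
print(message)
```

Because the request is plain XML over HTTP, a client in any language on any platform can produce the same message, which is the point of the last bullet above.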
Acronym soup
XML - eXtensible Markup Language
XSLT - eXtensible Stylesheet Language Transformations
SOAP - Simple Object Access Protocol
WSDL - Web Services Description Language
UDDI - Universal Description, Discovery & Integration protocol
BPEL - Business Process Execution Language
WSIF - Web Services Invocation Framework
…..
terraservice.net
Web service interface to
http://terraserver.microsoft.com/
Example app: US Department of
Agriculture have a database of
soil properties, ‘federated’ with
terraservice.net to provide
geographical & topographic detail.
Soil Data Navigator Uses TerraServer
Databases available as Web Services
• Google
• Amazon
• SDSS SkyServer
• EMBL
• EBI-MSD
• EBI Open Bibliographic Query Service
• ...
http://www.escience.cam.ac.uk/services/dblist.html
Radar scattering from aircraft
Aim: increase the efficiency of the aircraft design engineering
process & extend radar scattering simulations to otherwise
intractable objects (i.e. whole aircraft)
A collaboration between the University of Cambridge
Department of Applied Mathematics & Theoretical Physics
(DAMTP) and the BAE Advanced Technology Centre at Filton.
Mark Spivack (PI), Andrew Usher (visualisation programmer),
Xiaobo Yang (scientific programmer), CC-HPCF, new cluster,
BAE input: expertise, data,...
Workflow
[Diagram labels: BAE, Cambridge, HPCF, reflection data, CAD design]
Visualisation tools
Based on the Visualisation Toolkit - open source C++ library
cross platform, extendable, large user base - http://www.vtk.org
Surface currents, virtual fly through, looking for “hot-spots”
Increasing efficiency
The calculation can be split into a two-stage process:
• Initial long-running, high fidelity calculation of induced surface
currents on the HPCF.
• 3D electromagnetic fields can be calculated on a cluster.
Using an approximation technique currently under development,
subsequent small changes can be re-calculated on the cluster.
In theory, this would allow interactive design of the aircraft without
the need for scheduling long-running jobs.
Tying it all together… with Web Services
Questions?