What’s new?
“Here comes the Grid”
Mark Hayes
Technical Director - Cambridge eScience Centre
NIEeS Summer School 2003

In the beginning…
“The collection of people, hardware, and software... will become a node in a geographically distributed computer network…. Through the network... all the large computers can communicate with one another. And through them, all the members of the community can communicate with other people, with programs, with data, or with a selected combination of those resources.”
J.C.R. Licklider, “The Computer as a Communication Device”, Science and Technology, April 1968

The ARPAnet in 1970

International connectivity - 1991

International connectivity - 1997

International bandwidth
From “3D geographic network displays” - Cox et al, ACM Sigmod Record - December 1996

What does the Internet look like?
http://www.cybergeography.org/

The World Wide Web
Invented at CERN by Tim Berners-Lee in 1989 as a tool for collaboration and information sharing in the particle physics community.

Early distributed computing
1.2 million CPU years so far...
Brute force attempt to crack strong encryption
Protein folding

The Grid - 1998
Editors: Foster & Kesselman
700 pages, 22 chapters, 40 authors
Analogy with the electrical power grid - just plug in.

The Grid - 2003
Editors: Berman, Hey, Fox
1000 pages, 43 chapters, 116 authors
Applications, data sharing and virtual communities.

It’s not just compute cycles...
An exponential growth in data from many areas of science.

4 types of Grid
• CPU intensive cycle scavenging (SETI@home)
• Data sharing
• Application provision
• Human-human interaction (e.g.
Access Grid)

SETI@home
The world’s most powerful computer delivered 52 Tflop/s yesterday (Earth Simulator is 35 Tflop/s, sum of top 2-10 is 60 Tflop/s)

Latest Stats
http://setiathome.ssl.berkeley.edu/totals.html
6th July 2003

                            Total                       Last 24 Hours
Users                       4,570,474                   1,226
Results received            944 M                       1.1 M
Total CPU time              1.5 M years                 1,226 years
Floating Point Operations   3 E+21 ops (3 zetta ops)    4.5 E+18 ops/day (52 Tflop/s)

The data explosion - some big numbers
• CFD turbulence simulations - 100TB
• BaBar particle physics experiment - 1TB/day
• CERN LHC will generate 1GB/s or 10PB/year
• VLBA radio telescope generates 1GB/s today
• NCBI/EMBL database is “only 0.5TB” but doubling each year
• brain imaging - 4TB/brain at full colour, 10 µm resolution (4PB/brain at 1 µm, i.e. cellular resolution)
• Pixar - 100TB/movie
FTP and GREP are not adequate (Jim Gray)

Application provision
• Google - 10K cpus, 2PB database (2 years ago)
• free email services - HotMail, Yahoo! - 2-10PB storage
• netsolve - numerical algorithms on demand with Matlab & Mathematica plugins
• renderfarm.net - graphics rendering on demand

The Access Grid
High end video conferencing and collaboration technology. O(100) nodes world wide.
[Diagram labels: presenter mic, presenter camera, ambient mic (tabletop), audience camera]
“...one of the most compelling glimpses into the future I’ve seen since I first saw NCSA Mosaic.” Larry Smarr

£1 buys...
• 1 day of cpu time
• 4 GB ram for a day
• 1 GB of network bandwidth
• 1 GB of disk storage
• 10 M database accesses
• 10 TB of disk access (sequential)
• 10 TB of LAN bandwidth (bulk)

How do you move a terabyte?
Source: Terascale SneakerNet, Jim Gray et al

Context      Speed (Mbps)   Rent ($/month)   $/Mbps   $/TB sent   Time/TB
Home phone   0.04           40               1,000    3,086       6 years
Home DSL     0.6            70               117      360         5 months
T1           1.5            1,200            800      2,469       2 months
T3           43             28,000           651      2,010       2 days
OC3          155            49,000           316      976         14 hours
OC 192       9600           1,920,000        200      617         14 minutes
100 Mbps     100                                                  1 day
Gbps         1000                                                 2.2 hours

Some consequences
Compute cycles are (almost) free... by comparison with network costs.
- The cheapest and fastest way to move 1TB of data out from CERN is still by FedEx.
This considers only bandwidth; low latency networks are even more expensive! (MPI over WAN doesn’t work well.)

What makes a good Grid application?
A distributed community of users.
Tiny network input & output, huge compute requirement.
Database access & storage is also expensive, therefore put the computation near the data.

Questions?
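
The Time/TB and $/TB figures in the "How do you move a terabyte?" table can be checked with a few lines of arithmetic. The sketch below is not from the slides: it assumes 1 TB = 10^12 bytes, a 30-day billing month, and a link running flat out with no protocol overhead. The function names are made up for illustration. Under those assumptions it appears to reproduce the table's $/TB column for the rented links.

```python
# Back-of-envelope check of the bandwidth table.
# Assumptions (not stated on the slides): 1 TB = 10^12 bytes,
# a 30-day month, and a fully utilised link with no overhead.
TERABYTE_BITS = 8 * 10**12
SECONDS_PER_MONTH = 30 * 24 * 3600


def transfer_seconds(speed_mbps):
    """Seconds needed to push one terabyte through a link of the given speed."""
    return TERABYTE_BITS / (speed_mbps * 1e6)


def cost_per_tb(speed_mbps, rent_per_month):
    """Rent paid while the link is busy sending one terabyte."""
    return rent_per_month * transfer_seconds(speed_mbps) / SECONDS_PER_MONTH


# (context, speed in Mbps, rent in $/month) taken from the table
LINKS = [
    ("Home phone", 0.04, 40),
    ("Home DSL", 0.6, 70),
    ("T1", 1.5, 1_200),
    ("T3", 43, 28_000),
    ("OC3", 155, 49_000),
    ("OC 192", 9600, 1_920_000),
]

for name, mbps, rent in LINKS:
    hours = transfer_seconds(mbps) / 3600
    print(f"{name:<11} {hours:>12.2f} hours/TB  ${cost_per_tb(mbps, rent):>8.0f}/TB")
```

For the two rows with no rent listed, `transfer_seconds(100)` and `transfer_seconds(1000)` give roughly 22 hours and 2.2 hours, in line with the "1 day" and "2.2 hours" entries.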