Craig Stewart
Executive Director, Pervasive Technology Institute
Associate Dean, Research Technologies, Office of the Vice President for Information Technology

License terms
Please cite as: Stewart, C.A. 2009. The future is cloudy. (Presentation) Technische Universitaet Dresden (Dresden, Germany, 18 Jun 2009). Available from: http://hdl.handle.net/2022/13913
Except where otherwise noted, by inclusion of a source URL or some other note, the contents of this presentation are © by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share (to copy, distribute, and transmit the work) and to remix (to adapt the work) under the following conditions: attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.

Outline
• Economic turmoil
• Data production
• Realities of power and air conditioning
• Cloud computing
• New agendas in the US
• A few thoughts about opportunities
• More questions than answers…

Economic situation
• SGI – gone, back again, again; bought by Rackable Systems
• SiCortex – just gone
• Sun Microsystems – bought by Oracle; status of MySQL and Lustre
• The economic problems are global
• Long-term problem: effects of economic thrash on human resources

Data – 7 digital academic data services
• Scholarly Life (podcasts, email)
• Teaching & Learning (courses, OER, etc.)
• Scholarly Data
• Scholarly Record (journals, multimedia)
• Digital Books & Collections
• Digitized Film & Completed Art Works
• Administrative Data, Clinical Service Delivery

Public and community data sets getting bigger
• GenBank: 55 GB (99B base pairs, 99M sequences), 50% projected annual growth
• PDB: ~200 GB (58,236 structures), 14-20% projected annual growth
• PubChem: ~475 GB (101,301,473 compounds, bioassays, and substances), 80% annual growth
• PubMed Central: ~262 GB (1,808,934 articles), 2-3% annual growth
• LHC: 15 PB
• ODI: 164 TB (54,750 files)
http://www.ncbi.nlm.nih.gov/genbank/genbankstats.html

Local production
• Roche 454 Life Sciences GS FLX Titanium Series Genome Analyzer: 125,000 files/year, 8 TB/year
• NimbleGen Hybridization System 4: 625 files/year, 77.5 GB/year
• BD Pathway 855 Imager: 76,800,000 files/year, 7 TB/year
• Molecular Devices GenePix Professional 4200A Scanner: 432 files/year, 2 GB/year
(All instrument images © manufacturer; may not be reused without permission)

All this data, and what for metadata?
http://www.duraspace.org/
http://www.escidoc.org/

Realities of power and air conditioning

IU's new Data Center

Educated guesses on cooling
• Water-cooled jackets or enclosed cooled cases are best
• Nearby expandable cooling towers needed
• More plans than you have money to build
(All photographs by Chris Eller, IU)

Cloud computing

Cloud computing does not exist
• Platform as a Service (PaaS), e.g. GoogleApps, various mail applications
• Infrastructure as a Service (IaaS), e.g. Amazon Web Services:
  – Amazon Elastic Compute Cloud (EC2)
  – Amazon Simple Storage Service (S3), $0.10/GB (see the cost sketch after this slide)
  – Amazon Elastic MapReduce, which implements Hadoop
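To make the data-production and pricing numbers above concrete, here is a minimal back-of-the-envelope sketch (not part of the original slides). It projects a data set growing at the compound annual rates quoted in the table and estimates a monthly storage bill at the quoted $0.10/GB; treating that price as a flat monthly storage charge, and the simple compounding model, are illustrative assumptions only.

```python
# Back-of-the-envelope sketch (illustrative assumptions, not from the slides):
# - data grows by a fixed fraction per year (compound growth, as in the table above)
# - the quoted $0.10/GB is treated here as a monthly storage price

def projected_size_gb(current_gb, annual_growth, years):
    """Project data set size assuming compound annual growth."""
    return current_gb * (1.0 + annual_growth) ** years

def monthly_storage_cost(size_gb, price_per_gb_month=0.10):
    """Storage cost per month at a flat per-GB price."""
    return size_gb * price_per_gb_month

if __name__ == "__main__":
    # PubChem today: ~475 GB, 80% annual growth (figures from the table above)
    for year in range(6):
        size = projected_size_gb(475, 0.80, year)
        print(f"PubChem, year {year}: {size:,.0f} GB -> ${monthly_storage_cost(size):,.0f}/month")

    # One sequencing instrument producing 8 TB/year (from the 'Local production' slide)
    gb_per_year = 8 * 1024
    print(f"Sequencer output after one year: {gb_per_year:,} GB "
          f"-> ${monthly_storage_cost(gb_per_year):,.2f}/month to keep online")
```

Even at a modest per-gigabyte price, an 80% annual growth rate multiplies the monthly bill by roughly 19x over five years, which is the kind of trajectory the rest of the talk is concerned with.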
Open source equivalents
• Nimbus: http://workspace.globus.org/clouds/nimbus.html
• Eucalyptus: http://www.eucalyptus.com/

Higher latencies
[Chart: average time in seconds for Kmeans Clustering with LAM-MPI and OpenMPI on bare metal, 1 VM per node, and 8 VMs per node; Xen configuration with 8 MPI processes inside the VM for the 1-VM-per-node case]
• Lack of support for in-node communication leads to "sequentializing" parallel communication
• Better support for in-node communication in OpenMPI resulted in better performance than LAM-MPI for the 1-VM-per-node configuration
• In the 8-VMs-per-node, 1-MPI-process-per-VM configuration, OpenMPI and LAM-MPI perform equally well (a latency sketch follows this slide)
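The chart above reports end-to-end Kmeans timings; a common way to see the underlying communication overhead directly is a two-rank ping-pong microbenchmark run once on bare metal and once inside VMs. The sketch below is not from the original slides: the use of mpi4py, the one-byte message size, and the iteration count are assumptions for illustration, and the original measurements were made with LAM-MPI and OpenMPI rather than with this script.

```python
# Minimal MPI ping-pong latency sketch (illustrative; assumes mpi4py and NumPy are installed).
# Run with two ranks, e.g.:  mpiexec -n 2 python pingpong.py
# Comparing the same run on bare metal vs. inside VMs exposes the added latency
# discussed on the "Higher latencies" slide.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

iters = 1000
msg = np.zeros(1, dtype="b")  # 1-byte message: measures latency, not bandwidth

comm.Barrier()
start = MPI.Wtime()
for _ in range(iters):
    if rank == 0:
        comm.Send(msg, dest=1, tag=0)
        comm.Recv(msg, source=1, tag=0)
    elif rank == 1:
        comm.Recv(msg, source=0, tag=0)
        comm.Send(msg, dest=0, tag=0)
elapsed = MPI.Wtime() - start

if rank == 0:
    # Each iteration is one round trip; one-way latency is half of that.
    print(f"one-way latency: {elapsed / iters / 2 * 1e6:.1f} microseconds")
```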
Is it safe?
Advantages:
• Services and functionality
• Cost savings
• Remote facility for disaster recovery
• Ability to create service level agreements (SLAs) with providers
• Store and access library and research data collectively for groups as well as for individuals
• Opportunity to take advantage of economies of scale in performance, scalability, electrical power use, and cooling
• New skills needed in your local IT organization / opportunities to redirect staff effort
Disadvantages:
• Control
• Privacy
• Sensitive data regulation compliance (particularly FERPA)
• SLAs may not be flexible enough to meet academic institution needs
• License terms, particularly as regards intellectual property rights
• If an institution ever wanted to "undo" outsourcing the services, it might find itself without the staff expertise that would make this possible; you could become captive once you outsource
• New skills needed in your local IT organization

New agendas being set
• New president, new director of OSTP (Tom Kalil), new director for OCI (Ed Seidel)
• NITRD (Networking and Information Technology Research and Development) reauthorization:
  – High Performance Computing Act of 1991
  – Next Generation Internet Research Act of 1998
  – America COMPETES Act of 2007

• Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible
• In a world of made-up, goofy terms… cyberinfrastructure is made-up but not goofy, and likely here to stay for a good while

TeraGrid HPC user community is growing
[Chart: TeraGrid users from Dec 2003 through Dec 2007, with series for current accounts, active users, new accounts, gateway users, and a target line; labeled values include 575, 1,807, 3,702, and 4,277 users]
www.teragrid.org

TeraGrid HPC Usage, 2008
• 3.8B NUs in Q4 2008; 3.9B NUs in all of 2007
• In 2008, aggregate HPC power increased by 3.5x
• NUs requested and awarded quadrupled
• NUs delivered increased by 2.5x
• (Chart annotations: Ranger, Feb. 2008; Kraken, Aug. 2008)
Slide courtesy and © John Towns, NCSA

Complexity-hiding interfaces

Thoughts on opportunities
• No one makes their own lab glassware anymore (usually)
• Cloud computing, long-haul networks, and data management issues are deeply intertwined
• Our challenge may be to figure out how to make best use of IaaS in the future
• Some good bit of cyberinfrastructure might look like clouds on the front end and HPC or grids on the back end [remember chinkapin] (a toy sketch follows this slide)
• In 2009, and for the next 5 years, which will make more difference: high-performance computing for ever fewer, or high-performance computing for many more? (Hochleistungsrechnen fuer immer weniger, oder Hochleistungsrechnen fuer viel mehr?)
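As a toy illustration of "clouds on the front end, HPC or grids on the back end" and of the complexity-hiding interfaces mentioned above, the sketch below hides batch-job submission behind a single function that a web gateway could call. It is a hypothetical sketch only: the PBS-style directives, the qsub workflow, and the run_analysis and analyze names are assumptions for illustration, not IU's or TeraGrid's actual gateway code.

```python
# Hypothetical sketch of a complexity-hiding interface: a front end (e.g. a web
# gateway running on cloud infrastructure) calls run_analysis(); the HPC details
# live behind this one function. Assumes a PBS/Torque-style scheduler with qsub.
import subprocess
import tempfile
import textwrap

def run_analysis(input_path: str, nodes: int = 1, ppn: int = 8,
                 walltime: str = "01:00:00") -> str:
    """Submit a batch job for one input file and return the scheduler's job id."""
    script = textwrap.dedent(f"""\
        #!/bin/bash
        #PBS -l nodes={nodes}:ppn={ppn}
        #PBS -l walltime={walltime}
        #PBS -N gateway_job
        cd $PBS_O_WORKDIR
        ./analyze {input_path}
    """)
    with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
        f.write(script)
        script_path = f.name
    # qsub prints the job id on stdout; the gateway only ever sees this string.
    result = subprocess.run(["qsub", script_path],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

# A gateway handler would then do something like:
#   job_id = run_analysis("/data/sample_001.fastq", nodes=2)
# and poll the scheduler or notify the user when the job finishes.
```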
Acknowledgements – Funding sources
• IU's involvement as a TeraGrid Resource Partner is supported in part by the National Science Foundation under Grants No. ACI-0338618, OCI-0451237, OCI-0535258, and OCI-0504075.
• The IU Data Capacitor is supported in part by the National Science Foundation under Grant No. CNS-0521433.
• This research was supported in part by the Indiana METACyt Initiative and in part by the Pervasive Technology Institute. The Indiana METACyt Initiative and the Pervasive Technology Institute of Indiana University are supported in part by Lilly Endowment, Inc.
• The LEAD portal is developed under the leadership of IU Professors Dr. Dennis Gannon and Dr. Beth Plale, and supported by NSF grant 331480.
• Many of the ideas presented in this talk were developed under a Fulbright Senior Scholar's award to Stewart, funded by the US Department of State and the Technische Universitaet Dresden.
• Some of the ideas presented here were developed by CASC members (www.casc.org), members of the EDUCAUSE Campus Cyberinfrastructure Committee, and participants in various workshops.
• Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation (NSF), National Institutes of Health (NIH), Lilly Endowment, Inc., or any other funding agency.
• This talk was developed during June 2009 while Stewart was a guest at ZIH, Technische Universitaet Dresden, as part of the collaborative relationship between IU and TU-D. I appreciate the financial support from TU-D and the generosity of my hosts, Dr. Wolfgang Nagel and Matthias Mueller.

Acknowledgements – People
• Malinda Lingwall: editing, graphic layout, and managing the process
• Steve Simms: several of the graphics
• John Morris (www.editide.us): graphics regarding power
• Marlon Pierce: research and slides on VM performance
• Matt Link: Magic 8-ball graphic
• Richard Repasky, Dale Lantrip, Scott Michaels: information on instrument data production
• Niagara Falls photograph from Flickr, from user Maxmaria
• Guido Juckeland: helpful comments on earlier versions
• Brad Wheeler: new concept of "a miracle occurs here," based on a cartoon by Sidney Harris from 2007
• John Towns and Dave Hart (SDSC): TeraGrid utilization slides
• This work would not have been possible without the dedicated and expert efforts of the staff of the Research Technologies Division of University Information Technology Services, the faculty and staff of the Pervasive Technology Institute, and the staff of UITS generally.
• IU's definition of cyberinfrastructure is becoming widely used in the US, is referenced in Wikipedia, and was created as a group effort with particular contributions from Steve Simms.
• Thanks to the faculty and staff with whom we collaborate locally at IU and globally (via the TeraGrid, and especially at Technische Universitaet Dresden).

Thanks!
• Questions?