Craig Stewart
Executive Director, Pervasive Technology Institute
Associate Dean, Research Technologies,
Office of the Vice President for Information Technology
License terms
• Please cite as: Stewart, C.A. 2009. The future is cloudy. (Presentation) Technische Universitaet Dresden (Dresden, Germany, 18 Jun 2009). Available from: http://hdl.handle.net/2022/13913
• Except where otherwise noted, by inclusion of a source URL or some other note, the contents of this presentation are © by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – to copy, distribute and transmit the work – and to remix – to adapt the work – under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.
Outline
• Economic turmoil
• Data production
• Realities of power and air conditioning
• Cloud computing
• New agendas in the US
• A few thoughts about opportunities
• More questions than answers…
Economic situation
• SGI – gone, back again, again – bought by Rackable Systems
• SiCortex – just gone
• Sun Microsystems – bought by Oracle
  – MySQL and Lustre status
• The economic problems are global
• Long-term problem: effects of economic thrash on human resources
Data – 7 digital academic data services
[Diagram of seven service categories:]
• Scholarly Life (podcasts, email)
• Teaching & Learning (courses, OER, etc.)
• Scholarly Data
• Scholarly Record (journals, multi-media)
• Digital Books & Collections
• Digitized Film & Completed Art Works
• Administrative Data, Clinical Service Delivery
Public and community data sets getting bigger

Data set | Size | # files | Annual rate of growth
GenBank | 55 GB | 99B base pairs, 99M sequences | 50% projected
PDB | ~200 GB | 58,236 structures | 14-20% projected
PubChem | ~475 GB | 101,301,473 compounds, bioassays, & substances | 80%
PubMed Central | ~262 GB | 1,808,934 articles | 2-3%
LHC | 15 PB | |
ODI | 164 TB | 54,750 files |

http://www.ncbi.nlm.nih.gov/genbank/genbankstats.html
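As a rough illustration of what these growth rates imply, here is a minimal sketch (Python, my own illustration, not from the original slides) projecting the table's sizes forward under compound annual growth; the 17% rate for PDB is the midpoint of the quoted 14-20% range, and the five-year horizon is arbitrary.

```python
# Rough projection of data set sizes under compound annual growth.
# Starting sizes and growth rates are taken from the table above;
# the 17% PDB rate is the midpoint of 14-20%, and 5 years is arbitrary.

datasets = {
    # name: (current size in GB, assumed annual growth rate)
    "GenBank": (55, 0.50),
    "PDB": (200, 0.17),
    "PubChem": (475, 0.80),
    "PubMed Central": (262, 0.025),
}

YEARS = 5

for name, (size_gb, rate) in datasets.items():
    projected = size_gb * (1 + rate) ** YEARS
    print(f"{name}: {size_gb} GB today -> ~{projected:,.0f} GB in {YEARS} years "
          f"at {rate:.0%}/year")
```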
Local production
• Roche 454 Life Sciences GS FLX Titanium Series Genome Analyzer: 125,000 files/year, 8 TB/year
• NimbleGen Hybridization System 4: 625 files/year, 77.5 GB/year
• BD Pathways Imager 855: 76,800,000 files/year, 7 TB/year
• Molecular Devices GenePix Professional 4200A Scanner: 432 files/year, 2 GB/year

All images here © manufacturer; may not be reused without permission.
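For a sense of the aggregate load, the short sketch below (my own illustration, not from the slides) simply totals the per-instrument figures listed above.

```python
# Sum annual file counts and data volumes for the instruments listed above.
# Figures are taken directly from the slide; totals are for illustration only.

instruments = {
    # name: (files per year, terabytes per year)
    "Roche 454 GS FLX Titanium": (125_000, 8.0),
    "NimbleGen Hybridization System 4": (625, 0.0775),   # 77.5 GB
    "BD Pathways Imager 855": (76_800_000, 7.0),
    "Molecular Devices GenePix 4200A": (432, 0.002),      # 2 GB
}

total_files = sum(files for files, _ in instruments.values())
total_tb = sum(tb for _, tb in instruments.values())

print(f"Total: {total_files:,} files/year, {total_tb:.1f} TB/year")
print(f"Average file size: {total_tb * 1e12 / total_files / 1e3:.1f} KB")
```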
All this data, and what about metadata?
http://www.duraspace.org/
http://www.escidoc.org/
Realities of power and air conditioning
IU’s new Data Center
Educated guesses on cooling
• Water-cooled jackets or enclosed cooled cases are best
• Nearby expandable cooling towers are needed
• More plans than you have money to build

All photographs by Chris Eller, IU
Cloud computing
Cloud computing does not exist
• Platform as a Service (PaaS) – e.g., GoogleApps, various mail applications
• Infrastructure as a Service (IaaS) – e.g., Amazon Web Services
  – Amazon Elastic Compute Cloud (EC2)
  – Amazon Simple Storage Service (S3) – $0.10/GB
  – Amazon Elastic MapReduce – implements Hadoop
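To put the quoted S3 price in context: assuming the $0.10/GB figure is a per-month storage price (an assumption; actual pricing is tiered and changes over time), storing the roughly 15 TB/year of local instrument output described earlier works out as follows.

```python
# Back-of-the-envelope S3 storage cost. Assumes the slide's $0.10/GB is a
# per-GB-month storage price (an assumption); real pricing is tiered and
# changes over time, and transfer charges are ignored here.

PRICE_PER_GB_MONTH = 0.10      # USD, as quoted on the slide
ANNUAL_PRODUCTION_TB = 15      # approximate local instrument output per year

stored_gb = ANNUAL_PRODUCTION_TB * 1000
monthly_cost = stored_gb * PRICE_PER_GB_MONTH
print(f"Storing {ANNUAL_PRODUCTION_TB} TB: ~${monthly_cost:,.0f}/month, "
      f"~${monthly_cost * 12:,.0f}/year (storage only, excluding transfer)")
```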
Open source equivalents
http://workspace.globus.org/clouds/nimbus.html
http://www.eucalyptus.com/
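Both projects expose interfaces modeled on Amazon's EC2 API, so standard EC2 client tooling can often be pointed at a private installation. The sketch below uses boto3 (a client library that postdates this talk); the endpoint URL, credentials, and image ID are placeholders, not real values.

```python
# Minimal sketch: pointing an EC2-style client at a private cloud endpoint.
# boto3 postdates this talk, and the endpoint, credentials, and image ID
# below are placeholders; the point is only that EC2-compatible tooling
# can be reused against Nimbus- or Eucalyptus-style services.
import boto3

ec2 = boto3.client(
    "ec2",
    endpoint_url="https://cloud.example.edu:8773/services/Eucalyptus",  # hypothetical
    aws_access_key_id="EXAMPLE_KEY",
    aws_secret_access_key="EXAMPLE_SECRET",
    region_name="private-cloud",
)

# Launch a single small instance from a (hypothetical) locally registered image.
response = ec2.run_instances(ImageId="emi-12345678", MinCount=1, MaxCount=1,
                             InstanceType="m1.small")
print(response["Instances"][0]["InstanceId"])
```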
Higher latencies
[Chart: average time in seconds for Kmeans Clustering with LAM-MPI and OpenMPI, comparing bare-metal, 1-VM per node, and 8-VMs per node; the Xen configuration for 1-VM per node runs 8 MPI processes inside the VM.]
• Lack of support for in-node communication => "sequentializing" parallel communication
• Better support for in-node communication in OpenMPI resulted in better performance than LAM-MPI for the 1-VM per node configuration
• In the 8-VMs per node, 1 MPI process per VM configuration, both OpenMPI and LAM-MPI perform equally well
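The communication pattern being measured here is, at heart, a per-iteration reduction of partial sums across MPI processes. A minimal sketch with mpi4py (my own illustration, not the benchmark's actual code) of that k-means reduction step:

```python
# Minimal sketch of the per-iteration reduction in parallel k-means:
# each rank computes partial centroid sums for its local points, then an
# allreduce combines them. This is the kind of communication step whose
# latency the Xen/MPI comparison above probes; not the benchmark's code.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

K, DIM, LOCAL_POINTS = 8, 3, 10_000
rng = np.random.default_rng(rank)
points = rng.random((LOCAL_POINTS, DIM))
centroids = comm.bcast(rng.random((K, DIM)) if rank == 0 else None, root=0)

# Assign each local point to its nearest centroid.
dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
labels = dists.argmin(axis=1)

# Partial sums and counts for this rank only.
local_sums = np.zeros((K, DIM))
local_counts = np.zeros(K)
for k in range(K):
    mask = labels == k
    local_sums[k] = points[mask].sum(axis=0)
    local_counts[k] = mask.sum()

# Global reduction across all ranks: the latency-sensitive step.
global_sums = comm.allreduce(local_sums, op=MPI.SUM)
global_counts = comm.allreduce(local_counts, op=MPI.SUM)
new_centroids = global_sums / np.maximum(global_counts, 1)[:, None]

if rank == 0:
    print("Updated centroids:\n", new_centroids)
```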
Is it safe?

Advantages | Disadvantages
Services and functionality | Control
Cost savings | Privacy
Remote facility for disaster recovery | Sensitive data regulation compliance (particularly FERPA)
Ability to create service level agreements (SLAs) with providers | SLAs may not be flexible enough to meet academic institution needs
Store and access library and research data collectively for groups as well as for individuals | License terms, particularly as regards intellectual property rights
Opportunity to take advantage of economies of scale in performance, scalability, electrical power use, and cooling | If an institution ever wanted to "undo" outsourcing the services, it might find itself without the staff expertise that would make this possible – you could become captive once you outsource
Need new skills in your local IT organization / opportunities to redirect staff effort | Need new skills in your local IT organization
New agendas being set
• New president, new director of OSTP (Tom Kalil), new director for OCI (Ed Seidel)
• NITRD (Networking and Information Technology Research and Development) Reauthorization
  – High Performance Computing Act of 1991
  – Next Generation Internet Research Act of 1998
  – America COMPETES Act of 2007
• Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible
• In a world of made-up, goofy terms … cyberinfrastructure is made-up but not goofy, and likely here to stay for a good while
TeraGrid HPC user community is growing
[Chart: growth of TeraGrid users from December 2003 to December 2007, showing current accounts, active users, new accounts, gateway users, and a target line; labeled values include 4,277, 3,702, 1,807, and 575.]
www.teragrid.org
TeraGrid HPC Usage, 2008
• 3.8B NUs delivered in Q4 2008
• 3.9B NUs delivered in all of 2007
• In 2008:
  – Aggregate HPC power increased by 3.5x (Ranger, Feb. 2008; Kraken, Aug. 2008)
  – NUs requested and awarded quadrupled
  – NUs delivered increased by 2.5x
Slide courtesy and © John Towns, NCSA
Complexity-hiding interfaces
Thoughts on opportunities
• No one makes their own lab glassware anymore (usually)
• Cloud computing, long-haul networks, and data management issues are deeply intertwined
• Our challenge may be to figure out how to make best use of IaaS in the future
• Some good bit of cyberinfrastructure might look like clouds on the front end and HPC or grids on the back end [remember chinkapin]
• In 2009, and for the next 5 years, which will make more difference: high-performance computing for ever fewer, or high-performance computing for many more?
Acknowledgements – Funding sources
• IU's involvement as a TeraGrid Resource Partner is supported in part by the National Science Foundation under Grants No. ACI-0338618, OCI-0451237, OCI-0535258, and OCI-0504075.
• The IU Data Capacitor is supported in part by the National Science Foundation under Grant No. CNS-0521433.
• This research was supported in part by the Indiana METACyt Initiative and in part by the Pervasive Technology Institute. The Indiana METACyt Initiative and the Pervasive Technology Institute of Indiana University are supported in part by Lilly Endowment, Inc.
• The LEAD portal is developed under the leadership of IU Professors Dr. Dennis Gannon and Dr. Beth Plale, and supported by NSF grant 331480.
• Many of the ideas presented in this talk were developed under a Fulbright Senior Scholar's award to Stewart, funded by the US Department of State and the Technische Universitaet Dresden.
• Some of the ideas presented here were developed by CASC members (www.casc.org), members of the EDUCAUSE Campus Cyberinfrastructure Committee, and participants in various workshops.
• Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation (NSF), National Institutes of Health (NIH), Lilly Endowment, Inc., or any other funding agency.
• This talk was developed during June 2009 while Stewart was a guest at ZIH, Technische Universitaet Dresden, as part of the collaborative relationship between IU and TU-D. I appreciate the financial support from TU-D and the generosity of my hosts, Dr. Wolfgang Nagel and Matthias Mueller.
Acknowledgements – People
• Malinda Lingwall: editing, graphic layout, and managing the process
• Steve Simms: several of the graphics
• John Morris (www.editide.us): graphics regarding power
• Marlon Pierce: research and slides on VM performance
• Matt Link: Magic 8-ball graphic
• Richard Repasky, Dale Lantrip, Scott Michaels: information on instrument data production
• Niagara Falls photograph from Flickr, from user Maxmaria
• Guido Juckeland: helpful comments on earlier versions
• Brad Wheeler: new concept of "a miracle occurs here" based on a cartoon by Sidney Harris from 2007
• John Towns and Dave Hart (SDSC): TeraGrid utilization slides
• This work would not have been possible without the dedicated and expert efforts of the staff of the Research Technologies Division of University Information Technology Services, the faculty and staff of the Pervasive Technology Institute, and the staff of UITS generally.
• IU's definition of cyberinfrastructure is becoming widely used in the US, is referenced in Wikipedia, and was created as a group effort with particular contributions from Steve Simms.
• Thanks to the faculty and staff with whom we collaborate locally at IU and globally (via the TeraGrid, and especially at Technische Universitaet Dresden).
Thanks!
• Questions?