Educational Applications of Supercomputing and Cyberinfrastructure KAUST Economic Development International Symposium at ISC'11, 21 June 2011, Hamburg Supercomputing in Science and Engineering: Economic and Technological.

Educational Applications of Supercomputing
and Cyberinfrastructure
KAUST Economic Development International Symposium at
ISC'11, 21 June 2011, Hamburg
Supercomputing in Science and Engineering:
Economic and Technological Opportunities and Challenges
Dr. Craig A. Stewart
Associate Dean, Research Technologies
Executive Director, Pervasive Technology Institute
Indiana University
Outline
• Too many people, too few people
• Inspiring young people
• Examples of interesting educational activities
(roughly scaling up by participant count)
• New opportunities for cyberinfrastructure at the
campus, national, and international levels
(campus bridging)
• Conclusions: Education, technology, economic
development
NB: License terms for slides at end
2
Some definitions
• Supercomputer – large, monolithic, tightly integrated computer
• High Performance Computer – a more general term than
supercomputer including a wider variety of cluster types
• High Throughput Computing – systems of computers that work on
nicely parallel problems with (very) low bandwidth connections
• Cyberinfrastructure consists of computing systems, data storage
systems, advanced instruments and data repositories, visualization
environments, and people, all linked together by software and high
performance networks to improve research productivity and enable
breakthroughs not otherwise possible.
• eScience – large scale science increasingly carried out by global
collaborations enabled by the Internet.
3
Technology assertions…
“…results of this discovery upon society
will be greater than the imagination of
the most sanguine can now distinctly
conceive.”
The telegraph
The American Biblical
Repository, 1838
“… will tremendously influence our
national elections, will promote world
understanding of social, racial, and
economic problems, will influence our
daily lives to a degree yet undreamed of.”
The television
Franklin Dunham, 1956
“… is becoming the town square for the
global village of tomorrow.”
The Internet
Bill Gates, 1996
“The world is poised on the cusp of an
economic and cultural shift as dramatic
as that of the Industrial Revolution.”
The WWW
Steven Levy, 1997
“We have technology, finally, that for the
first time in human history allows people
to really maintain rich connections with
much larger numbers of people.”
The Internet
Pierre Omidyar, 2005
4
World population growth (history & predicted)
[Figure: world population in billions (vertical axis, 1 to 12) against time, from
1+ million years B.C. to A.D. 5000. Periods marked: Old Stone Age, New Stone Age,
Bronze Age, Iron Age, Middle Ages, Modern Age, Future. Years marked on the curve:
1800, 1900, 1950, 1975, 2000, 2100. Annotation: Black Death (the plague).]
Source: © Population Reference Bureau; and United Nations, World Population Projections to 2100 (1998).
5
© Schnabel, R. 2011. ACM’s engagement in education policy.
CRA Leadership Meeting. 28 Feb 2011.
6
Analytics market
$76B market by 2015
Information & Analytics Market: $60B in 2011; 6.4% CGR ′10-′15
• Info Integration & MDM: $4.9B ′10; 8.3% CGR ′10-′15
• DW DBMS: $6.9B ′11; 7.1% CGR ′10-′15
• Analytic Applications: $7.3B ′11; 7.0% CGR ′10-′15
• BI Platform & PM: $15.0B ′11; 7.7% CGR ′10-′15
• Data Mgmt & IDM: $18.9B ′11; 4.2% CGR ′10-′15
• Content Management: $6.9B ′11; 6.7% CGR ′10-′15
Source: GMV 2H10 (incl. analytic applications)
© IBM, Inc.
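The headline figure can be checked against the growth rate on the slide. The sketch below assumes that "CGR" means a compound annual growth rate and that it is applied for the four years from 2011 to 2015; the function name is illustrative and not from the talk.

```python
def project_market(base_billion, annual_growth, years):
    """Compound a starting market size forward at a fixed annual growth rate."""
    return base_billion * (1 + annual_growth) ** years


# Slide figures: $60B in 2011 and a 6.4% growth rate.
# Assumption: the rate compounds once per year from 2011 to 2015.
projected_2015 = project_market(60.0, 0.064, 2015 - 2011)
print(f"Projected 2015 market: ${projected_2015:.1f}B")
```

Under these assumptions the projection comes out at roughly $77B, close to the slide's "$76B market by 2015".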
7
The conundrum
• Technology will not solve our problems by itself
• We do not have enough knowledge workers
• People in many parts of the globe do not have
access to education that will enable them to fill
the jobs of today and tomorrow
• Colleges and universities are not recruiting and
retaining enough students to fulfill demand for
students with CSTEM skills in general and
advanced computing skills in particular
8
We need people comfortable with critical
thinking and computational thinking
• Critical thinking skills
• Computational thinking skills
– Conceptualizing, not programming
– Fundamental, not rote skill
– A way that humans, not computers, think
– Complements and combines mathematical and
engineering thinking
– Ideas, not artifacts
– For everyone, everywhere
From: Wing, J.M. Computational thinking. 2006.
Communications of the ACM. 49(3): 33-35
9
Inspiration matters!
© Estes-COX Inc.
www.estesrockets.com
Two young model and model rocket builders,
shortly after claiming world record for
continuous model building at 37 hours and 40
minutes (quickly surpassed by others) April, 1973
10
Ready, Set
Robots! Camp
@ PTI 2010
From 3D movie What is Cancer? by Albert William
IUPUI, SOIC, AVL, Research Technologies, UITS / PTI
© Matthew King, student in IU professor
Margaret Dolinsky's Digital Art class
Mike Boyles, AVL, Research Technologies, UITS / PTI giving demo
11
Games are not reality
12
PolarGrid
Je’aime Powell, Elizabeth City State
University graduate researcher on
Greenland expedition, 2009.
Photos courtesy of Keith Lehigh and Matt Link, Indiana University
13
Geoffrey Fox, PI. PolarGrid
SC ’08 Cluster Challenge
IU / Dresden team – organized within IU side by Dr. Andrew Lumsdaine, Director, Open Systems Lab
and Center for Scalable Computing, PTI, and Professor, School of Informatics and Computing; and
Matt Link and D. Scott McCaulay, Directors, Research Technologies, UITS / PTI
14
Guitar workshop
Photos courtesy of Rebecca Lowe,
Open Systems Lab, SOIC and PTI,
Indiana University. Guitar workshop
sponsored by Dr. Andrew Lumsdaine,
Director, Open Systems Lab and Center
for Scalable Computing, PTI; and
Professor, School of Informatics and
Computing
15
Minority Engineering Advancement
Program @ IUPUI
Use the Bootable Cluster CD with the “Game of
Life” to demonstrate speedup
LittleFe - small integrated cluster
Matt Link, Director, Research Technologies,
UITS; and Associate Director, Center for
Scalable Computing, PTI
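The speedup demonstration rests on the fact that each row of a Game of Life grid can be updated independently from the previous generation. The sketch below is not the Bootable Cluster CD code; it is a minimal illustration, assuming a wrap-around (toroidal) grid, that splits the rows into bands and hands each band to a worker. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def step_rows(grid, row_start, row_end):
    """Return the next generation for rows [row_start, row_end) of a toroidal grid."""
    n_rows, n_cols = len(grid), len(grid[0])
    new_rows = []
    for r in range(row_start, row_end):
        new_row = []
        for c in range(n_cols):
            # Count the eight neighbours, wrapping around the grid edges.
            neighbours = sum(
                grid[(r + dr) % n_rows][(c + dc) % n_cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            alive = grid[r][c] == 1
            survives = alive and neighbours == 2
            new_row.append(1 if neighbours == 3 or survives else 0)
        new_rows.append(new_row)
    return new_rows


def step_serial(grid):
    """One generation computed by a single worker."""
    return step_rows(grid, 0, len(grid))


def step_banded(grid, workers):
    """One generation computed band by band, one band of rows per worker."""
    n_rows = len(grid)
    bounds = [(i * n_rows // workers, (i + 1) * n_rows // workers)
              for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bands = pool.map(lambda b: step_rows(grid, b[0], b[1]), bounds)
        return [row for band in bands for row in band]
```

Threads are used here only to keep the sketch short; in standard CPython they will not make pure-Python code run faster. On a cluster such as LittleFe the same band decomposition would be spread across processes or nodes (for example with MPI), with each worker exchanging only its boundary rows between generations.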
16
LittleFe
“LittleFe is a complete 6 node Beowulf style
portable computational cluster. The entire
package weighs less than 50 pounds; easily
travels; and sets-up in 5 minutes. Current
generation LittleFe hardware includes multicore processors and GPGPU support enabling
support for shared memory parallelism,
distributed memory parallelism, and hybrid
models. By leveraging the Bootable Cluster CD
project, and the Computational Science
Education Reference Desk LittleFe is a
powerful, ready-to-run, computational science
and parallel programming educational
platform for the price of a high-end laptop.”
http://LittleFe.net
Photo courtesy Charlie Peck, Earlham College.
© Earlham College.
17
LEAD (Linked Environments for
Atmospheric Discovery) & LEAD II – an
example Science Gateway
Meteorology researchers used data and images generated by LEAD II while chasing tornadoes.
Images © Beth Plale, Professor, School of Informatics &
Computing; Director, Data to Insight Center, PTI
18
WxChallenge & LEAD II
www.wxchallenge.com. Screen image © University of Oklahoma.
In support of the 2010 Vortex2 campaign,
LEAD II successfully executed 214
workflows, used 109,568 CPU hours,
generated 215 GB of data and over 9,100
2D products.
http://pti.iu.edu/d2i/leadii-vortex2 Image © Trustees of
Indiana University
19
nanoHUB
Screen Image © Network for Computational Nanotechnology
(nanohub.org/groups/ncn).
20
nanoHUB usage
nanoHUB usage, September 2010. Red dots: tutorial and seminar use.
Yellow dots: online simulation use. Size of dot indicates number of users
from location. Annually nanoHUB serves over 170,000 users in 172 countries.
© Gerhard Klimeck, Network for Computational Nanotechnology
(nanohub.org/groups/ncn). Used by permission. May not be reused without permission.
21
@home projects (based on BOINC)
http://escatter11.fullerton.edu/nfs/. Image courtesy
Dr. Greg Childers, and © California State University,
Fullerton.
docking.cis.udel.edu Image courtesy of Michela
Taufer, GCLab, U. Delaware. © U. Delaware
22
You don’t need access to a supercomputer
to teach parallel computing… or data-intensive computing
• Multicore & GPUs
• LittleFe
• Cloud providers
• Citizen Science – access to and participation in
authentic science
physicsworld.com/cws/article/news/2738
© Institute of Physics. Reused under
Licensing terms @ physicsworld.com/cws/copyright
Photograph © Chris Eller,
Advanced Visualization
Lab, Research Technologies,
UITS; and PTI
23
Campus bridging
• Campus bridging is the seamlessly integrated use of
cyberinfrastructure operated by a scientist or engineer with
other cyberinfrastructure on the scientist’s campus, at
other campuses, and at the regional, national, and
international levels as if they were proximate to the
scientist, and when working within the context of a Virtual
Organization (VO) make the ‘virtual’ aspect of the
organization irrelevant (or helpful) to the work of the VO.
• Campus bridging material:
http://pti.iu.edu/campusbridging/
• ACCI Taskforce final reports:
http://www.nsf.gov/od/oci/taskforces/
24
Estimated Computing Capacity (TFLOPS)
[Bar chart, horizontal axis 0 to 12,000 TFLOPS, with one bar for each of:
NSF Track 1; Track 2 and other major facilities; Campus HPC / Tier 3 systems;
workstations at Carnegie research universities; volunteer computing;
commercial cloud (IaaS and PaaS).]
Data at http://hdl.handle.net/2022/13136
25
Single lab biological instruments
Type of instrument | Model | Raw image data | Data products
Light Microscopy | BD Pathway 855 Bioimager | N/A | 7 GB/day
Genome sequencing | Roche 454 Life Sciences genome analyzer system | 39 GB/day | 9 GB/day
Genome sequencing | Illumina-Solexa genome analyzer system | 367 GB/day | 100 GB/day
Genome sequencing | ABI SOLID 3 | 238 GB/day | 150 GB/day
Microarray Gene Expression Chip Reader | Molecular Devices GenePix Professional 4200A Scanner | N/A | 8 MB/day
Microarray Gene Expression Chip Reader | NimbleGen Hybridization System 4 (110V) | N/A | 300 MB/day
Several Task Force recommendations to the NSF regarding hardware and networking:
much more attention to data and networking challenges!
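To make the data challenge concrete, the daily rates for the sequencers can be turned into yearly volumes. This is a rough sketch, assuming each instrument runs every day of the year (an upper bound that the talk does not state) and using decimal units (1 TB = 1000 GB); the names below are illustrative.

```python
# Daily output in GB per sequencer, as (raw image data, data products),
# taken from the slide's table.
DAILY_GB = {
    "Roche 454": (39, 9),
    "Illumina-Solexa": (367, 100),
    "ABI SOLID 3": (238, 150),
}


def annual_terabytes(raw_gb_per_day, product_gb_per_day, days_per_year=365):
    """Yearly volume in decimal TB, assuming the instrument runs every day."""
    total_gb = (raw_gb_per_day + product_gb_per_day) * days_per_year
    return total_gb / 1000


for name, (raw, products) in DAILY_GB.items():
    print(f"{name}: about {annual_terabytes(raw, products):.1f} TB/year")
```

Even with far fewer operating days, a single lab running one such sequencer would accumulate tens of terabytes a year, which is the scale behind the Task Force's emphasis on data and networking.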
26
Cyberinfrastructure is infrastructure
Strategic Recommendation
to the NSF: NSF must lead
the community in
establishing a blueprint for
a National CI
CI software must be made
more robust
National Science Foundation. Investing in America’s Future:
Strategic Plan FY 2006-2011. September 2006. Available from:
http://www.nsf.gov/pubs/2006/nsf0648/nsf0648.jsp
27
Examples of mature and maturing
systems & software
DEISA
UK eScience Grid
NSF CIF 21 (Cyberinfrastructure
Framework for 21st Century
Science and Engineering)
ROCKS (www.rocksclusters.org)
Condor (www.condor.org)
© DEISA. http://www.deisa.eu/usersupport/user-documentation/unicore-5-in-deisa/job-submission-through-unicore-5/DEISA-UNICORE-Figure01.png/image_preview
28
Critical challenge: curriculum materials
http://ocw.mit.edu/index.htm Used under Creative Commons License –
Attribution-NonCommercial-ShareAlike 3.0 United States (CC BY-NC-SA 3.0)
http://creativecommons.org/licenses/by-nc-sa/3.0/us/
29
Existing curriculum resources
• MIT Computer Science & Engineering curriculum –
web.mit.edu/catalog/degre.engin.ch6.html
• ACM – www.acm.org/education/curricula-recommendations
• TCPP (Technical Committee on Parallel Processing) – tcpp.cs.gsu.edu/
– CORE COURSES:
• CS1: Introduction to Computer Programming (First Courses)
• CS2: Second Programming Course in the Introductory Sequence
• Systems: Intro Systems/Architecture Core Course
• DS/A: Data Structures and Algorithms
• DM: Discrete Structures/Math
– ADVANCED / ELECTIVE COURSES:
• Arch 2: Advanced Elective Course on Architecture
• Algo 2: Elective/Advanced Algorithm Design and Analysis
• Lang: Programming Language/Principles (after introductory sequence)
• SwEngg: Software Engineering
• ParAlgo: Parallel Algorithms
• ParProg: Parallel Programming
• Compilers: Compiler Design
– IMHO: The TCPP curriculum demonstrates the need for more attention to
computational thinking in K-12 education
30
300+ Students learning about Twister & Hadoop
MapReduce technologies, supported by FutureGrid.
July 26-30, 2010 NCSA Summer School Workshop
http://salsahpc.indiana.edu/tutorial
[Map of participating sites: Washington University; University of Minnesota; Iowa;
IBM Almaden Research Center; University of California at Los Angeles; San Diego
Supercomputer Center; Michigan State; Univ. Illinois at Chicago; Notre Dame;
Johns Hopkins; Penn State; Indiana University; University of Texas at El Paso;
University of Arkansas; University of Florida.]
Slide © Judy Qiu, SOIC and SALSA Lab, PTI
Economies of scale in training
Photo courtesy Robert Quick,
Research Technologies & PTI.
OSG Grid School in Sao Paulo
Brazil, January 2011
Image from TeraGrid EOT: Education, Outreach, and
Training 2010. https://www.teragrid.org/web/news/news#2010scihigh
32
Great challenges, great opportunities
• Challenges
– Matters such as human impact on the global environment will
be most successfully addressed with fact-based consensus
approaches.
– More countries must have the skill and access to technology to
do their own modeling
• Cyberinfrastructure and education opportunities
– If we can treat cyberinfrastructure more like infrastructure … we
can focus on the challenging / important / fun work
– Robust cyberinfrastructure => reusable educational materials
– Data-intensive science creates tremendous need and
opportunity in education and application
– While we are busy improving the pipeline of talent, involving
undergrads in research may greatly improve the % of the
existing pipeline that pursues an advanced technology career
33
New economic growth opportunities
• VOs and opportunities they provide for research
• Digital manufacturing (new opportunities in a
different approach to globalization)
• Sustainable societies
• With better education in supercomputing, and in all
forms of high performance computing, people may
enable us to achieve some of the technology nirvana
described at the beginning of this talk
34
This talk is dedicated to the memory of
Truman O. Stewart
35
Additional information
• Droegemeier, K., B. Plale, M. Ramamurthy, and C. Mattocks, "A New
Approach for Using Web Services, Grids, and Virtual Organizations
in Mesoscale Meteorological Research" 25th Conference on
Interactive Information Processing Systems for Meteorology,
Oceanography, and Hydrology (IIPS), 01/2009.
• Stewart, C.A., S. Simms, B. Plale, M. Link, D. Hancock and G. Fox.
2010. What is Cyberinfrastructure? In: Proceedings of SIGUCCS
2010 (Norfolk, VA, 24-27 Oct, 2010).
http://portal.acm.org/citation.cfm?doid=1878335.1878347
• http://www.computinginthecore.org/ − “a non-partisan advocacy
coalition … to elevate computer science education to a core
academic subject in K-12 education …”
• http://hubzero.org/resources/408/ Exploring the Impact of
nanoHUB.org on Research and Education
• Cohen, D. 2006. Globalization and Its Enemies. MIT Press
36
Acknowledgments
• Thanks to King Abdullah University of Science and Technology for
the opportunity to present today (Through Inspiration, Discovery
indeed!)
• Malinda Lingwall for editing, graphical work, and fact-finding/checking
• Ready, Set, Robots! Camp: Daphne Siefert-Herron, Kurt Seiffert,
Kristy Kallback-Rose, Danko Antolovic, Jenett Tillotson, Therese
Miller
• MEAP: David Hancock, Andrew Arenson, Rich Knepper, Kurt Seiffert,
Matt Link (Research Technologies, UITS, Research Technologies,
PTI); Patrick Gee, Mark Russell
• Thanks to all of the IU Research Technologies staff and Pervasive
Technology Institute students, staff, and faculty who have led or
been involved in the IU projects described here
37
Acknowledgments
• Many of the scientific workflow examples here use the IU Data Capacitor – project led by
Steve Simms, Research Technologies, UITS, & PTI. http://pti.iu.edu/dc/ NSF CNS 05-21433
• LEAD: Beth Plale, IU (SOIC-PTI) funded by NSF 0331480
• PolarGrid: NSF 0723054 (G. Fox, PI)
• FutureGrid: NSF 0910812 (G. Fox, PI)
• nanoHUB: nanoHUB.org is operated by Network for Computational Nanotechnology (NCN).
NCN was funded by the National Science Foundation (NSF) under various grants.
Development and support of nanoHUB is also supported in part by the HUBzero consortium,
of which IU is a member.
• Campus Bridging: NSF 040777, 1059812, 0948142, 1002526, 0829462
• LittleFe: Support from TeraGrid, SC Conference, Intel Corporation and Earlham College.
• Lilly Endowment for its support of IU through INGEN, METACyt, and the Pervasive
Technology Institute
• Tevfik Kosar, who as Chair of DIDC ’10 invited me to present the keynote at the
Third International Workshop on Data Intensive Distributed Computing (DIDC'10): “It’s not a
data deluge – it’s worse than that.” Several slides from that talk are reused here. The original
talk is available at http://hdl.handle.net/2022/13195
• Thanks to those individuals who gave permission to use images presented in this talk
• Any opinions presented here are those of the presenter and do not necessarily represent the
opinions of the National Science Foundation, the Lilly Endowment, the NSF ACCI, NSF ACCI
Task Force on Campus Bridging, or any other funding agencies or organizations
38
License terms
• Items indicated with a © are under copyright and used here with permission. Such
items may not be reused without permission from the holder of copyright except
where license terms noted on a slide permit reuse.
• Please cite this presentation as: Stewart, C.A. “Educational Applications of
Supercomputing and Cyberinfrastructure.” Presentation at KAUST Economic
Development International Symposium at ISC'11, 21 June 2011. Available from:
http://hdl.handle.net/2022/13365
• Except where otherwise noted, contents of this presentation are copyright 2011 by
the Trustees of Indiana University.
• This document is released under the Creative Commons Attribution 3.0 Unported
license (http://creativecommons.org/licenses/by/3.0/). This license includes the
following terms: You are free to share – to copy, distribute and transmit the work
and to remix – to adapt the work under the following conditions: attribution – you
must attribute the work in the manner specified by the author or licensor (but not
in any way that suggests that they endorse you or your use of the work). For any
reuse or distribution, you must make clear to others the license terms of this work.
39
Questions?
And thank you for your kind
attention….
40