A new collaborative scientific initiative at Harvard.
One-Slide IIC
Proposal-driven, from within Harvard
“Projects” focus on areas where computers are key to new science; widely
applicable results
Technical focus “Branches”





Instrumentation
Databases & Provenance
Analysis & Simulations
Visualization
Distributed Computing (e.g. GRID, Semantic Web)
Matrix organization: “Projects” by “Branches”
Education: Train Future Consumers & Producers of Computational Science
Goal: Fill the void in, place high value on, and learn from the emerging field of “computational science.”
“Astronomical Medicine”
A joint venture of FAS-Astronomy & HMS/BWH-Surgical Planning Lab;
Work shown here is from the 2005 Junior Thesis of Michelle Borkin, Harvard College.
Filling the “Gap”
between Science and Computer Science

Scientific disciplines
• Increasingly, core problems in science require computational solutions
• Typically hire or “home-grow” computationalists, but often lack the expertise or funding to go beyond the immediate pressing need

Computer Science departments
• Focused on finding elegant solutions to basic computer-science challenges
• Often see specific, “applied” problems as outside their interests
“Workflow” & “Continuum”

Workflow examples (Astronomy | Public Health)

“Collect”: Telescope | Microscope, stethoscope, survey
COLLECT: “National Virtual Observatory”/COMPLETE | CDC WONDER

“Analyze”: Study the density structure of a star-forming glob of gas | Find a link between one factory’s chlorine runoff & disease
ANALYZE: Study the density structure of all star-forming gas in… | Study the toxic effects of chlorine runoff in the U.S.

“Collaborate”: Work with your student
COLLABORATE: Work with 20 people in 5 countries, in real time

“Respond”: Write a paper for a journal.
RESPOND: Write a paper, the quantitative results of which are shared globally, digitally.
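To make the four stages concrete, here is a minimal, hypothetical Python sketch of such a workflow pipeline. Only the stage names (Collect, Analyze, Collaborate, Respond) come from the slide; the functions, data, and collaborator names are invented for illustration.

# Hypothetical sketch of the four workflow stages above
# (Collect -> Analyze -> Collaborate -> Respond).
def collect(source):
    """Gather raw observations from an instrument or survey archive (toy data)."""
    return [{"id": i, "value": i * 0.1} for i in range(10)]

def analyze(records):
    """Derive a quantitative result from the collected records."""
    values = [r["value"] for r in records]
    return {"mean": sum(values) / len(values), "n": len(values)}

def collaborate(result, collaborators):
    """Share the intermediate result with a distributed team."""
    return {"result": result, "reviewed_by": collaborators}

def respond(shared):
    """Publish the quantitative result in a globally readable form."""
    r = shared["result"]
    return f"Published: mean={r['mean']:.2f} (n={r['n']})"

print(respond(collaborate(analyze(collect("survey")), ["A", "B", "C"])))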
IIC branches address shared “workflow” challenges

Challenges common to data-intensive science, and the IIC branches that address them:
• Data acquisition → Instrumentation
• Data processing, storage, and access → Databases/Provenance
• Deriving meaningful insight from large datasets → Analysis & Simulations
• Maximizing understanding through visual representation → Visualization
• Sharing knowledge and computing resources across geographically dispersed researchers → Distributed Computing
Continuum

“Computational Science”: missing at most universities, sitting between “pure” discipline science (e.g. Einstein) and “pure” computer science (e.g. Turing).
IIC Organization: Research and Education

Provost
  Dean, Physical Sciences; Assoc Provost
    IIC Director
      Dir of Admin & Operations: CIO (systems); knowledge mgmt
      Dir of Research: Assoc Dirs for Instrumentation, Databases/Data Provenance, Analysis & Simulation, Visualization, and Distributed Computing; Project 1 (Proj Mgr 1), Project 2 (Proj Mgr 2), Project 3 (Proj Mgr 3), etc.
      Dir of Education & Outreach: Education & Outreach staff
COMPLETE/IRAS Ndust: Barnard’s Perseus
[Figure panels: IRAS Ndust; 2MASS/NICER extinction; Hα emission, WHAM/SHASSA surveys (see Finkbeiner 2003)]
Numerical Simulation of Star Formation
• MHD turbulence gives “t=0” conditions; Jeans mass = 1 Msun
• 50 Msun, 0.38 pc, n_avg = 3 × 10^5 ptcls/cc
• forms ~50 objects
• T = 10 K
• SPH, no B or L, G
• movie = 1.4 free-fall times
Bate, Bonnell & Bromm 2002 (UKAFF)
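For orientation on the “1.4 free-fall times” quoted above, here is a quick sketch of the standard free-fall-time formula t_ff = sqrt(3π / (32 G ρ)), evaluated at the slide’s average density; the mean mass per particle (2.33 m_H) is an assumed value, not a number from the slide.

import math

# Free-fall time t_ff = sqrt(3*pi / (32*G*rho)) at the slide's mean density.
# Assumption (not from the slide): mean mass per particle ~ 2.33 m_H.
G = 6.674e-8                      # gravitational constant, cm^3 g^-1 s^-2
m_H = 1.6726e-24                  # hydrogen mass, g
n_avg = 3e5                       # particles per cm^3 (from the slide)
rho = 2.33 * m_H * n_avg          # mass density, g cm^-3

t_ff_s = math.sqrt(3 * math.pi / (32 * G * rho))
t_ff_yr = t_ff_s / 3.156e7
print(f"t_ff ~ {t_ff_yr:.1e} yr")                 # ~6e4 yr with these assumptions
print(f"movie spans ~ {1.4 * t_ff_yr:.1e} yr")    # 1.4 free-fall times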
Goal: Statistical Comparison of “Real” and “Synthesized” Star Formation
Figure based on work of Padoan, Nordlund, Juvela, et al.; excerpt from realization used in Padoan & Goodman 2002.
Measuring Motions: Molecular Line Maps
Spectral Line Observations
Radio Spectral-line Observations of Interstellar Clouds
Radio Spectral-Line Survey
Alves, Lada & Lada 1999
Velocity from Spectroscopy
Observed Spectrum (Telescope + Spectrometer)
[Plot: line intensity vs. “velocity” for an observed spectrum; all thanks to Doppler.]
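To show what “velocity from spectroscopy” means in practice, here is a minimal sketch of the Doppler conversion from an observed line frequency to a line-of-sight velocity; the rest frequency is the 13CO (J=1-0) line, and the observed frequency is a made-up illustrative value.

# Doppler conversion: observed spectral-line frequency -> radial velocity.
# The rest frequency is the 13CO (J=1-0) line near 110.2 GHz; the observed
# frequency is a hypothetical value chosen for illustration.
c = 299792.458                 # speed of light, km/s
nu_rest = 110.2013543e9        # 13CO (1-0) rest frequency, Hz
nu_obs = 110.1988e9            # hypothetical observed frequency, Hz

# Radio convention: v = c * (nu_rest - nu_obs) / nu_rest
v_kms = c * (nu_rest - nu_obs) / nu_rest
print(f"v ~ {v_kms:.2f} km/s")   # positive = receding (redshifted)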
COMPLETE/FCRAO W(13CO)
Barnard’s Perseus
“Astronomical Medicine”
Excerpts from Junior Thesis of Michelle Borkin (Harvard College); IIC Contacts: AG (FAS) & Michael Halle (HMS/BWH/SPL)
IC 348
Before “Medical Treatment”
After “Medical Treatment”
3D Slicer Demo
IIC contacts: Michael Halle & Ron Kikinis
IIC Research Branches

Visualization: Physically meaningful combination of diverse data types.
Distributed Computing: e-Science aspects of large collaborations; sharing of data, computational resources, and tools in real time.
Databases/Provenance: Management, and rapid retrieval, of data; “research reproducibility” (…where did the data come from? How?).
Analysis & Simulations: Development of efficient algorithms; cross-disciplinary comparative tools (e.g. statistical).
Instrumentation: Improved data acquisition; novel hardware approaches (e.g. GPUs, sensors).

IIC projects will bring together IIC experts from the relevant branches with discipline scientists to address a pressing computing challenge facing the discipline that has broad application.
3D Slicer
Distributed Computing & Large Databases:
Large Synoptic Survey Telescope
• Optimized for time domain: scan mode, deep mode
• 7 square degree field
• 6.5 m effective aperture
• 24th mag in 20 sec
• > 5 TByte/night
• Real-time analysis
• Simultaneous multiple science goals
IIC contact: Christopher Stubbs (FAS)
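As a rough sense of scale for the “> 5 TByte/night” figure above, here is a small sketch of the implied sustained data rate; the 10-hour observing night is an assumed value for illustration, not a number from the slide.

# Implied sustained data rate for "> 5 TByte/night".
# Assumption (not from the slide): a 10-hour observing night.
nightly_tb = 5.0
hours = 10.0
rate_mb_s = nightly_tb * 1e12 / (hours * 3600) / 1e6
print(f"sustained rate > {rate_mb_s:.0f} MB/s")          # ~140 MB/s
print(f"annual volume  > {nightly_tb * 365:.0f} TB/yr")  # ~1800 TB/yr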
Relative optical survey power
[Chart: figure of merit (“Time (x10)”, “Stellar”, “Galactic (x2)”) for LSST, SNAP, Pan-STARRS, Subaru, CFHT, SDSS, and MMT; based on the AΩ = 270 LSST design.]
[Table: data and computing comparison of astronomy surveys (LSST, first operating 2011; SDSS, 1998; 2MASS, 2001; DLS; MACHO) and high-energy-physics experiments (BaBar; Atlas; RHIC). Rows: first year of operation; daily average data rate (TB/day); annual data store (TB); total data store capacity (TB); peak and average computational load (GFLOPS); acceptable data-release delay; run-time data rate to storage (MB/sec); real-time alert of events; type/number of processors.]
Challenges at the LHC
For each experiment (4 total):
• Tens of petabytes per year of data logged
• 2000+ collaborators
• 40 countries
• 160 institutions (universities, national laboratories)
• CPU intensive
• Global distribution of data
• Tested with “Data Challenges”
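For a sense of what those volumes mean per unit time and per person, a quick sketch follows; the 20 PB/yr figure stands in for the slide’s “tens of petabytes per year” and is an assumed value.

# Rough scale of the LHC data volumes quoted above.
# Assumption: 20 PB/yr stands in for "tens of petabytes per year".
pb_per_year = 20.0
avg_rate_mb_s = pb_per_year * 1e15 / 3.156e7 / 1e6
per_person_tb = pb_per_year * 1000 / 2000            # 2000+ collaborators
print(f"average rate ~ {avg_rate_mb_s:.0f} MB/s per experiment")   # ~630 MB/s
print(f"~{per_person_tb:.0f} TB/yr of logged data per collaborator")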
CPU vs. Collaboration Size
[Chart: CPU (log scale, 10 to 100,000) vs. collaboration size (0 to 2,500). Points include the Earth Simulator, LHC experiments, gravitational-wave experiments, current accelerator experiments, nuclear experiments, astronomy, and an atmospheric chemistry group.]
Data Handling and Computation for Physics Analysis (CERN)
[Diagram: the detector feeds an event filter (selection & reconstruction), which writes raw data; raw data are reprocessed and reduced to event summary data, with event simulation providing equivalent synthetic data; batch physics analysis turns the processed data into analysis objects (extracted by physics topic), which feed interactive physics analysis.]
[email protected]
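To make the tiers in that diagram concrete, here is an illustrative Python sketch (not CERN’s actual software) of the reduction chain from raw events to per-topic analysis objects; all class names, function names, and data are invented.

from dataclasses import dataclass

# Illustrative sketch of the tiered data flow above: raw events are filtered,
# reconstructed into summaries, then reduced to per-topic analysis objects.
@dataclass
class RawEvent:
    hits: list            # what the detector recorded
    triggered: bool       # event-filter (selection) decision

@dataclass
class EventSummary:
    energy: float         # reconstructed quantity, far smaller than the raw event
    topic: str            # physics-topic label

def event_filter(events):
    # Selection: keep only events that passed the trigger.
    return [e for e in events if e.triggered]

def reconstruct(event):
    # Toy reconstruction: reduce a raw event to a summary record.
    return EventSummary(energy=float(len(event.hits)), topic="example-topic")

def analysis_objects(summaries, topic):
    # Batch analysis: extract the objects relevant to one physics topic.
    return [s for s in summaries if s.topic == topic]

raw = [RawEvent(hits=[1, 2, 3], triggered=(i % 100 == 0)) for i in range(1000)]
summaries = [reconstruct(e) for e in event_filter(raw)]
objs = analysis_objects(summaries, "example-topic")
print(len(raw), "raw ->", len(summaries), "summaries ->", len(objs), "analysis objects")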
Workflow
a.k.a. The Scientific Method (in the Age of High-Speed
Networks, Fast Processors, Mass Storage, and Miniature Devices)
IIC contact: Matt Welsh, FAS