
CReSIS Cyberinfrastructure
CReSIS Lawrence Kansas
February 10 2009
Geoffrey Fox
Computer Science, Informatics, Physics
Chair Informatics Department
Director Digital Science Center
and Community Grids Laboratory of
Pervasive Technology Institute (PTI)
Indiana University Bloomington IN 47404
[email protected]
http://www.infomall.org
What is Cyberinfrastructure?
• Cyberinfrastructure is infrastructure that supports distributed research and learning (e-Science, e-Research, e-Education)
  • Links data, people and computers
• It exploits Internet technology (Web 2.0 and Clouds), adding management, security, supercomputers etc. via Grid technology
• It has two aspects: parallel, with low latency (microseconds) between nodes, and distributed, with higher latency (milliseconds) between nodes
• The parallel aspect is needed to get high performance on individual large simulations, data analyses etc.; one must decompose the problem (see the sketch after this list)
• The distributed aspect integrates already distinct components (data)
• Integrate with TeraGrid (and Open Science Grid)
• We are already using Cyberinfrastructure, with innovation driven by the special characteristics of its use; we exploit software and experience from astronomy, biology, earth science, particle physics, business (clouds) …
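As a minimal sketch of the "decompose the problem" point above, assuming the mpi4py and numpy Python packages (neither is mentioned in the slides; any MPI binding would do): each MPI rank works on its own slice of a large array and the partial results are combined with a reduction.

from mpi4py import MPI   # assumed Python MPI binding
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Decompose a 1,000,000-element problem across the available ranks
n = 1_000_000
counts = [n // size + (1 if r < n % size else 0) for r in range(size)]
start = sum(counts[:rank])
local = np.arange(start, start + counts[rank], dtype=np.float64)

# Each rank computes on its local slice only; results are then reduced
local_sum = local.sum()
total = comm.reduce(local_sum, op=MPI.SUM, root=0)

if rank == 0:
    print(f"sum over {n} elements computed on {size} ranks: {total}")

Run with, e.g., mpiexec -n 24 python decompose.py (the script name is hypothetical); the same decomposition pattern applies to radar data analysis or simulation kernels.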
Indiana University Experience
• The Indiana University PTI team is a partnership between a research group (Community Grids Laboratory, led by Fox) and the university's IT Research Technologies unit (UITS-RT, led by Stewart)
• This gives us robust systems support, from expeditions to "lower 48" systems, together with the use of leading-edge technologies
• PolarGrid would not have succeeded without this collaboration
• IU runs the Internet2/NLR Network Operations Center: http://globalnoc.iu.edu/
• IU is a member of TeraGrid and Open Science Grid
• IU has provided Cyberinfrastructure for LEAD (tornado forecasting), QuakeSim (earthquakes) and Sensor Grids for the Air Force, in areas with some overlap with CReSIS requirements
• IU has significant parallel computing expertise (Fox developed some of the earliest successful parallel machines); Lumsdaine is a leader in many MPI projects, including MPI.NET and OpenMPI
CYBERINFRASTRUCTURE CENTER FOR POLAR SCIENCE (CICPS)
PolarGrid 2008-2009
• Supported several expeditions starting July 2008
  • Ilulissat: airborne radar
  • NEEM: ground-based radar, remote deployment
  • Thwaites: ground-based radar
• Expedition Cyberinfrastructure was simplified after initial experience, as power and mobility proved more important than the ability to do sophisticated analysis
• Offline analysis partially done on the PolarGrid system at Indiana University
• Education and training supported by a laboratory and systems at ECSU
• Collaboration enhanced by Polycom systems
• PolarGrid was an NSF MRI instrument grant; substantial people support was donated by Indiana University
CReSIS Cyberinfrastructure
• Base and field camps for Arctic and Antarctic expeditions
  • Initial data analysis to monitor experimental equipment
• Training and education resources
  • Computer labs; Cyberlearning/collaboration
• Full off-line analysis of data on "lower 48" systems, exploiting TeraGrid as appropriate
• Data management, metadata support and long-term data repositories
• Hardware available through PolarGrid, Indiana University (archival and dynamic storage) and TeraGrid
• Parallel (multicore/cluster) versions of simulation and data analysis codes
• Portals for ease of use
Technical Approach I
• Clouds and Web 2.0 are disruptive technologies, but:
• One should still build distributed components as services
  • But keep to simple interfaces: REST or basic SOAP (see the sketch after this list)
• Still access systems through portals
  • Allowing either gadgets or portlets
• Still orchestrate systems using workflow
  • But mash-ups can be used in simple cases
• Still use OGC (Open Geospatial Consortium) standards for Geographic Information System services
• Still use MPI for parallel computing
  • Threading may be useful on multicore, but this is not obvious (we find better performance with MPI than with threads on 24-core nodes for large jobs)
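To make the "simple interfaces: REST" point concrete, here is a minimal sketch of a REST-style service using only the Python standard library; the /status resource, port and JSON payload are hypothetical and not part of any existing CReSIS or OGCE component.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class StatusHandler(BaseHTTPRequestHandler):
    # A single read-only resource exposed over a plain HTTP GET
    def do_GET(self):
        if self.path == "/status":
            body = json.dumps({"service": "radar-analysis", "state": "idle"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404, "unknown resource")

if __name__ == "__main__":
    HTTPServer(("", 8080), StatusHandler).serve_forever()

A client needs nothing more than an HTTP library (or curl http://localhost:8080/status), which is the attraction of REST over heavier SOAP stacks.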
Technical Approach II
• The Semantic Web is still useful for metadata, but be sure to use only simple RDF and be sure it can be mapped to MySQL or equivalent databases (see the sketch after this list)
• Relevant areas of uncertainty include "Data Intensive" Web 2.0 technologies such as Hadoop (Yahoo) and Dryad (Microsoft)
  • These are likely to change workflow and systems architecture for data-intensive problems, such as those CReSIS has
• Clouds are likely to change architectures for loosely coupled dynamic jobs, such as spawning a set of independent Matlab runs
• There is no reason to develop new core technologies for CReSIS; rather, we will deploy and customize existing technologies
• Over the next year we will focus on the portal (we have some TeraGrid funding with ECSU) and gather requirements in the data and modeling areas
• Examples of IU Cyberinfrastructure follow
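To illustrate mapping simple RDF metadata to a relational database, here is a minimal sketch; SQLite stands in for MySQL so the example is self-contained, and the flight-line triples (identifiers, predicates, values) are invented for illustration.

import sqlite3

# Store RDF-style metadata as plain (subject, predicate, object) rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")

triples = [
    ("flightline:2008-07-12-001", "dc:creator",  "CReSIS"),
    ("flightline:2008-07-12-001", "dc:date",     "2008-07-12"),
    ("flightline:2008-07-12-001", "cresis:site", "Ilulissat"),
]
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)

# A simple metadata lookup becomes an ordinary SQL query
for subject, obj in conn.execute(
        "SELECT subject, object FROM triples WHERE predicate = 'cresis:site'"):
    print(subject, "was collected at", obj)

Keeping the RDF this simple is what makes the relational mapping trivial; richer ontologies would need a real triple store.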
OGCE QuakeSim Portlets
Disloc model of the Northridge fault. Disloc is used in Gerry Simila's geophysics classes (CSUN).
OGCE (Open Grid Computing Environments, led by CGL) Google Gadgets: MOAB dashboard, remote directory browser, and proxy management.
LEAD Cyberinfrastructure
OGCE Workflow Tools
WRF-Static running on Tungsten
AFRL Sensor Grid
Sensors on robot: RFID reader, Lego robot, N800 webcam carried by the robot, GPS, RFID signal
Comparison of MPI and Threads on Classic Parallel Code
Plot: parallel overhead f, where Speedup = 24/(1+f), for 1-way to 24-way parallel patterns comparing MPI processes with CCR threads, on three dataset sizes (Patient2000, Patient4000, Patient10000). Hardware: 4 Intel six-core Xeon E7450 (2.4 GHz), 48 GB memory, 12 MB L2 cache.
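As a worked illustration of the overhead formula on this slide (the run times below are made-up numbers, not the measured values from the plot):

# Speedup = 24/(1+f), where f is the parallel overhead
p = 24                 # cores used
t_serial = 120.0       # hypothetical one-core run time, seconds
t_parallel = 5.5       # hypothetical 24-way run time, seconds

speedup = t_serial / t_parallel          # about 21.8
f = p * t_parallel / t_serial - 1.0      # 0.10, so speedup = p/(1+f) = 24/1.1
print(f"speedup = {speedup:.1f}, overhead f = {f:.2f}, p/(1+f) = {p/(1+f):.1f}")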