Cyberinfrastructure for e-Education and e-Research (e-Science) Cyberinfrastructure Days New Mexico Highlands University Las Vegas NM March 10-11 2008 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington.


Cyberinfrastructure for
e-Education
and e-Research (e-Science)
Cyberinfrastructure Days
New Mexico Highlands University
Las Vegas NM
March 10-11 2008
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
[email protected]
http://www.infomall.org
1
e-moreorlessanything






‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ from inventor of term John Taylor Director General of Research Councils UK, Office of Science and Technology
e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research
Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
This generalizes to e-moreorlessanything including eDigitalLibrary, e-NationalSecurity, e-HavingFun and e-Education
A deluge of data of unprecedented and inevitable size must be managed and understood.
People (virtual organizations), computers, data (including sensors and instruments) must be linked via hardware and software networks
2
Applications, Infrastructure,
Technologies






This field is confused by inconsistent use of terminology; I define the following:
Web Services, Grids and (aspects of) Web 2.0 (Enterprise 2.0) are technologies
Grids could be everything (Broad Grids implementing some sort of managed web) or reserved for specific architectures like OGSA or Web Services (Narrow Grids)
These technologies combine and compete to build electronic infrastructures termed e-infrastructure or Cyberinfrastructure
e-moreorlessanything is an emerging application area of broad importance that is hosted on the infrastructures e-infrastructure or Cyberinfrastructure
e-Science or perhaps better e-Research is a special case of e-moreorlessanything
3
What is Cyberinfrastructure

Cyberinfrastructure is (from NSF) infrastructure that supports distributed science (e-Science): data, people, computers
• Clearly core concept more general than Science
Exploits Internet technology (Web2.0) adding (via Grid technology) management, security, supercomputers etc.
It has two aspects: parallel, with low latency (microseconds) between nodes, and distributed, with highish latency (milliseconds) between nodes
Parallel needed to get high performance on individual large simulations, data analysis etc.; must decompose problem
• New Mexico Encanto supercomputer excellent parallel resource
Distributed aspect integrates already distinct components; especially natural for data
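To make the microsecond versus millisecond distinction concrete, here is a toy cost model, offered only as a sketch. The two latency scales come from the slide; the workload (1 ms of computation and 10 messages per step) and the function names are illustrative assumptions, not measurements of any real system.

```python
# Toy cost model: each step computes locally, then exchanges some messages.
# Only the two latency scales come from the slide; the workload is assumed.

def step_time(compute_s: float, messages: int, latency_s: float) -> float:
    """Time for one step: local computation plus per-message latency."""
    return compute_s + messages * latency_s


def efficiency(compute_s: float, messages: int, latency_s: float) -> float:
    """Fraction of a step spent on useful computation."""
    return compute_s / step_time(compute_s, messages, latency_s)


PARALLEL_LATENCY = 1e-6     # microseconds between nodes (parallel)
DISTRIBUTED_LATENCY = 1e-3  # milliseconds between nodes (distributed)

# Assumed workload: 1 ms of computation and 10 messages per step.
compute, msgs = 1e-3, 10
parallel_eff = efficiency(compute, msgs, PARALLEL_LATENCY)
distributed_eff = efficiency(compute, msgs, DISTRIBUTED_LATENCY)
print(f"parallel efficiency:    {parallel_eff:.3f}")
print(f"distributed efficiency: {distributed_eff:.3f}")
```

Under these assumptions a tightly coupled decomposition stays efficient only at parallel latencies, which is why the distributed aspect suits loosely coupled components such as data sources.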
4
Underpinnings of
Cyberinfrastructure



Distributed software systems are being “revolutionized” by developments from e-commerce, e-Science and the consumer Internet. There is rapid progress in technology families termed “Web services”, “Grids” and “Web 2.0”
The emerging distributed system picture is of distributed services with advertised interfaces but opaque implementations communicating by streams of messages over a variety of protocols
• Complete systems are built by combining either services or predefined/pre-existing collections of services together to achieve new capabilities
As well as Internet/Communication revolutions (distributed systems), multicore chips will likely be hugely important (parallel systems)
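The service picture above can be sketched in a few lines. This is a minimal illustration, assuming a message is a plain dictionary and a service is any function from message to message; the two example services and the `compose` helper are hypothetical and stand in for real Web or Grid services.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]                 # a message is just structured data
Service = Callable[[Message], Message]   # advertised interface: message in, message out


def unit_converter(msg: Message) -> Message:
    """Hypothetical service: adds a Kelvin value to a Celsius reading."""
    return {**msg, "kelvin": float(msg["celsius"]) + 273.15}


def threshold_flagger(msg: Message) -> Message:
    """Hypothetical service: flags readings above the freezing point of water."""
    return {**msg, "above_freezing": msg["kelvin"] > 273.15}


def compose(*services: Service) -> Service:
    """Build a new service by passing each message through the given services in order."""
    def combined(msg: Message) -> Message:
        for service in services:
            msg = service(msg)
        return msg
    return combined


# The caller sees only the interface; each implementation stays opaque.
pipeline = compose(unit_converter, threshold_flagger)
result = pipeline({"sensor": "demo", "celsius": 2.0})
print(result)
```

Because `compose` itself returns a service, a pre-existing collection of services can be reused as a single building block in a larger system.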
5
Virtual Observatory Astronomy Grid
Integrate Experiments
[Figure panels: Radio, Far-Infrared, Visible, Dust Map, Visible + X-ray, Galaxy Density Map]
6
Example: Setting up a Polar CI-Grid
• The North and South poles are melting with potentially huge
environmental impact
• As a result of MSI meetings, I am working with MSI ECSU in
North Carolina and Kansas University to design and set up a
Polar Grid (Cyberinfrastructure)
• This is a network of computers, sensors (on robots and
satellites), data and people aimed at understanding science of
ice-sheets and impact of global warming
• We have changed the 100,000 year Glacier cycle into a ~50
year cycle; the field has increased dramatically in importance
and interest
• Good area to get involved in as not so much established work
7
8
CYBERINFRASTRUCTURE CENTER FOR POLAR SCIENCE (CICPS)
TeraGrid resources include more than 250 teraflops of computing capability and more than 30 petabytes of
online and archival data storage, with rapid access and retrieval over high-performance networks. TeraGrid
is coordinated at the University of Chicago, working with the Resource Provider sites: Indiana University,
Oak Ridge National Laboratory, National Center for Supercomputing Applications, Pittsburgh
Supercomputing Center, Purdue University, San Diego Supercomputer Center, Texas Advanced Computing
Center, University of Chicago/Argonne National Laboratory, and the National Center for Atmospheric
Research.
Computing and Cyberinfrastructure: TeraGrid
[Map of TeraGrid sites, marked as Resource Provider (RP) or Software Integration Partner: UW, Grid Infrastructure Group (UChicago), PSC, UC/ANL, NCAR, PU, NCSA, Caltech, IU, UNC/RENCI, ORNL, USC/ISI, SDSC, TACC]
Large Hadron Collider
CERN, Geneva: 2008 Start
• pp, $\sqrt{s} = 14$ TeV, $L = 10^{34}\ \mathrm{cm^{-2}\,s^{-1}}$
• 27 km Tunnel in Switzerland & France
Experiments: CMS and Atlas (pp, general purpose; HI), TOTEM, ALICE (HI), LHCb (B-physics)
5000+ Physicists, 250+ Institutes, 60+ Countries
Physics: Higgs, SUSY, Extra Dimensions, CP Violation, QG Plasma, … the Unexpected
Challenges: Analyze petabytes of complex data cooperatively; Harness global computing, data & network resources
Environmental Monitoring Sensor
Grid at Clemson
11
Sensor Grids Can be Fun

Note sensors are any time dependent source of
information and a fixed source of information is just a
broken sensor
• SAR Satellites
• Environmental Monitors
• Nokia N800 pocket computers
• RFID tags and readers
• GPS Sensors
• Lego Robots
• RSS Feeds
• Audio/video: web-cams
• Presentation of teacher in distance education
• Text chats of students
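The definition of a sensor as any time dependent source of information can be sketched as a function of time. This is only an illustration: the `Sensor` type alias, the two example sources and the `looks_broken` check are hypothetical names, and "broken" is interpreted here as "the reading never changes over the sampled times".

```python
import math
from typing import Callable, List

Sensor = Callable[[float], float]  # maps a time (in seconds) to a reading


def oscillating_sensor(t: float) -> float:
    """Hypothetical live sensor: the reading varies with time."""
    return math.sin(t)


def fixed_source(t: float) -> float:
    """A fixed source of information: the same reading at every time."""
    return 42.0


def looks_broken(sensor: Sensor, times: List[float]) -> bool:
    """Treat a sensor as 'broken' if its reading never changes over the sampled times."""
    readings = [sensor(t) for t in times]
    return all(r == readings[0] for r in readings)


sample_times = [0.0, 1.0, 2.0, 3.0]
live_is_broken = looks_broken(oscillating_sensor, sample_times)
fixed_is_broken = looks_broken(fixed_source, sample_times)
print(live_is_broken, fixed_is_broken)
```

With this view an RSS feed, a web-cam and a GPS unit all share one interface, which is what lets very different devices sit on the same sensor grid.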
12
The Sensors on the Fun Grid
[Photo labels: Laptop for PowerPoint, 2 Robots used, Lego Robot, GPS, Nokia N800, RFID Tag, RFID Reader]
13
Data from the Robot RFID Sensors

Data from GPS geolocates other sensors
Sensor Data from Lego Light sensor plus videocams from N800 carried as payload on Lego
RFID Reader sees many tags
14
BIRN Biomedical Informatics Research Network
15
The People in Cyberinfrastructure




Web 2.0 can enhance scientific collaboration, i.e. effectively support virtual organizations, in different ways from grids
I expect more resources like MyExperiment from UK, SciVee from SDSC and Connotea from Nature that offer Flickr, YouTube, Facebook, Second Life type capabilities optimized for science
The usability and participatory nature of Web 2.0 can bring science and its informatics to a broader audience
In particular distance collaborative aspects of such Cyberinfrastructure can level playing field; you do not have to be at Harvard etc. to succeed
• e.g. ECSU in CReSIS NSF Science and Technology Center
• Navajo Tech can access TeraGrid Science Gateways
16
SciVee: Share videos etc.
Connotea: Share links/comments
All have tags
17
MSI-CIEC Web 2.0 Research Matching Portal










Portal supporting tagging and linkage of Cyberinfrastructure Resources
NSF (soon other agencies) Solicitations and Awards
Feeds such as SciVee and NSF
Researchers on NSF Awards
User and Friends
TeraGrid Allocations
Search for linked people, grants etc.
Could also be used to support matching of students and faculty for REUs etc.
[Screenshots: MSI-CIEC Portal Homepage; Search Results]
18
19
The social process of science 2.0
[Diagram labels: Undergraduate Students; Graduate Students; scientists; experimentation; Digital Libraries; Virtual Learning Environment; Technical Reports; Preprints & Metadata; Reprints; Peer-Reviewed Journal & Conference Papers; Repositories; Local Web; Certified Experimental Results & Analyses; Data, Metadata, Provenance, Workflows, Ontologies]
20
Data and Cyberinfrastructure




DIKW: Data → Information → Knowledge → Wisdom transformation
Applies to e-Science, Distributed Business Enterprise (including outsourcing), Military Command and Control and general decision support
(SOAP or just RSS) messages transport information expressed in a semantically rich fashion between sources and services that enhance and transform information so that complete system provides
• Semantic Web technologies like RDF and OWL might help us to have rich expressivity but they might be too complicated
We are meant to build application specific information management/transformation systems for each domain
• Each domain has Specific Services/Standards (for API’s and Information such as KML and GML for Geographical Information Systems)
• and will use Generic Services (like R for datamining) and
• Generic Standards (such as RDF, WSDL)
• Standards made before consensus or not observant of technology progress are dubious
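The idea of services that enhance and transform information along the DIKW chain can be sketched as a sequence of small filters. This is a toy illustration only: the stage functions, the sample readings and the alert threshold are all hypothetical, and real systems would exchange messages between remote services rather than call local functions.

```python
from statistics import mean
from typing import Dict, List


def to_information(raw: List[str]) -> List[float]:
    """Data -> Information: parse raw text readings, dropping unparseable ones."""
    values = []
    for item in raw:
        try:
            values.append(float(item))
        except ValueError:
            continue  # skip entries that are not numbers
    return values


def to_knowledge(values: List[float]) -> Dict[str, float]:
    """Information -> Knowledge: summarize the cleaned readings."""
    return {"mean": mean(values), "max": max(values)}


def to_decision(summary: Dict[str, float], alert_level: float) -> str:
    """Knowledge -> decision: apply an assumed alert threshold."""
    return "alert" if summary["max"] > alert_level else "ok"


raw_data = ["1.0", "2.5", "n/a", "4.0"]  # assumed sample input
information = to_information(raw_data)
knowledge = to_knowledge(information)
decision = to_decision(knowledge, alert_level=3.0)
print(information, knowledge, decision)
```

Each stage here plays the role of a generic or domain specific filter service; swapping one stage for another leaves the rest of the chain unchanged.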
21
Information and Cyberinfrastructure
Raw Data → Data → Information → Knowledge → Wisdom → Decisions
[Diagram: many nodes labeled S, SS and fs connected across Discovery Cloud, Filter Cloud, Filter Service, Compute Cloud, Storage Cloud, Database, Another Service, Another Grid, Traditional Grid with exposed services, and Sensor or Data Interchange Service]
22
APEC Cooperation for Earthquake Simulation

ACES is an eight-year-long collaboration among scientists
interested in earthquake and tsunami prediction
• iSERVO is Infrastructure to support
work of ACES
• SERVOGrid is (completed) US Grid that is
a prototype of iSERVO
• http://www.quakes.uq.edu.au/ACES/

Chartered under APEC –
the Asia Pacific Economic
Cooperation of 21 economies
23
Grid of Grids: Research Grid and Education Grid
[Diagram labels: Repositories, Federated Databases; Sensors, Streaming Data; Field Trip Data; Database; Sensor Grid; Database Grid; Compute Grid; Discovery Grid Services; GIS; Data Filter Services; Research Simulations; Analysis and Visualization Portal; Customization Services From Research to Education; SERVOGrid (Research); Education Grid; Computer Farm]
24
Grid Workflow Datamining in Earth Science

Work with Scripps Institute
Grid services controlled by workflow process real time data from ~70 GPS Sensors in Southern California
[Diagram labels: NASA GPS; Earthquake; Streaming Data Support; Archival; Transformations; Data Checking; Hidden Markov Datamining (JPL); Real Time Display (GIS)]
25
Grid Workflow Data Assimilation in Earth Science

Grid services triggered by abnormal events and controlled by workflow process real time data from radar and high resolution simulations for tornado forecasts
[Figure: Typical graphical interface to service composition]
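The pattern of services triggered by abnormal events can be sketched as follows. This is a minimal illustration and not the system shown on the slide: the detector, its threshold of 50.0, the sample stream and the `launch_forecast` stand-in are all assumptions made for the example.

```python
from typing import Callable, Iterable, List


def run_triggered_workflow(
    readings: Iterable[float],
    is_abnormal: Callable[[float], bool],
    workflow: Callable[[float], str],
) -> List[str]:
    """Run the workflow only for readings that the detector marks as abnormal."""
    return [workflow(r) for r in readings if is_abnormal(r)]


def high_reading(reading: float) -> bool:
    """Hypothetical detector: flag readings above an assumed threshold."""
    return reading > 50.0


def launch_forecast(reading: float) -> str:
    """Hypothetical workflow step standing in for a high resolution simulation."""
    return f"forecast requested for reading {reading}"


radar_stream = [12.0, 48.5, 55.0, 20.0, 61.2]  # assumed sample data
jobs = run_triggered_workflow(radar_stream, high_reading, launch_forecast)
print(jobs)
```

Keeping the detector and the workflow as separate callables mirrors the slide's split between the triggering event and the workflow that controls the services.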
26