Cyberinfrastructure across the Globe Indiana University Computer Science Undergraduate Honors Seminar January 8 2007 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401 [email protected] http://www.infomall.org.


Cyberinfrastructure
across the Globe
Indiana University
Computer Science Undergraduate
Honors Seminar
January 8 2007
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
[email protected]
http://www.infomall.org
Abstract

We discuss the role of Cyberinfrastructure (also called
e-infrastructure and implemented by Grid technology)
in a variety of global activities. These include the
linking of researchers and data world wide in many
fields; new generations of digital libraries and tools like
Google Scholar; study of ice-sheets at the poles and the
dramatic impact of Global warming; the study of
earthquakes across the Pacific ocean; the linking of
apparel manufacturers in Asia to designers in different
continents and the command and control system for the
Department of Defense. We discuss these applications
and their associated technology.
Why Cyberinfrastructure Is Useful







Supports distributed science – data, people, computers
Exploits Internet technology (Web2.0) adding management,
security, supercomputers etc.
It has two aspects: parallel – low latency (microseconds)
between nodes and distributed – highish latency
(milliseconds) between nodes
Parallel needed to get high performance on individual 3D
simulations, data analysis etc.; must decompose problem
Distributed aspect integrates already distinct components
Cyberinfrastructure is in general a distributed collection of
parallel systems
Grids are made of services that are “just” programs or data
sources packaged for distributed access
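The "must decompose problem" step for the parallel aspect can be illustrated with a minimal sketch. It assumes the simplest possible case, a 1D grid split into contiguous blocks, one per worker; the function name `block_ranges` is hypothetical and not from the talk.

```python
from typing import List, Tuple


def block_ranges(n_cells: int, n_workers: int) -> List[Tuple[int, int]]:
    """Split n_cells grid cells into n_workers contiguous [start, stop) blocks.

    Block sizes differ by at most one cell, so the work is balanced.
    """
    base, extra = divmod(n_cells, n_workers)
    ranges = []
    start = 0
    for rank in range(n_workers):
        # The first `extra` workers take one additional cell each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Example: 10 cells shared among 3 workers.
ranges = block_ranges(10, 3)
```

Each worker would then update only its own block and exchange boundary values with its neighbors, which is where the low (microsecond) latency between nodes matters.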
e-moreorlessanything and the Grid







‘e-Science is about global collaboration in key areas of science,
and the next generation of infrastructure that will enable it.’ from
its inventor John Taylor Director General of Research Councils
UK, Office of Science and Technology
e-Science is about developing tools and technologies that allow
scientists to do ‘faster, better or different’ research
Similarly e-Business captures an emerging view of corporations as
dynamic virtual organizations linking employees, customers and
stakeholders across the world.
• The growing use of outsourcing is one example
The Grid provides the information technology e-infrastructure for
e-moreorlessanything.
A deluge of data of unprecedented and inevitable size must be
managed and understood.
People, computers, data and instruments must be linked.
On demand assignment of experts, computers, networks and
storage resources must be supported
TeraGrid: Integrating NSF Cyberinfrastructure
[Map labels: Buffalo, Wisc, UC/ANL, Utah, Cornell, Iowa, PU, NCAR, IU, NCSA, Caltech, PSC, ORNL, USC-ISI, UNC-RENCI, SDSC, TACC]
TeraGrid is a facility that integrates computational, information, and analysis resources at the
San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of
Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications,
Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh
Supercomputing Center, and the National Center for Atmospheric Research.
Today 100 teraflops; tomorrow a petaflop; Indiana has 20 teraflops today.
Virtual Observatory Astronomy Grid
Integrate Experiments
[Figure panels: Radio; Far-Infrared; Visible; Dust Map; Visible + X-ray; Galaxy Density Map]
Grid Capabilities for Science







Open technologies for any large-scale distributed system, adopted by
industry, many sciences and many countries (including UK, EU, USA, Asia)
• Security, Reliability, Management and state standards
Service and messaging specifications
User interfaces via portals and portlets virtualizing to desktops, email,
PDA’s etc.
• ~20 TeraGrid Science Gateways (their name for portals)
• OGCE Portal technology effort led by Indiana
Uniform approach to access distributed (super)computers supporting single
(large) jobs and spawning lots of related jobs
Data and meta-data architecture supporting real-time and archives as well
as federation
• Links to Semantic web and annotation
Grid (Web service) workflow with standards and several successful
instantiations (such as Taverna and MyLead)
Many Earth science grids including ESG (DoE), GEON, LEAD, SCEC,
SERVO; LTER and NEON for Environment
• http://www.nsf.gov/od/oci/ci-v7.pdf
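The "spawning lots of related jobs" capability listed above can be sketched as a parameter sweep. This is a minimal local illustration using Python's standard thread pool; `run_job` is a hypothetical stand-in for a job that a real Grid would submit to a remote (super)computer, and the parameter name `friction` is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(friction: float) -> dict:
    """Stand-in for one simulation job (hypothetical; a real job runs remotely)."""
    return {"friction": friction, "result": friction ** 2}


def run_sweep(values, max_workers=4):
    """Run one job per parameter value and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, values))


sweep_results = run_sweep([0.5, 1.0, 2.0])
```

`pool.map` preserves the order of the inputs, so each result can be matched to its parameter value without extra bookkeeping.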
eApparel





Much of the world’s manufacturing industry is
globalized and the apparel/textile industry is typical
We are working with Hong Kong Textile Industry to
link the Asian manufacturers with
design/marketing/purchase functions elsewhere (USA,
Europe)
Need to exchange designs, available fabrics and
discussions
Good example of e-infrastructure enabling
specialization in one geographical area to thrive
Software and digital animation outsourcing are good
examples
APEC Cooperation for Earthquake Simulation

ACES is a seven-year-long collaboration among scientists
interested in earthquake and tsunami prediction
• iSERVO is Infrastructure to support
work of ACES
• SERVOGrid is (completed) US Grid that is
a prototype of iSERVO
• http://www.quakes.uq.edu.au/ACES/

Chartered under APEC –
the Asia Pacific Economic
Cooperation of 21 economies
Grid of Grids: Research Grid and Education Grid
[Diagram labels: Repositories; Federated Databases; Database; Sensors; Streaming Data; Field Trip Data; Sensor Grid; Database Grid; Research; SERVOGrid; Education; Compute Grid; Data Filter Services; Research Simulations; GIS Discovery Grid Services; Customization Services; From Research to Education; Analysis and Visualization; Portal; Education Grid; Computer Farm]
SERVOGrid and Cyberinfrastructure


Grids are the technology based on Web services that implement
Cyberinfrastructure i.e. support eScience or science as a team
sport
• Internet-scale managed services that link computers, data
repositories, sensors, instruments and people
There is a portal and services in SERVOGrid for
• Applications such as GeoFEST, RDAHMM, Pattern
Informatics, Virtual California (VC), Simplex, mesh
generating programs …..
• Job management and monitoring web services for running
the above codes.
• File management web services for moving files between
various machines.
• Geographical Information System services
• Quaketables earthquake specific database
• Sensors as well as databases
• Context (dynamic metadata) and UDDI system long term
metadata services
• Services support streaming real-time data
[Figure labels: Site-specific Irregular Scalar Measurements; Constellations for Plate Boundary-Scale Vector Measurements; Ice Sheets; Volcanoes; PBO; Greenland; Long Valley, CA; Topography; 1 km; Stress Change; Northridge, CA; Earthquakes; Hector Mine, CA]
Some Grid Concepts I


Services are “just” (distributed) programs sending and
receiving messages with well defined syntax
Interfaces (input-output) must be open; innards can be
open source (allowing you to modify) or proprietary
• Services can be written in any language: Fortran, shell scripts, C,
C#, C++, Java, Python, Perl – your choice!!
• Web Services supported by all vendors (IBM, Microsoft …)

Service overhead will be just a few milliseconds (more
now) which is < typical network transit time
• Any program that is distributed can be a Web service
• Any program taking execution time ≥ 20ms can be an
efficient Web service
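The 20 ms rule of thumb above can be checked with a little arithmetic. The sketch below assumes a fixed per-call service overhead of 2 ms (the slide says only "a few milliseconds", so this value is an assumption) and defines efficiency as useful execution time divided by total time.

```python
def service_efficiency(exec_ms: float, overhead_ms: float) -> float:
    """Fraction of total time spent on useful work for one service call."""
    return exec_ms / (exec_ms + overhead_ms)


ASSUMED_OVERHEAD_MS = 2.0  # assumption: "a few milliseconds" taken as 2 ms

efficiency_at_20ms = service_efficiency(20.0, ASSUMED_OVERHEAD_MS)  # 20 / 22
efficiency_at_2ms = service_efficiency(2.0, ASSUMED_OVERHEAD_MS)    # 2 / 4
```

Under this assumption a 20 ms service is about 91% efficient, while a 2 ms service spends half its time on overhead, which is why very fine-grained operations are better kept inside a single service.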
Web services

[Diagram labels: resources; Humans; Databases; Programs; Computational resources; service logic (BPEL, Java, .NET); message processing]
Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on SOA (service oriented architecture) principles.
Web Services interact by exchanging messages in SOAP format.
The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.
SOAP and WSDL

Devices
<env:Envelope>
<env:Header>
...
</env:Header>
<env:Body>
...
</env:Body>
</env:Envelope>
SOAP messages
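The envelope structure shown above can be produced programmatically. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; it assumes the SOAP 1.2 envelope namespace, and the operation name `runSimulation` and its parameters are hypothetical, not taken from any SERVOGrid WSDL.

```python
import xml.etree.ElementTree as ET

# SOAP 1.2 envelope namespace (assumed; SOAP 1.1 uses a different URI).
SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"
ET.register_namespace("env", SOAP_ENV)


def build_envelope(operation: str, params: dict) -> str:
    """Build a SOAP message with an empty Header and one operation in the Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, operation)
    for name, value in params.items():
        # Each parameter becomes a child element of the operation.
        ET.SubElement(op, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical request asking a service to run a code for 100 steps.
message = build_envelope("runSimulation", {"code": "GeoFEST", "steps": 100})
```

A real client would normally generate such messages from the service's WSDL rather than by hand; the sketch only shows what ends up on the wire.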
Some Grid Concepts II

Systems are built from contributions from many different groups
– you do not need one “vendor” for all components as Web
services allow interoperability between components
• One reason DoD likes Grids (called Net-Centric computing)

Grids are distributed in services and data allowing anybody to
store their data and to produce “their” view
• Some think that University Library of future will curate/store data of
their faculty



“2 level programming model”: classic programming of services,
with services composed using workflow consistent with
industry standards (BPEL)
Grid of Grids (System of Systems): realistically, Grid-like
systems will be built using multiple technologies and “standards”
– integrate separate Grids for Sensors, GIS, Visualization,
computing etc. with an OGSA (Open Grid Service Architecture
from OGF) system Grid (Security, registry) into a single Grid
Existing codes UNCHANGED; wrap as a service with metadata
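The "2 level programming model" can be sketched in a few lines. This is an illustration only: plain Python functions stand in for services, and a list of service names stands in for a BPEL workflow document. All names (`fetch`, `filter`, `summarize`, `run_workflow`) and the sample values are hypothetical.

```python
from typing import Callable, Dict, List

# Level 1: each service is classic code that receives and returns a message.


def fetch_data(msg: dict) -> dict:
    return {**msg, "data": [3.0, 1.0, 2.0]}  # made-up sample values


def filter_data(msg: dict) -> dict:
    return {**msg, "data": [x for x in msg["data"] if x >= 2.0]}


def summarize(msg: dict) -> dict:
    return {**msg, "mean": sum(msg["data"]) / len(msg["data"])}


SERVICES: Dict[str, Callable[[dict], dict]] = {
    "fetch": fetch_data,
    "filter": filter_data,
    "summarize": summarize,
}

# Level 2: the workflow only names services and the order in which they run.


def run_workflow(steps: List[str], message: dict) -> dict:
    for name in steps:
        message = SERVICES[name](message)
    return message


result = run_workflow(["fetch", "filter", "summarize"], {"region": "Southern California"})
```

The service bodies never change when the workflow is rearranged, which mirrors the point that existing codes stay unchanged and are only wrapped.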
TeraGrid User Portal
LEAD Gateway Portal
NSF Large ITR and Teragrid Gateway
- Adaptive response to mesoscale weather events
- Supports data exploration, Grid workflow
Grid Workflow Data Assimilation in Earth Science

Grid services, triggered by abnormal events and controlled by workflow, process real-time
data from radar and high-resolution simulations for tornado forecasts
Use a Portlet-based user portal to access
and control services and workflow
SERVOGrid has a portal
The Portal is built from portlets, which provide user interface
fragments for each service and are composed into the
full interface. It uses OGCE technology, as does the planetary
science VLAB portal with University of Minnesota
Portlets v. Google Gadgets





Portals for Grid Systems are built using portlets with
software like GridSphere integrating these on the
server-side into a single web-page
Google (at least) offers the Google sidebar and Google
home page which support Web 2.0 services and do not
use a server side aggregator
Google is more user friendly!
The many Web 2.0 competitions are an interesting model
for promoting development in the world-wide
distributed collection of Web 2.0 developers
I guess Web 2.0 model will win!
GIS and Sensor Grids







OGC has defined a suite of data structures and services to
support Geographical Information Systems and Sensors
GML (Geography Markup Language) defines the specification of georeferenced data
SensorML and O&M (Observation and Measurements) define
meta-data and data structure for sensors
Services like Web Map Service, Web Feature Service, Sensor
Collection Service define service interfaces to access GIS and
sensor information
Grid workflow links services that are designed to support
streaming input and output messages
We built Grid (Web) service implementations of these
specifications for NASA’s SERVOGrid
Use Google maps as front end to WMS and WFS
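A Web Map Service is queried with a plain HTTP request, which is what makes it easy to put a map client in front of it. The sketch below builds a WMS 1.1.1 `GetMap` URL with Python's standard library; the server address `http://example.org/wms` and the layer name `faults` are hypothetical, and the bounding box is only a rough box around Southern California.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def wms_getmap_url(base_url, layers, bbox, width, height,
                   srs="EPSG:4326", fmt="image/png"):
    """Build a WMS 1.1.1 GetMap request URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) in the given SRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the server's default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server and layer name.
url = wms_getmap_url("http://example.org/wms", ["faults"],
                     (-120.0, 32.0, -114.0, 36.0), 512, 512)
query = parse_qs(urlparse(url).query, keep_blank_values=True)
```

The returned image can be overlaid on another map as long as both use the same bounding box and projection.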
Grid Workflow Datamining in Earth Science

NASA GPS

Work with Scripps Institute
Grid services, controlled by workflow, process real-time
data from ~70 GPS Sensors in Southern California
[Diagram labels: Earthquake; Streaming Data Support; Transformations; Data Checking; Hidden Markov Datamining (JPL); Display (GIS)]
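The chain of streaming filters in this workflow can be sketched with Python generators, where each stage consumes and emits one reading at a time. This is an illustration only: the simple threshold test in `flag_jumps` is a stand-in for the Hidden Markov datamining step, not an implementation of it, and the readings are made-up values.

```python
from typing import Iterable, Iterator, Optional, Tuple


def check_data(readings: Iterable[Optional[float]]) -> Iterator[float]:
    """Data checking: drop missing readings."""
    for value in readings:
        if value is not None:
            yield value


def transform(readings: Iterable[float], reference: float) -> Iterator[float]:
    """Transformation: convert positions to displacements from a reference."""
    for value in readings:
        yield value - reference


def flag_jumps(displacements: Iterable[float],
               threshold: float) -> Iterator[Tuple[float, bool]]:
    """Stand-in for datamining: flag a sudden change between readings."""
    previous = None
    for d in displacements:
        jumped = previous is not None and abs(d - previous) > threshold
        yield d, jumped
        previous = d


# Made-up stream from one hypothetical sensor; None marks a dropped reading.
raw_stream = [10.0, None, 10.1, 10.1, 12.5, 12.6]
pipeline = flag_jumps(transform(check_data(raw_stream), reference=10.0), threshold=1.0)
flags = [jumped for _, jumped in pipeline]
```

Because every stage is lazy, nothing is buffered beyond the current reading, which matches the streaming style of the services described above.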
Earth/Atmosphere Grids built as Grids of (library) Grids
[Diagram labels: Earthquake SERVOGrid (Earthquake Data, Filters & Simulation Services); Tornado Grid; Ice Sheet PolarGrid (Ice Sheet Sensors, SAR, Filters, EM, Glacier Simulations); Collaboration Grid; Sensor Grid; GIS Grid; Visualization Grid; Compute Grid; Registry; Portals; Data Access/Storage; Metadata; Core Grid Services; Security; Notification; Workflow; Messaging; Physical Network]
Community Tools




e-mail and list-serves are oldest and best used
Kazaa, Instant Messengers, Skype, Napster, BitTorrent for P2P Collaboration –
text, audio-video conferencing, files
del.icio.us, Connotea, Citeulike, Bibsonomy, Biolicious manage shared
bookmarks
MySpace, YouTube, Bebo, Hotornot, Facebook, or similar sites allow you to
create (upload) community resources and share them; Friendster, LinkedIn
create networks
• http://en.wikipedia.org/wiki/List_of_social_networking_websites



Writely, Wikis and Blogs are powerful specialized shared document systems
ConferenceXP and WebEx share general applications
Google Scholar tells you who has cited your papers while publisher sites tell you
about co-authors
• Windows Live Academic Search has similar goals

Note sharing resources creates (implicit) communities
• Social network tools study graphs to both define communities and extract
their properties

Mashups link resources together (federation/workflow)
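The remark above that sharing resources creates implicit communities, which social network tools find by studying graphs, can be sketched as follows. The sketch treats two users as linked if they bookmarked the same resource and reports connected groups; real tools use richer community measures, and the user and resource names are invented.

```python
from collections import defaultdict


def implicit_communities(bookmarks):
    """Group users linked, directly or through a chain, by shared resources.

    bookmarks is an iterable of (user, resource) pairs.
    """
    users_by_resource = defaultdict(set)
    for user, resource in bookmarks:
        users_by_resource[resource].add(user)

    # Build the user-to-user graph: an edge means "shares a resource".
    neighbors = defaultdict(set)
    for users in users_by_resource.values():
        for user in users:
            neighbors[user] |= users - {user}

    # Depth-first search to collect connected components.
    seen = set()
    communities = []
    for user in sorted(neighbors):
        if user in seen:
            continue
        group, stack = set(), [user]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbors[current] - group)
        seen |= group
        communities.append(sorted(group))
    return communities


# Invented sample data.
sample_bookmarks = [
    ("ann", "paperA"), ("bob", "paperA"),
    ("bob", "paperB"), ("cat", "paperB"),
    ("dan", "paperC"),
]
communities = implicit_communities(sample_bookmarks)
```

Here ann and cat never share a resource directly, yet they land in the same community through bob.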
Mashups and Grids






http://www.programmableweb.com
There are 281 “commodity” service Web 2.0 APIs on October 1 06 (356 on Jan 9 07)
Mashups are composed from JavaScript, AJAX and REST, and not usually BPEL, WSDL and SOAP; they use Google Gadgets, not portlets
Architecture of Mashups and Grids is “identical”
See Amazon S3 Storage and EC2 Elastic Computing services
Mashups enable everybody to contribute
Mashup Matrix
Mashups using GoogleMaps
Indiana Map Mash-up
GIS Grid of “Indiana Map” and ~10 Indiana counties with accessible Map (Feature)
Servers from different vendors. Grids federate different data repositories (cf. Astronomy
VO federating different observatory collections)
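Federating servers from different vendors usually comes down to mapping each vendor's record layout onto one common form. The sketch below assumes two invented record layouts and invented sample records; the adapter names and field names are hypothetical and do not describe any real county server.

```python
def from_vendor_a(record: dict) -> dict:
    """Adapter for a hypothetical layout with NAME / X / Y fields."""
    return {"name": record["NAME"], "lon": record["X"], "lat": record["Y"]}


def from_vendor_b(record: dict) -> dict:
    """Adapter for a hypothetical layout with label and (lat, lon) coords."""
    lat, lon = record["coords"]
    return {"name": record["label"], "lon": lon, "lat": lat}


def federate(sources):
    """Merge (adapter, records) pairs into one list in the common form."""
    features = []
    for adapter, records in sources:
        features.extend(adapter(r) for r in records)
    return sorted(features, key=lambda f: f["name"])


# Invented sample records from two servers.
vendor_a_records = [{"NAME": "site-2", "X": -86.5, "Y": 39.2}]
vendor_b_records = [{"label": "site-1", "coords": (39.8, -86.2)}]
features = federate([(from_vendor_a, vendor_a_records),
                     (from_vendor_b, vendor_b_records)])
```

In the GIS case the common form is supplied by the OGC standards, so each server needs only one adapter rather than one per partner.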
eSports?






YouTube illustrates asynchronous video sharing, and video conferencing illustrates synchronous video sharing
One can link trainers (or spectators) and athletes globally with real-time video supporting video and text annotation
Technically hard due to network issues and the need to allow real-time playing of annotated video
Exploring with China
Note IU could export coaching in Soccer, Basketball etc.
Example of Cyberinfrastructure supporting geographically distributed specialization
Minority Serving Institutions and the Grid
• Historically the R1 Research University powerhouses dominated
research due to their concentration of expertise
• Cyberinfrastructure allows others to participate in the same way it
supports distributed open source software and distributed Web 2.0
• Navajo Nation (Colorado Plateau covering over 25,000 square
miles in northeast Arizona, northwest New Mexico, and southeast
Utah) with 110 communities and over 40% unemployment.
Building a wireless grid for education, healthcare
• http://www.win-hec.org/ World Indigenous Nations Higher
Education Consortium
• Cyberinfrastructure allows Nations to preserve their geographical
identity but participate fully with world class jobs and research
• Some 335 MSI’s in Alliance have similar hopes for
Cyberinfrastructure to jump start their advancement!
Example: Setting up a Polar CI-Grid
• The North and South poles are melting with potentially huge
environmental impact
• As a result of MSI meetings, I am working with MSI ECSU in
North Carolina and Kansas University to design and set up a
Polar Grid (Cyberinfrastructure)
• This is a network of computers, sensors (on robots and
satellites), data and people aimed at understanding science of
ice-sheets and impact of global warming
• We have changed the 100,000 year Glacier cycle into a ~50
year cycle; the field has increased dramatically in importance
and interest
• Good area to get involved in as not so much established work
[Figure: Typical Illustration of effect of Climate Change on Greenland: Velocity of Jakobshavn from 1995 to 2005 as a function of distance from its end]
PolarGrid

Important Polar Grid Cyberinfrastructure components
include
• Managed data from sensors and satellites
• Data analysis such as SAR processing – possibly with parallel
algorithms
• Electromagnetic simulations (currently commercial codes) to
design instrument antennas
• 3D simulations of ice-sheets (glaciers) with non-uniform
meshes
• GIS Geographical Information Systems

Also need capabilities present in many Grids
• Portal i.e. Science Gateway
• Submitting multiple sequential or parallel jobs

Power/Bandwidth Challenged Expedition Grids
Polar Expeditions
[Diagram labels: Field (F); Base Camps (B); Real Time Monitor; Low Bandwidth; Archival – High Latency; Adaptor layer; IU; ECSU; Haskell; Education and Training; Core simulation and Data analysis; Existing IU; Existing CRESIS; Prototype Base/Field Grid; Other Polar Sensors and Sensor Aggregators (Non-polar and Polar Sites)]
Document-enhanced Cyberinfrastructure
[Diagram labels: Traditional Cyberinfrastructure; Export: RSS, Bibtex, Endnote etc.; Generic Document Tools; Community Tools; Existing Document-based Research Tools; New Document-enhanced Research Tools; Integration/Enhancement User Interface; Existing User Interface; Web service Wrappers; Bibliographic Database; MyResearch Database; Windows Live Academic Search; Google Scholar; Citeseer; Science.gov; PubMed; PubChem; Del.icio.us; CiteULike; Connotea; Biolicious; Bibsonomy; CMT Conference Management etc.; Manuscript Central]
Delicious Semantic Web/Grid









http://del.icio.us purchased by Yahoo for ~$30M
http://www.CiteULike.org
http://www.connotea.org (Nature)
Associate metadata with Bookmarks specified by
URL’s, DOI’s (Digital Object Identifiers)
Users add comments and keywords (called tags)
Users are linked together into groups (communities)
Information such as title and authors extracted
automatically from some sites (PubMed, ACM, IEEE,
Wiley etc.)
BibTeX-like additional information in CiteULike
This is perhaps the de facto Semantic Web – remarkable
for its simplicity
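The simplicity noted above shows in how little structure a social bookmark needs. The sketch below is a guess at a minimal data model, not the schema of del.icio.us, CiteULike or Connotea; the class name, field names and sample identifiers (including the DOI) are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    identifier: str                         # a URL or a DOI
    user: str
    tags: set = field(default_factory=set)  # free-form keywords
    comment: str = ""


def index_by_tag(bookmarks):
    """Map each tag (lower-cased) to the identifiers bookmarked with it."""
    index = defaultdict(set)
    for bookmark in bookmarks:
        for tag in bookmark.tags:
            index[tag.lower()].add(bookmark.identifier)
    return index


# Made-up sample bookmarks.
sample = [
    Bookmark("http://example.org/paper1", "ann", {"Grid", "workflow"}),
    Bookmark("doi:10.0000/example", "bob", {"grid"}, "read before meeting"),
]
tag_index = index_by_tag(sample)
```

Everything beyond this (groups, extracted titles and authors) is additional metadata attached to the same identifier.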
Connotea queried by SERVOGrid
Document-enhanced Cyberinfrastructure
aka Semantic Scholar Grid I



Citeseer and Google Scholar scour the Internet and analyze
documents for incidental metadata
• Title, author and institution of documents
• Citations with their own metadata allowing one to match
to other documents
Science.gov extracts metadata from lots of US Government
databases
These capabilities are sure to become more powerful and to
be extended
• Give “Citation Index” in real time
• Tell you all authors of all papers that cite a paper that
cites you etc. (Note it’s a small world so don’t go too far
in link analysis)
• Tell you all citations of all papers in a workshop
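The query "all authors of all papers that cite a paper that cites you" is a two-step walk over the citation graph. The sketch below assumes the graph is already available as a dictionary from each paper to the set of papers it cites; the paper identifiers and author names are invented.

```python
def authors_two_hops_away(cites: dict, authors: dict, paper: str) -> list:
    """Authors of papers that cite a paper that cites `paper`.

    cites maps a paper id to the set of paper ids in its reference list.
    authors maps a paper id to its list of author names.
    """
    # Step 1: papers whose reference list contains `paper`.
    first_hop = {p for p, refs in cites.items() if paper in refs}
    # Step 2: papers whose reference list contains any first-hop paper.
    second_hop = {p for p, refs in cites.items() if refs & first_hop}
    return sorted({name for p in second_hop for name in authors[p]})


# Invented sample graph: P3 cites P2, P2 cites P1, P1 cites "mine".
sample_cites = {"mine": set(), "P1": {"mine"}, "P2": {"P1"}, "P3": {"P2"}}
sample_authors = {"mine": ["Me"], "P1": ["Ada"], "P2": ["Bo", "Cy"], "P3": ["Di"]}
two_hop_authors = authors_two_hops_away(sample_cites, sample_authors, "mine")
```

Each extra step multiplies the number of papers reached, which is the small-world caveat in the bullet above.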
Document-enhanced Cyberinfrastructure
aka Semantic Scholar Grid II

It is natural to develop core document Services such as those
used in Citeseer/Google Scholar but applied to “your”
documents of interest that may not have been processed yet
• As just submitted to a conference perhaps


These tools can help form useful lists such as authors of all cited
or submitted papers to a journal
OSCAR2/3 (from Peter Murray-Rust’s group at Cambridge)
augment the application independent “core” metadata (Title,
authors, institutions, Citations) with a list of all chemical terms
• This tool is a Service that can be applied to “your” document or to a set of
documents harvested in some fashion
• Other fields have natural application specific metadata and OSCAR like
tools can be developed for them

Such high value tools could appear on “publisher” sites of future
(or else publishers will disappear)
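The idea of augmenting core metadata with application-specific terms can be sketched as a small service function. This is not how OSCAR works; it is a much simpler stand-in that matches words against a supplied vocabulary, and the vocabulary and sample sentence are invented.

```python
import re


def extract_terms(text: str, vocabulary) -> list:
    """Return vocabulary terms found in text, in order of first appearance."""
    vocab = {term.lower() for term in vocabulary}
    words = re.findall(r"[a-z][a-z0-9-]*", text.lower())
    found = []
    for word in words:
        if word in vocab and word not in found:
            found.append(word)
    return found


# Invented vocabulary and document text.
chem_vocabulary = {"ethanol", "benzene", "toluene"}
document = "Ethanol and benzene were mixed; the ethanol evaporated."
chemical_terms = extract_terms(document, chem_vocabulary)
```

Swapping the vocabulary is the only change needed to apply the same function to another field, in the spirit of the OSCAR-like tools mentioned above.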