Grids and Web 2.0 supporting eScience STEM Scholars Seminar Indiana University Memorial Union August 1 2007 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington.


Grids and Web 2.0 supporting
eScience
STEM Scholars Seminar
Indiana University Memorial Union
August 1 2007
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
http://www.infomall.org
1
Community Grids Laboratory
Technology Expertise

Web Service and Web 2.0 technologies for large scale
distributed systems -- largely to support science
• Web Services: Integrate ideas from Enterprise Software into science
• Web 2.0: Integrate ideas from Flickr, Connotea, Slideshare, Scribd and YouTube into science





Geographical Information Systems (e.g. Google Maps)
Streaming Sensor data (including audio-video streams)
Portals (User Interfaces)
Parallel computing to make computers fast
Technologies built as part of applications
2
Community Grids Laboratory Projects







Funded by NSF, NASA, NIH, DoE and DoD
Cheminformatics – High Throughput Screening data and
filtering; PubChem PubMed including document analysis
Interactive Particle Physics Data Analysis
Earthquake Science: predicting earthquakes using simulations, satellite data and a GPS (Global Positioning System) Sensor Grid
eSports: real-time collaboration between trainers and athletes, with HPER (IU School of Health, Physical Education, and Recreation)
Ice Sheet Dynamics – melting of Glaciers
Navajo Nation Grid Education (Science Gateways) and
Healthcare
• Web 2.0 tutorial and distance education course spring 2007

Architecture of Air Force Sensor and Decision support systems
3
Why Cyberinfrastructure Is Useful







Supports distributed science – data, people, computers
Exploits Internet technology (Web2.0) adding (via Grid
technology) management, security, supercomputers etc.
It has two aspects: parallel – low latency (microseconds)
between nodes and distributed – highish latency (milliseconds)
between nodes
Parallel needed to get high performance on individual 3D
simulations, data analysis etc.; must decompose problem
Distributed aspect integrates already distinct components
Cyberinfrastructure is in general a distributed collection of
parallel systems
Cyberinfrastructure is made of services (usually Web services)
that are “just” programs or data sources packaged for
distributed access
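A minimal sketch of the last point, assuming nothing beyond the Python standard library: an ordinary function ("just" a program) is packaged behind an HTTP endpoint so that it can be called from another machine. The names square, SquareHandler and call_square_service are hypothetical, and the server is bound to the local machine only so the example is self-contained; this is not how any particular Grid or Web service stack is implemented.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def square(x: float) -> float:
    """The "just a program" part: an ordinary local function."""
    return x * x


class SquareHandler(BaseHTTPRequestHandler):
    """Hypothetical wrapper that exposes square() over HTTP GET."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        x = float(query.get("x", ["0"])[0])
        body = json.dumps({"x": x, "square": square(x)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the example quiet; the default implementation logs to stderr.
        pass


def call_square_service(x: float) -> float:
    """Start the service on a free local port, call it once, then stop it."""
    server = HTTPServer(("127.0.0.1", 0), SquareHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Bypass any configured proxy, since the request goes to the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        url = f"http://127.0.0.1:{port}/square?x={x}"
        with opener.open(url, timeout=5) as response:
            return json.loads(response.read())["square"]
    finally:
        server.shutdown()
        server.server_close()
```

The client only needs the URL and the message format, not the code behind them, which is what lets already distinct components be integrated.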
4







e-moreorlessanything and
Cyberinfrastructure
‘e-Science is about global collaboration in key areas of science,
and the next generation of infrastructure that will enable it.’ from
its inventor, John Taylor, Director General of Research Councils UK,
Office of Science and Technology
e-Science is about developing tools and technologies that allow
scientists to do ‘faster, better or different’ research
Similarly e-Business captures an emerging view of corporations as
dynamic virtual organizations linking employees, customers and
stakeholders across the world.
• The growing use of outsourcing is one example
The Grid or Web 2.0 (Enterprise 2.0) provides the information
technology e-infrastructure for e-moreorlessanything.
A deluge of data of unprecedented and inevitable size must be
managed and understood.
People (see Web 2.0), computers, data and instruments must be
linked.
On demand assignment of experts, computers, networks and
storage resources must be supported
5
TeraGrid: Integrating NSF Cyberinfrastructure
[Map of TeraGrid sites: Buffalo, Wisc, UC/ANL, Utah, Cornell, Iowa, PU, NCAR, IU, NCSA, Caltech, PSC, ORNL, USC-ISI, UNC-RENCI, SDSC, TACC]
TeraGrid is a facility that integrates computational, information, and analysis resources at the
San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of
Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications,
Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh
Supercomputing Center, and the National Center for Atmospheric Research.
Today 250 teraflops; tomorrow a petaflop; Indiana has 20 teraflops today, becoming 30 teraflops
Virtual Observatory Astronomy Grid
Integrate Experiments
[Figure panels: Radio, Far-Infrared, Visible, Dust Map, Visible + X-ray, Galaxy Density Map]
7
Grid Capabilities for Science







Open technologies for any large scale distributed system that are adopted by
industry, many sciences and many countries (including UK, EU, USA, Asia)
• Security, Reliability, Management and state standards
Service and messaging specifications
User interfaces via portals and portlets virtualizing to desktops, email,
PDAs etc.
• ~20 TeraGrid Science Gateways (their name for portals)
• OGCE Portal technology effort led by Indiana
Uniform approach to access distributed (super)computers supporting single
(large) jobs and spawning lots of related jobs
Data and meta-data architecture supporting real-time and archives as well
as federation
• Links to Semantic web and annotation
Grid (Web service) workflow with standards and several successful
instantiations (such as Taverna and MyLead)
Many Earth science grids including ESG (DoE), GEON, LEAD, SCEC,
SERVO; LTER and NEON for Environment
• http://www.nsf.gov/od/oci/ci-v7.pdf
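The "spawning lots of related jobs" capability above can be sketched as a parameter sweep. This is a minimal illustration under stated assumptions: toy_simulation is a made-up stand-in for a real job, and a local thread pool stands in for remote (super)computers; it does not use any actual Grid job-submission interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def toy_simulation(friction: float) -> float:
    """Hypothetical stand-in for one job; a real job would run remotely."""
    return round(100.0 * (1.0 - friction), 3)


def run_sweep(
    job: Callable[[float], float],
    parameters: Iterable[float],
    max_workers: int = 4,
) -> Dict[float, float]:
    """Spawn one job per parameter value and collect results keyed by parameter."""
    params = list(parameters)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with params.
        results = list(pool.map(job, params))
    return dict(zip(params, results))
```

A call such as run_sweep(toy_simulation, [0.1, 0.2, 0.5]) submits three related jobs through one uniform interface; swapping the pool for a remote scheduler would leave the calling code unchanged.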
8
Old and New (Web 2.0) Community Tools




e-mail and listservs are the oldest and best used
Kazaa, Instant Messengers, Skype, Napster, BitTorrent for P2P
Collaboration – text, audio-video conferencing, files
del.icio.us, Connotea, Citeulike, Bibsonomy, Biolicious manage
shared bookmarks
MySpace, YouTube, Bebo, Hotornot, Facebook, or similar sites
allow you to create (upload) community resources and share
them; Friendster, LinkedIn create networks
• http://en.wikipedia.org/wiki/List_of_social_networking_websites



Writely, Wikis and Blogs are powerful specialized shared
document systems
ConferenceXP and WebEx share general applications
Google Scholar tells you who has cited your papers while
publisher sites tell you about co-authors
• Windows Live Academic Search has similar goals

Note sharing resources creates (implicit) communities
• Social network tools study graphs to both define communities
and extract their properties
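A minimal sketch of how shared resources can define implicit communities, assuming a made-up table of users and the bookmarks they saved: users become nodes of a graph, an edge means "shares at least one bookmark", and each connected component is treated as a community. The function name and the data are hypothetical, and real social network tools use richer measures than connected components.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set


def implicit_communities(bookmarks: Dict[str, Set[str]]) -> List[List[str]]:
    """Group users linked, directly or transitively, by a shared bookmark."""
    # Invert user -> urls into url -> users.
    users_by_url = defaultdict(set)
    for user, urls in bookmarks.items():
        for url in urls:
            users_by_url[url].add(user)

    # Undirected user graph: an edge means "share at least one bookmark".
    neighbours = {user: set() for user in bookmarks}
    for users in users_by_url.values():
        for a, b in combinations(sorted(users), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Connected components by depth-first search.
    seen: Set[str] = set()
    communities = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            user = stack.pop()
            if user in component:
                continue
            component.add(user)
            stack.extend(neighbours[user] - component)
        seen |= component
        communities.append(sorted(component))
    return communities


# Made-up example data; the user names and URLs are illustrative only.
EXAMPLE_BOOKMARKS = {
    "alice": {"http://example.org/grid", "http://example.org/gis"},
    "bob": {"http://example.org/gis"},
    "carol": {"http://example.org/hep"},
    "dave": {"http://example.org/hep", "http://example.org/root"},
    "erin": {"http://example.org/ice"},
}
```

With this data, alice and bob form one community, carol and dave another, and erin is on her own, although none of them declared any membership.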
“Best Web 2.0 Sites” -- 2006

Extracted from http://web2.wsj2.com/
Social Networking

Start Pages

Social Bookmarking

Peer Production News

Social Media Sharing

Online Storage (Computing)

10
Web 2.0 Systems are Portals, Services, Resources

Captures the incredible development of interactive Web
sites enabling people to create and collaborate
11
Mashups v Workflow?





Mashup Tools are reviewed at http://blogs.zdnet.com/Hinchcliffe/?p=63
Workflow Tools are reviewed by Gannon and Fox
http://grids.ucs.indiana.edu/ptliupages/publications/Workflow-overview.pdf
Both include scripting in PHP, Python, sh etc., as both implement distributed programming at the level of services
Mashups use all types of service interfaces and do not have the potential robustness (security) of the Grid service approach
• Typically “pure” HTTP (REST)
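A minimal sketch of "distributed programming at the level of services" written as a script, assuming three hypothetical services that each take and return a dictionary message. Here they are local stand-in functions with made-up data; in a mashup each would typically be an HTTP call, and in a Grid workflow a Web service invocation.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Service = Callable[[Message], Message]


def lookup_station(msg: Message) -> Message:
    """Stand-in service 1: resolve a station id to coordinates (made-up table)."""
    stations = {"SITE-A": (34.05, -118.25)}
    lat, lon = stations[msg["station"]]
    return {**msg, "lat": lat, "lon": lon}


def fetch_displacement(msg: Message) -> Message:
    """Stand-in service 2: attach a measurement (constant made-up value)."""
    return {**msg, "displacement_mm": 2.5}


def make_map_marker(msg: Message) -> Message:
    """Stand-in service 3: turn the record into something a map can display."""
    label = f'{msg["station"]}: {msg["displacement_mm"]} mm'
    return {"lat": msg["lat"], "lon": msg["lon"], "label": label}


def run_workflow(services: List[Service], request: Message) -> Message:
    """Pass the message through each service in turn."""
    return reduce(lambda message, service: service(message), services, request)
```

The script only fixes the order of the calls and the messages passed between them; what separates a mashup from a Grid workflow is the kind of service interface and the security around each call, not this composition step.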
12
Grid Workflow Datamining in Earth Science

NASA GPS

Work with Scripps Institute
Grid services controlled by workflow process real time
data from ~70 GPS Sensors in Southern California
[Workflow diagram labels: Earthquake; Streaming Data Support; Archival; Transformations; Data Checking; Hidden Markov Datamining (JPL); Real Time Display (GIS)]
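A minimal sketch of the streaming stages named on this slide (data checking, transformation, then a datamining step), written as chained Python generators. Everything here is an assumption for illustration: the readings are made up, the function names are hypothetical, and a simple jump threshold is swapped in for the Hidden Markov datamining, which is not reproduced.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

RawReading = Tuple[str, Optional[float]]  # (station id, position in mm, or None)
Reading = Tuple[str, float]


def check(readings: Iterable[RawReading]) -> Iterator[Reading]:
    """Data checking: drop missing values and implausible outliers."""
    for station, value in readings:
        if value is not None and abs(value) < 1000.0:
            yield station, value


def to_displacement(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Transformation: convert positions to change since the previous reading."""
    last: Dict[str, float] = {}
    for station, value in readings:
        if station in last:
            yield station, value - last[station]
        last[station] = value


def flag_jumps(displacements: Iterable[Reading], threshold_mm: float) -> Iterator[Reading]:
    """Stand-in for the datamining stage: report changes above a threshold."""
    for station, delta in displacements:
        if abs(delta) >= threshold_mm:
            yield station, delta


def pipeline(readings: Iterable[RawReading], threshold_mm: float = 5.0) -> List[Reading]:
    """Chain the stages; each reading flows through as soon as it arrives."""
    return list(flag_jumps(to_displacement(check(readings)), threshold_mm))


# Made-up stream from two hypothetical stations.
SAMPLE_STREAM = [
    ("S1", 10.0), ("S2", 3.0), ("S1", None), ("S1", 10.5),
    ("S2", 9.5), ("S1", 99999.0), ("S1", 11.0),
]
```

Because each stage is a generator, no stage waits for the whole data set, which mirrors how workflow-controlled services can process sensor data in real time.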
13
Web 2.0 uses all types of Services

Here a Gadget Mashup uses a 3 service workflow with
a JavaScript Gadget Client
14
Web 2.0 APIs


http://www.programmableweb.com/apis listed 431 Web 2.0 APIs as of May 14 2007, with Google Maps the most often used in Mashups
This site acts as a “UDDI” for Web 2.0
The List of Web 2.0 APIs





Each site has API and its features
Divided into broad categories
Only a few used a lot (42 APIs used in more than 10 mashups)
RSS feed of new APIs
Amazon S3 growing in popularity

Growing number of commercial Mashup Tools
4 more Mashups each day, for a total of 1906 on April 17 2007 (4.0 a day over the last month)
Note ClearForest runs Semantic Web Services Mashup competitions (not workflow competitions)
Some Mashup types: aggregators, search aggregators, visualizers, mobile, maps, games
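A minimal sketch of consuming an RSS feed of new APIs and grouping entries into broad categories, as the API list above is described. The feed content below is a made-up sample, not data from any real site, and the function names are hypothetical; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List

# Made-up RSS 2.0 sample; titles, links and categories are illustrative only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>New APIs (made-up sample)</title>
    <item>
      <title>Example Maps API</title>
      <link>http://example.org/maps</link>
      <category>Mapping</category>
    </item>
    <item>
      <title>Example Photo API</title>
      <link>http://example.org/photos</link>
      <category>Photos</category>
    </item>
    <item>
      <title>Example Geocoder API</title>
      <link>http://example.org/geocoder</link>
      <category>Mapping</category>
    </item>
  </channel>
</rss>"""


def parse_new_apis(feed_xml: str) -> List[Dict[str, str]]:
    """Extract title, link and category from each RSS item."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "category": item.findtext("category", default=""),
        }
        for item in root.iter("item")
    ]


def count_by_category(apis: List[Dict[str, str]]) -> Counter:
    """Count how many APIs fall into each broad category."""
    return Counter(api["category"] for api in apis)
```

Polling a feed like this is one way a registry can act as a "UDDI" for Web 2.0: clients discover new services by reading a plain XML document.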
Mash Planet Web 2.0 Architecture
http://www.imagine-it.org/mashplanet
Display too large to be a Gadget
18
Searched on Transit/Transportation
19
20
Grid-style portal as used in Earthquake Grid
The Portal is built from portlets, which provide user interface fragments for each service and are composed into the full interface
It uses OGCE technology, as does the planetary science VLAB portal with the University of Minnesota
Now to Portals
21
Note the many competitions powering Web 2.0
Mashup Development
Portlets v. Google Gadgets





Portals for Grid Systems are built using portlets with
software like GridSphere integrating these on the
server-side into a single web-page
Google (at least) offers the Google sidebar and Google
home page which support Web 2.0 services and do not
use a server side aggregator
Google is more user friendly!
The many Web 2.0 competitions are an interesting model
for promoting development in the world-wide
distributed collection of Web 2.0 developers
I guess Web 2.0 model will win!
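A minimal sketch of the server-side aggregation that the portlet approach relies on, assuming each portlet is simply a function that returns an HTML fragment for a user. The portlet names and their contents are made up, and this does not use GridSphere or any real portlet container API.

```python
from html import escape
from typing import Callable, List, Tuple

Portlet = Callable[[str], str]  # takes a user name, returns an HTML fragment


def job_status_portlet(user: str) -> str:
    """Hypothetical portlet: user interface fragment for a job service."""
    return f"<p>Jobs for {escape(user)}: 2 running</p>"


def map_portlet(user: str) -> str:
    """Hypothetical portlet: user interface fragment for a map service."""
    return "<p>Fault map placeholder</p>"


def render_portal_page(user: str, portlets: List[Tuple[str, Portlet]]) -> str:
    """Server-side aggregator: compose all fragments into a single web page."""
    sections = [
        f'<div class="portlet"><h2>{escape(title)}</h2>{portlet(user)}</div>'
        for title, portlet in portlets
    ]
    return "<html><body>" + "".join(sections) + "</body></html>"
```

In the gadget model the composition step in render_portal_page moves into the browser: each gadget fetches from its own service, so no server-side aggregator is needed.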
22
Building Distributed Systems or
Cyberinfrastructure for Science

One can use Web 2.0, which is more intuitive and has a lower
barrier to entry
• Typically uses PHP

Or Web Service technology which is more powerful
(e.g. for security) but has a high learning and
infrastructure overhead
• Typically uses Java



One can use Grid resources like TeraGrid and/or
Web 2.0 capabilities like MySpace, Google Maps
We try to use best of both worlds!
23
24
Workflows - Taverna
(taverna.sourceforge.net)
25
Closing CMS for the first time (July)
Michel Della Negra/Opening Session/18 September 2006
27
Higgs diphoton Analysis using Rootlets
28
Ice Sheet Dynamics
29
My Tags Menu opened up.
My Account also opens up to show account and profile information
30
Add To CITeam button opens new window
Clicking the Add To CITeam button opens up this box to add information about this page (tags, description, etc), which will be added to our database and to Connotea
31