Cyberinfrastructure across the Globe Indiana University Computer Science Undergraduate Honors Seminar January 8 2007 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401 [email protected] http://www.infomall.org.
Abstract
We discuss the role of Cyberinfrastructure (also called e-infrastructure and implemented by Grid technology) in a variety of global activities. These include the linking of researchers and data world wide in many fields; new generations of digital libraries and tools like Google Scholar; study of ice-sheets at the poles and the dramatic impact of Global warming; the study of earthquakes across the Pacific ocean; the linking of apparel manufacturers in Asia to designers in different continents and the command and control system for the Department of Defense. We discuss these applications and their associated technology.

Why Cyberinfrastructure Useful
Supports distributed science – data, people, computers
Exploits Internet technology (Web2.0) adding management, security, supercomputers etc.
It has two aspects: parallel – low latency (microseconds) between nodes and distributed – highish latency (milliseconds) between nodes
Parallel needed to get high performance on individual 3D simulations, data analysis etc.; must decompose problem
Distributed aspect integrates already distinct components
Cyberinfrastructure is in general a distributed collection of parallel systems
Grids are made of services that are “just” programs or data sources packaged for distributed access

e-moreorlessanything and the Grid
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ from its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology
e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research
Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
• The growing use of outsourcing is one example
The Grid provides the information technology e-infrastructure for e-moreorlessanything.
A deluge of data of unprecedented and inevitable size must be managed and understood.
People, computers, data and instruments must be linked.
On demand assignment of experts, computers, networks and storage resources must be supported

TeraGrid: Integrating NSF Cyberinfrastructure
[Map of sites: Buffalo, Wisc, UC/ANL, Utah, Cornell, Iowa, PU, NCAR, IU, NCSA, Caltech, PSC, ORNL, USC-ISI, UNC-RENCI, SDSC, TACC]
TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research.
Today 100 Teraflop; tomorrow a petaflop; Indiana 20 teraflop today.

Virtual Observatory Astronomy Grid
Integrate Experiments
[Figure panels: Radio, Far-Infrared, Visible, Dust Map, Visible + X-ray, Galaxy Density Map]

Grid Capabilities for Science
Open technologies for any large scale distributed system that are adopted by industry, many sciences and many countries (including UK, EU, USA, Asia)
• Security, Reliability, Management and state standards
Service and messaging specifications
User interfaces via portals and portlets virtualizing to desktops, email, PDA’s etc.
• ~20 TeraGrid Science Gateways (their name for portals)
• OGCE Portal technology effort led by Indiana
Uniform approach to access distributed (super)computers supporting single (large) jobs and spawning lots of related jobs
Data and meta-data architecture supporting real-time and archives as well as federation
• Links to Semantic web and annotation
Grid (Web service) workflow with standards and several successful instantiations (such as Taverna and MyLead)
Many Earth science grids including ESG (DoE), GEON, LEAD, SCEC, SERVO; LTER and NEON for Environment
• http://www.nsf.gov/od/oci/ci-v7.pdf

eApparel
Much of the world’s manufacturing industry is globalized and the apparel/textile industry is typical
We are working with Hong Kong Textile Industry to link the Asian manufacturers with design/marketing/purchase functions elsewhere (USA, Europe)
Need to exchange designs, available fabrics and discussions
Good example of e-infrastructure enabling specialization in one geographical area to thrive
Software and digital animation outsourcing are good examples

APEC Cooperation for Earthquake Simulation
ACES is a seven year-long collaboration among scientists interested in earthquake and tsunami prediction
• iSERVO is Infrastructure to support work of ACES
• SERVOGrid is (completed) US Grid that is a prototype of iSERVO
• http://www.quakes.uq.edu.au/ACES/
Chartered under APEC – the Asia Pacific Economic Cooperation of 21
economies

Grid of Grids: Research Grid and Education Grid
[Diagram labels: Repositories, Federated Databases, Sensors, Streaming Data, Field Trip Data, Sensor Grid, Database Grid, Compute Grid, Data Filter Services, Research Simulations, GIS, Discovery Grid Services, Customization Services (From Research to Education), Analysis and Visualization Portal, Research (SERVOGrid), Education Grid, Computer Farm]

SERVOGrid and Cyberinfrastructure
Grids are the technology based on Web services that implement Cyberinfrastructure i.e. support eScience or science as a team sport
• Internet scale managed services that link computers, data repositories, sensors, instruments and people
There is a portal and services in SERVOGrid for
• Applications such as GeoFEST, RDAHMM, Pattern Informatics, Virtual California (VC), Simplex, mesh generating programs …..
• Job management and monitoring web services for running the above codes.
• File management web services for moving files between various machines.
• Geographical Information System services
• Quaketables earthquake specific database
• Sensors as well as databases
• Context (dynamic metadata) and UDDI system long term metadata services
• Services support streaming real-time data

[Figure labels: Site-specific Irregular Scalar Measurements; Constellations for Plate Boundary-Scale Vector Measurements; Ice Sheets (Greenland); Volcanoes (Long Valley, CA); PBO; Topography (1 km); Stress Change (Northridge, CA); Earthquakes (Hector Mine, CA)]

Some Grid Concepts I
Services are “just” (distributed) programs sending and receiving messages with well defined syntax
Interfaces (input-output) must be open; innards can be open source (allowing you to modify) or proprietary
• Services can be any language from Fortran, Shell scripts, C, C#, C++, Java, Python, Perl – your choice!!
• Web Services supported by all vendors (IBM, Microsoft …)
Service overhead will be just a few milliseconds (more now) which is < typical network transit time
• Any program that is distributed can be a Web service
• Any program taking execution time ≥ 20ms can be an efficient Web service

Web services
[Diagram labels: resources; Humans; Databases; Programs; Computational resources; Devices; service logic (BPEL, Java, .NET); message processing; SOAP and WSDL; SOAP messages]
Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles.
Web Services interact by exchanging messages in SOAP format
The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.
  <env:Envelope>
    <env:Header> ... </env:Header>
    <env:Body> ... </env:Body>
  </env:Envelope>

Some Grid Concepts II
Systems are built from contributions from many different groups – you do not need one “vendor” for all components as Web services allow interoperability between components
• One reason DoD likes Grids (called Net-Centric computing)
Grids are distributed in services and data allowing anybody to store their data and to produce “their” view
• Some think that University Library of future will curate/store data of their faculty
“2 level programming model”: Classic programming of services and services are composed using workflow consistent with industry standards (BPEL)
Grid of Grids: (System of Systems) Realistically Grid-like systems will be built using multiple technologies and “standards” – integrate separate Grids for Sensors, GIS, Visualization, computing etc.
with OGSA (Open Grid Service Architecture from OGF) system Grid (Security, registry) into a single Grid
Existing codes UNCHANGED; wrap as a service with metadata

TeraGrid User Portal

LEAD Gateway Portal
NSF Large ITR and Teragrid Gateway
- Adaptive Response to Mesoscale weather events
- Supports Data exploration, Grid Workflow

Grid Workflow Data Assimilation in Earth Science
Grid services triggered by abnormal events and controlled by workflow process real time data from radar and high resolution simulations for tornado forecasts
Use a Portlet-based user portal to access and control services and workflow

SERVOGrid has a portal
The Portal is built from portlets – providing user interface fragments for each service that are composed into the full interface – uses OGCE technology as does planetary science VLAB portal with University of Minnesota

Portlets v. Google Gadgets
Portals for Grid Systems are built using portlets with software like GridSphere integrating these on the server-side into a single web-page
Google (at least) offers the Google sidebar and Google home page which support Web 2.0 services and do not use a server side aggregator
Google is more user friendly!
The many Web 2.0 competitions are an interesting model for promoting development in the world-wide distributed collection of Web 2.0 developers
I guess Web 2.0 model will win!
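
As a rough illustration of the SOAP envelope skeleton shown on the Web services slide, the following Python sketch builds and then parses such an envelope using only the standard library. The envelope namespace is the W3C SOAP 1.2 one; the operation name `runSimulation`, the application namespace `urn:example:servogrid` and the `code` parameter are hypothetical names chosen for illustration and are not taken from SERVOGrid itself.

```python
import xml.etree.ElementTree as ET

# SOAP 1.2 envelope namespace (W3C), bound to the "env" prefix as on the slide.
SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"
# Hypothetical application namespace, for illustration only.
APP_NS = "urn:example:servogrid"

ET.register_namespace("env", SOAP_ENV)
ET.register_namespace("app", APP_NS)


def build_envelope(operation, params):
    """Return a SOAP envelope (as a string) wrapping one operation call."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")  # left empty in this sketch
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_envelope(xml_text):
    """Extract (operation, params) from an envelope built by build_envelope."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    call = list(body)[0]  # first child of Body is the operation element
    operation = call.tag.split("}", 1)[1]  # strip the "{namespace}" part
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


message = build_envelope("runSimulation", {"code": "GeoFEST"})
print(message)
print(parse_envelope(message))
```

In a real deployment the shape of the Body would be fixed by the service's WSDL contract rather than chosen freely by the caller as it is here.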
GIS and Sensor Grids
OGC has defined a suite of data structures and services to support Geographical Information Systems and Sensors
GML Geography Markup language defines specification of georeferenced data
SensorML and O&M (Observation and Measurements) define meta-data and data structure for sensors
Services like Web Map Service, Web Feature Service, Sensor Collection Service define services interfaces to access GIS and sensor information
Grid workflow links services that are designed to support streaming input and output messages
We built Grid (Web) service implementations of these specifications for NASA’s SERVOGrid
Use Google maps as front end to WMS and WFS

Grid Workflow Datamining in Earth Science
Work with Scripps Institute
Grid services controlled by workflow process real time data from ~70 GPS Sensors in Southern California
[Diagram labels: NASA GPS; Earthquake; Streaming Data Support; Transformations; Data Checking; Hidden Markov Datamining (JPL); Display (GIS)]

Earth/Atmosphere Grids built as Grids of (library) Grids
[Diagram labels: Earthquake SERVOGrid (Earthquake Data, Filters & Simulation Services); Tornado Grid; Ice Sheet PolarGrid (Ice Sheet Sensors, SAR, Filters, EM, Glacier Simulations); Collaboration Grid; Sensor Grid; GIS Grid; Visualization Grid; Compute Grid; Registry; Portals; Data Access/Storage; Metadata; Core Grid Services (Security, Notification, Workflow, Messaging); Physical Network]

Community Tools
e-mail and list-serves are oldest and best used
Kazaa, Instant Messengers, Skype, Napster, BitTorrent for P2P Collaboration – text, audio-video conferencing, files
del.icio.us, Connotea, Citeulike, Bibsonomy, Biolicious manage shared bookmarks
MySpace, YouTube, Bebo, Hotornot, Facebook, or similar sites allow you to create (upload) community resources and share them; Friendster, LinkedIn create networks
• http://en.wikipedia.org/wiki/List_of_social_networking_websites
Writely, Wikis and Blogs are powerful specialized shared document systems
ConferenceXP and WebEx share general
applications
Google Scholar tells you who has cited your papers while publisher sites tell you about co-authors
• Windows Live Academic Search has similar goals
Note sharing resources creates (implicit) communities
• Social network tools study graphs to both define communities and extract their properties
Mashups link resources together (federation/workflow)

Mashups and Grids
http://www.programmableweb.com
There are 281 “commodity” service Web 2.0 API’s on October 1 06 (356 Jan 9 07)
Mashups are composed from JavaScript, AJAX and REST and not usually BPEL, WSDL and SOAP; Google Gadgets not portlets
Architecture of Mashups and Grids “identical”
See Amazon S3 Storage and EC2 Elastic Computing services
Mashups enable everybody to contribute

Mashup Matrix

Mashups using GoogleMaps

Indiana Map Mash-up
GIS Grid of “Indiana Map” and ~10 Indiana counties with accessible Map (Feature) Servers from different vendors.
Grids federate different data repositories (cf Astronomy VO federating different observatory collections)

eSports?
YouTube illustrates asynchronous video sharing and video conferencing illustrates synchronous video sharing
One can link trainers (or spectators) and athletes globally with real time video supporting video and text annotation
Technically hard due to network issues and allowing real-time playing of annotated video
Exploring with China
Note IU could export coaching in Soccer, Basketball etc
Example of Cyberinfrastructure supporting geographically distributed specialization

Minority Serving Institutions and the Grid
• Historically the R1 Research University powerhouses dominated research due to their concentration of expertise
• Cyberinfrastructure allows others to participate in same way it supports distributed open source software and distributed Web 2.0
• Navajo Nation (Colorado Plateau covering over 25,000 square miles in northeast Arizona, northwest New Mexico, and southeast Utah) with 110 communities and over 40% unemployment.
Building a wireless grid for education, healthcare
• http://www.win-hec.org/ World Indigenous Nations Higher Education Consortium
• Cyberinfrastructure allows Nations to preserve their geographical identity but participate fully with world class jobs and research
• Some 335 MSI’s in Alliance have similar hopes for Cyberinfrastructure to jump start their advancement!

Example: Setting up a Polar CI-Grid
• The North and South poles are melting with potential huge environmental impact
• As a result of MSI meetings, I am working with MSI ECSU in North Carolina and Kansas University to design and set up a Polar Grid (Cyberinfrastructure)
• This is a network of computers, sensors (on robots and satellites), data and people aimed at understanding science of ice-sheets and impact of global warming
• We have changed the 100,000 year Glacier cycle into a ~50 year cycle; the field has increased dramatically in importance and interest
• Good area to get involved in as not so much established work
[Figure: Typical Illustration of effect of Climate Change on Greenland: Velocity of Jakobshavn from 1995 to 2005 as a function of distance from its end]

PolarGrid
Important Polar Grid Cyberinfrastructure components include
• Managed data from sensors and satellites
• Data analysis such as SAR processing – possibly with parallel algorithms
• Electromagnetic simulations (currently commercial codes) to design instrument antennas
• 3D simulations of ice-sheets (glaciers) with non-uniform meshes
• GIS Geographical Information Systems
Also need capabilities present in many Grids
• Portal i.e.
Science Gateway
• Submitting multiple sequential or parallel jobs
Power/Bandwidth Challenged Expedition Grids

[Figure labels: Polar Expeditions; Field (F) and Base (B) Camps; Real Time Monitor; Low Bandwidth; Archival – High Latency; IU Adaptor layer; ECSU; Haskell; Education and Training; Core simulation and Data analysis; Other Polar Sensors and Sensor Aggregators (Non-polar and Polar Sites); Prototype Base/Field Grid; Existing IU; Existing CRESIS]

Document-enhanced Cyberinfrastructure
[Diagram labels: Export: RSS, Bibtex, Endnote etc.; Traditional Cyberinfrastructure; Community Tools (Del.icio.us, CiteULike, Connotea, Bibsonomy, Biolicious); Generic Document Tools (Windows Live Academic Search, Google Scholar, Citeseer, Science.gov, PubMed, PubChem); CMT Conference Management, Manuscript Central etc.; Bibliographic Database; MyResearch Database; Integration/Enhancement User Interface; New Document-enhanced Research Tools; Existing User Interface; Web service Wrappers; Existing Document-based Research Tools]

Delicious Semantic Web/Grid
http://del.icio.us purchased by Yahoo for ~$30M
http://www.CiteULike.org
http://www.connotea.org (Nature)
Associate metadata with Bookmarks specified by URL’s, DOI’s (Digital Object Identifiers)
Users add comments and keywords (called tags)
Users are linked together into groups (communities)
Information such as title and authors extracted automatically from some sites (PubMed, ACM, IEEE, Wiley etc.)
Bibtex like additional information in CiteULike
This is perhaps de facto Semantic Web – remarkable for its simplicity

Connotea queried by SERVOGrid

Document-enhanced Cyberinfrastructure aka Semantic Scholar Grid I
Citeseer and Google Scholar scour the Internet and analyze documents for incidental metadata
• Title, author and institution of documents
• Citations with their own metadata allowing one to match to other documents
Science.gov extracts metadata from lots of US Government databases
These capabilities are sure to become more powerful and to be extended
• Give “Citation Index” in real time
• Tell you all authors of all papers that cite a paper that cites you etc. (Note it’s a small world so don’t go too far in link analysis)
• Tell you all citations of all papers in a workshop

Document-enhanced Cyberinfrastructure aka Semantic Scholar Grid II
It is natural to develop core document Services such as those used in Citeseer/Google Scholar but applied to “your” documents of interest that may not have been processed yet
• As just submitted to a conference perhaps
These tools can help form useful lists such as authors of all cited or submitted papers to a journal
OSCAR2/3 (from Peter Murray-Rust’s group at Cambridge) augment the application independent “core” metadata (Title, authors, institutions, Citations) with a list of all chemical terms
• This tool is a Service that can be applied to “your” document or to a set of documents harvested in some fashion
• Other fields have natural application specific metadata and OSCAR like tools can be developed for them
Such high value tools could appear on “publisher” sites of future (or else publishers will disappear)
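
The link analysis mentioned under Semantic Scholar Grid I ("all authors of all papers that cite a paper that cites you") can be sketched as a depth-limited breadth-first walk over a citation graph. This is only a toy sketch: the paper identifiers, author names and the `CITED_BY` and `AUTHORS` tables are made up for illustration and do not come from Citeseer, Google Scholar or any real service. The `max_depth` cut-off reflects the slide's caution about not going too far in a small world.

```python
from collections import deque

# Toy data (hypothetical): paper -> papers that cite it.
CITED_BY = {
    "mine": ["p1", "p2"],
    "p1": ["p3"],
    "p2": ["p3", "p4"],
    "p3": [],
    "p4": ["p5"],
    "p5": [],
}
# Toy data (hypothetical): paper -> its authors.
AUTHORS = {
    "mine": ["Me"],
    "p1": ["Ada"],
    "p2": ["Bo", "Cy"],
    "p3": ["Di"],
    "p4": ["Ada", "Ed"],
    "p5": ["Flo"],
}


def citing_authors(paper, max_depth):
    """Authors of all papers within max_depth citation hops of `paper`."""
    seen = {paper}
    queue = deque([(paper, 0)])
    found = set()
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand further: the graph is a small world
        for citer in CITED_BY.get(current, []):
            if citer not in seen:
                seen.add(citer)
                found.update(AUTHORS.get(citer, []))
                queue.append((citer, depth + 1))
    return sorted(found)


print(citing_authors("mine", 1))  # authors of papers citing "mine" directly
print(citing_authors("mine", 2))  # also authors of papers citing those papers
```

The `seen` set keeps a paper that is reachable by several citation paths (such as `p3` above) from being processed twice.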