Cyberinfrastructure for e-Education and e-Research (e-Science) Cyberinfrastructure Days New Mexico Highlands University Las Vegas NM March 10-11 2008 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington.
Cyberinfrastructure for e-Education and e-Research (e-Science)
Cyberinfrastructure Days, New Mexico Highlands University, Las Vegas NM, March 10-11 2008
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401
[email protected]
http://www.infomall.org

e-moreorlessanything
• ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ from the inventor of the term, John Taylor, Director General of Research Councils UK, Office of Science and Technology
• e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research
• Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
• This generalizes to e-moreorlessanything including e-DigitalLibrary, e-NationalSecurity, e-HavingFun and e-Education
• A deluge of data of unprecedented and inevitable size must be managed and understood.
• People (virtual organizations), computers, data (including sensors and instruments) must be linked via hardware and software networks

Applications, Infrastructure, Technologies
• This field is confused by inconsistent use of terminology; I define:
• Web Services, Grids and (aspects of) Web 2.0 (Enterprise 2.0) are technologies
• Grids could be everything (Broad Grids implementing some sort of managed web) or reserved for specific architectures like OGSA or Web Services (Narrow Grids)
• These technologies combine and compete to build electronic infrastructures termed e-infrastructure or Cyberinfrastructure
• e-moreorlessanything is an emerging application area of broad importance that is hosted on the infrastructures e-infrastructure or Cyberinfrastructure
• e-Science or perhaps better e-Research is a special case of e-moreorlessanything

What is Cyberinfrastructure
• Cyberinfrastructure is (from NSF) infrastructure that supports distributed science (e-Science): data, people, computers
  • Clearly core concept more general than Science
• Exploits Internet technology (Web 2.0) adding (via Grid technology) management, security, supercomputers etc.
• It has two aspects: parallel, with low latency (microseconds) between nodes, and distributed, with highish latency (milliseconds) between nodes
• Parallel needed to get high performance on individual large simulations, data analysis etc.; must decompose problem
  • New Mexico Encanto supercomputer excellent parallel resource
• Distributed aspect integrates already distinct components; especially natural for data

Underpinnings of Cyberinfrastructure
• Distributed software systems are being “revolutionized” by developments from e-commerce, e-Science and the consumer Internet.
• There is rapid progress in technology families termed “Web services”, “Grids” and “Web 2.0”
• The emerging distributed system picture is of distributed services with advertised interfaces but opaque implementations communicating by streams of messages over a variety of protocols
  • Complete systems are built by combining either services or predefined/pre-existing collections of services together to achieve new capabilities
• As well as Internet/Communication revolutions (distributed systems), multicore chips will likely be hugely important (parallel systems)

Virtual Observatory Astronomy Grid
Integrate Experiments
[Figure panels: Radio; Far-Infrared; Visible; Dust Map; Visible + X-ray; Galaxy Density Map]

Example: Setting up a Polar CI-Grid
• The North and South poles are melting with potential huge environmental impact
• As a result of MSI meetings, I am working with MSI ECSU in North Carolina and Kansas University to design and set up a Polar Grid (Cyberinfrastructure)
• This is a network of computers, sensors (on robots and satellites), data and people aimed at understanding science of ice-sheets and impact of global warming
• We have changed the 100,000 year Glacier cycle into a ~50 year cycle; the field has increased dramatically in importance and interest
• Good area to get involved in as not so much established work

CYBERINFRASTRUCTURE CENTER FOR POLAR SCIENCE (CICPS)

TeraGrid resources include more than 250 teraflops of computing capability and more than 30 petabytes of online and archival data storage, with rapid access and retrieval over high-performance networks. TeraGrid is coordinated at the University of Chicago, working with the Resource Provider sites: Indiana University, Oak Ridge National Laboratory, National Center for Supercomputing Applications, Pittsburgh Supercomputing Center, Purdue University, San Diego Supercomputer Center, Texas Advanced Computing Center, University of Chicago/Argonne National Laboratory, and the National Center for Atmospheric Research.
Computing and Cyberinfrastructure: TeraGrid
[Figure: TeraGrid sites. Grid Infrastructure Group (UChicago). Sites shown: UW, PSC, UC/ANL, NCAR, PU, NCSA, Caltech, IU, UNC/RENCI, ORNL, USC/ISI, SDSC, TACC. Legend: Resource Provider (RP); Software Integration Partner]

Large Hadron Collider, CERN, Geneva: 2008 Start
• pp, √s = 14 TeV, L = 10^34 cm^-2 s^-1
• 27 km Tunnel in Switzerland & France
• Detectors: CMS, TOTEM, Atlas (pp, general purpose; HI); ALICE (HI); LHCb (B-physics)
• 5000+ Physicists, 250+ Institutes, 60+ Countries
• Higgs, SUSY, Extra Dimensions, CP Violation, QG Plasma, … the Unexpected
• Challenges: Analyze petabytes of complex data cooperatively; harness global computing, data & network resources

Environmental Monitoring Sensor Grid at Clemson

Sensor Grids Can be Fun
Note sensors are any time dependent source of information and a fixed source of information is just a broken sensor
• SAR Satellites
• Environmental Monitors
• Nokia N800 pocket computers
• RFID tags and readers
• GPS Sensors
• Lego Robots
• RSS Feeds
• Audio/video: web-cams
• Presentation of teacher in distance education
• Text chats of students

The Sensors on the Fun Grid
[Figure labels: Laptop for PowerPoint; 2 Robots used; Lego Robot; GPS; Nokia N800; RFID Tag; RFID Reader]

Data from the Robot RFID Sensors
• Data from GPS geolocates other sensors
• Sensor Data from Lego Light sensor plus videocams from N800 carried as payload on Lego
• RFID Reader sees many tags

BIRN Biomedical Informatics Research Network

The People in Cyberinfrastructure
• Web 2.0 can enhance scientific collaboration, i.e. effectively support virtual organizations, in different ways from grids
• I expect more resources like MyExperiment from UK, SciVee from SDSC and Connotea from Nature that offer Flickr, YouTube, Facebook, Second Life type capabilities optimized for science
• The usability and participatory nature of Web 2.0 can bring science and its informatics to a broader audience
• In particular distance collaborative aspects of such Cyberinfrastructure can level the playing field; you do not have to be at Harvard etc. to succeed
  • e.g. ECSU in CReSIS NSF Science and Technology Center
  • Navajo Tech can access TeraGrid Science Gateways

SciVee: Share videos etc.
Connotea: Share links/comments
All have tags

MSI-CIEC Web 2.0 Research Matching Portal
• Portal supporting tagging and linkage of Cyberinfrastructure Resources
  • NSF (soon other agencies) Solicitations and Awards
  • Feeds such as SciVee and NSF
  • Researchers on NSF Awards
  • User and Friends
  • TeraGrid Allocations
• Search for linked people, grants etc.
• Could also be used to support matching of students and faculty for REUs etc.
[Figures: MSI-CIEC Portal Homepage; Search Results]

The social process of science 2.0
[Figure labels: Digital Libraries; Virtual Learning Environment; Undergraduate Students; Graduate Students; scientists; experimentation; Peer-Reviewed Journal & Conference Papers; Reprints; Technical Reports; Preprints & Metadata; Repositories; Local Web; Certified Experimental Results & Analyses; Data, Metadata; Provenance; Workflows; Ontologies]

Data and Cyberinfrastructure
• DIKW: Data Information Knowledge Wisdom transformation
• Applies to e-Science, Distributed Business Enterprise (including outsourcing), Military Command and Control and general decision support
• (SOAP or just RSS) messages transport information expressed in a semantically rich fashion between sources and services that enhance and transform information so that complete system provides
  • Semantic Web technologies like RDF and OWL might help us to have rich expressivity but they might be too complicated
• We are meant to build application specific information management/transformation systems for each domain
  • Each domain has Specific Services/Standards (for APIs and Information such as KML and GML for Geographical Information Systems)
  • and will use Generic Services (like R for datamining) and
  • Generic Standards (such as RDF, WSDL)
• Standards made before consensus or not observant of technology progress are dubious

Raw Data → Data → Information → Knowledge → Wisdom → Decisions
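The DIKW picture above, in which messages pass between services that enhance and transform information, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the talk: the Message class, the three filter functions (clean, summarize, interpret), the sample readings and the "rising" rule are all hypothetical. Each filter stands in for a service with an advertised interface and an opaque implementation, and compose builds a complete system by chaining them.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class Message:
    """Hypothetical message passed between filter services."""
    level: str       # DIKW stage label: "raw", "data", "information", "knowledge"
    payload: object  # content carried at that stage


Filter = Callable[[Message], Message]


def clean(msg: Message) -> Message:
    # Raw Data -> Data: drop missing readings.
    values = [v for v in msg.payload if v is not None]
    return Message("data", values)


def summarize(msg: Message) -> Message:
    # Data -> Information: reduce the readings to summary statistics.
    return Message("information", {"mean": mean(msg.payload), "count": len(msg.payload)})


def interpret(msg: Message) -> Message:
    # Information -> Knowledge: apply an illustrative (made-up) rule.
    trend = "rising" if msg.payload["mean"] > 0 else "flat or falling"
    return Message("knowledge", trend)


def compose(*filters: Filter) -> Filter:
    # Build a complete system by chaining filter services in order.
    def pipeline(msg: Message) -> Message:
        for f in filters:
            msg = f(msg)
        return msg
    return pipeline


dikw = compose(clean, summarize, interpret)
result = dikw(Message("raw", [0.4, None, 1.1, -0.2]))
print(result.level, result.payload)
```

Because every stage consumes and produces the same message type, a stage can be swapped or a new one added without touching the others, which is the point of keeping implementations opaque behind a common interface.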
Information and Cyberinfrastructure
[Figure: services (S, SS) and filter services (fs) grouped into Filter Clouds, Discovery Clouds, a Compute Cloud, a Storage Cloud and a Database, connected to Another Service, Another Grid, a Traditional Grid with exposed services, and a Sensor or Data Interchange Service]

APEC Cooperation for Earthquake Simulation
• ACES is an eight-year-long collaboration among scientists interested in earthquake and tsunami prediction
  • iSERVO is Infrastructure to support work of ACES
  • SERVOGrid is (completed) US Grid that is a prototype of iSERVO
  • http://www.quakes.uq.edu.au/ACES/
• Chartered under APEC, the Asia Pacific Economic Cooperation of 21 economies

Grid of Grids: Research Grid and Education Grid
From Research to Education
[Figure labels: Repositories; Federated Databases; Database; Sensors; Streaming Data; Field Trip Data; Sensor Grid; Database Grid; Compute Grid; Research SERVOGrid; Education Grid; Data Filter Services; Research Simulations; GIS Grid; Discovery Services; Customization Services; Analysis and Visualization Portal; Education Grid Computer Farm]

Grid Workflow Datamining in Earth Science
• Work with Scripps Institute
• Grid services controlled by workflow process real time data from ~70 GPS Sensors in Southern California
[Figure labels: NASA; GPS; Earthquake; Streaming Data Support; Archival; Transformations; Data Checking; Hidden Markov Datamining (JPL); Real Time Display (GIS)]

Grid Workflow Data Assimilation in Earth Science
• Grid services triggered by abnormal events and controlled by workflow process real time data from radar and high resolution simulations for tornado forecasts
• Typical graphical interface to service composition
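The idea of services triggered by abnormal events and controlled by a workflow can be illustrated with a small Python sketch. This is not the system described in the talk: a simple moving-average threshold detector is swapped in for the Hidden Markov datamining and radar analysis, and the readings, the window and threshold values, and the archive and request_forecast steps are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Reading = Tuple[int, float]  # (time step, sensor value in arbitrary units)


def detect_abnormal(stream: Iterable[Reading],
                    window: int = 3,
                    threshold: float = 2.0) -> Iterator[Reading]:
    """Yield readings that differ from the mean of the previous
    `window` readings by more than `threshold`."""
    recent: List[float] = []
    for t, value in stream:
        if len(recent) == window:
            baseline = sum(recent) / window
            if abs(value - baseline) > threshold:
                yield (t, value)
        # Slide the window forward, keeping only the last `window` values.
        recent.append(value)
        if len(recent) > window:
            recent.pop(0)


def archive(event: Reading) -> str:
    # Stand-in for an archival service.
    return f"archived t={event[0]}"


def request_forecast(event: Reading) -> str:
    # Stand-in for launching a high resolution simulation.
    return f"forecast requested for t={event[0]}"


def run_workflow(stream: Iterable[Reading],
                 steps: List[Callable[[Reading], str]]) -> List[str]:
    """Run every workflow step, in order, for each abnormal event."""
    log: List[str] = []
    for event in detect_abnormal(stream):
        for step in steps:
            log.append(step(event))
    return log


readings = [(0, 1.0), (1, 1.1), (2, 0.9), (3, 1.0), (4, 5.0), (5, 1.0)]
log = run_workflow(readings, [archive, request_forecast])
print(log)
```

The workflow steps only run when the detector fires, so expensive downstream services stay idle during normal conditions.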