Cyberinfrastructure to integrate simulation, data and sensors for collaborative eScience in CReSIS
CERSER and CReSIS – http://nia.ecsu.edu/
Elizabeth City State University, October 19, 2006
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401
[email protected]
http://www.infomall.org

Abstract
Cyberinfrastructure supports eScience, or collaborative science with distributed scientists, computers, data repositories and sensors. We describe the emerging Grid software for eScience and the underlying Cyberinfrastructure such as the TeraGrid. We give one example in detail: iSERVO – the International Solid Earth Research Virtual Organization supporting earthquake science. This illustrates Computing Grids, Geographical Information System Grids and Sensor Grids. We suggest implications for CReSIS – the Center for Remote Sensing of Ice Sheets.

Why Cyberinfrastructure is Useful
Supports distributed science – data, people, computers
Exploits Internet technology (Web 2.0), adding management, security, supercomputers etc.
It has two aspects: parallel – low latency (microseconds) between nodes – and distributed – highish latency (milliseconds) between nodes
The parallel aspect is needed to get high performance on individual 3D simulations, data analysis etc.; one must decompose the problem
The distributed aspect integrates already distinct components
Cyberinfrastructure is in general a distributed collection of parallel systems
Grids are made of services that are "just" programs or data sources packaged for distributed access
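The last point can be made concrete with a minimal sketch, assuming a hypothetical legacy executable and port (this is illustrative, not SERVOGrid code): the existing program stays unchanged and is simply exposed behind an HTTP endpoint, so a remote client invokes it by sending a message instead of logging in to the machine.

    # Minimal sketch: expose an unchanged command-line program as a network
    # service. PROGRAM and the port are hypothetical, not SERVOGrid code.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    import subprocess

    PROGRAM = ["./legacy_simulation"]  # existing code, left untouched

    class ServiceHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            stdin_data = self.rfile.read(length)   # request body -> program input
            result = subprocess.run(PROGRAM, input=stdin_data, capture_output=True)
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(result.stdout)        # program output -> reply message

    if __name__ == "__main__":
        HTTPServer(("", 8080), ServiceHandler).serve_forever()

A real Grid service adds the security, registry and metadata machinery discussed below; the point here is only that a service is "just" a program plus a message interface.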
e-moreorlessanything and the Grid
'e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.' – from its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology
e-Science is about developing tools and technologies that allow scientists to do 'faster, better or different' research
Similarly, e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world
• The growing use of outsourcing is one example
The Grid provides the information technology e-infrastructure for e-moreorlessanything
A deluge of data of unprecedented and inevitable size must be managed and understood
People, computers, data and instruments must be linked
On-demand assignment of experts, computers, networks and storage resources must be supported

TeraGrid: Integrating NSF Cyberinfrastructure
[Map of TeraGrid sites: Buffalo, Wisc, UC/ANL, Utah, Cornell, Iowa, PU, NCAR, IU, NCSA, Caltech, PSC, ORNL, USC-ISI, UNC-RENCI, SDSC, TACC]
TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research. Today 100 teraflops; tomorrow a petaflop; Indiana has 20 teraflops today.

Virtual Observatory Astronomy Grid
[Figure: integrating experiments across wavelengths – Radio, Far-Infrared, Visible, Dust Map, Visible + X-ray, Galaxy Density Map]

Grid Capabilities for Science
Open technologies for any large-scale distributed system, adopted by industry, many sciences and many countries (including the UK, EU, USA and Asia)
• Security, reliability, management and state standards; service and messaging specifications
User interfaces via portals and portlets, virtualizing to desktops, email, PDAs etc.
• ~20 TeraGrid Science Gateways (their name for portals)
• OGCE portal technology effort led by Indiana
Uniform approach to accessing distributed (super)computers, supporting single (large) jobs and spawning lots of related jobs
Data and metadata architecture supporting real-time data and archives as well as federation
• Links to the Semantic Web and annotation
Grid (Web service) workflow with standards and several successful instantiations (such as Taverna and MyLead)
Many Earth science Grids including ESG (DoE), GEON, LEAD, SCEC, SERVO; LTER and NEON for the environment
• http://www.nsf.gov/od/oci/ci-v7.pdf

APEC Cooperation for Earthquake Simulation
ACES is a seven-year-long collaboration among scientists interested in earthquake and tsunami prediction
• iSERVO is the infrastructure to support the work of ACES
• SERVOGrid is the (completed) US Grid that is a prototype of iSERVO
• http://www.quakes.uq.edu.au/ACES/
Chartered under APEC – the Asia-Pacific Economic Cooperation of 21 economies

Grid of Grids: Research Grid and Education Grid
[Diagram: a Research SERVOGrid – repositories, federated databases, sensors, streaming data and field trip data feeding a Sensor Grid, Database Grid and Compute Grid, with data filter services, research simulations, GIS, Grid discovery services, customization services, and analysis and visualization behind a portal – linked "from research to education" to an Education Grid with a computer farm]

SERVOGrid and Cyberinfrastructure
Grids are the technology, based on Web services, that implements Cyberinfrastructure, i.e. supports eScience – science as a team sport
• Internet-scale managed services that link computers, data repositories, sensors, instruments and people
There is a portal and services in SERVOGrid for:
• Applications such as GeoFEST, RDAHMM, Pattern Informatics, Virtual California (VC), Simplex, mesh-generating programs …
• Job management and monitoring Web services for running the above codes
• File management Web services for moving files between various machines
• Geographical Information System services
• The QuakeTables earthquake-specific database
• Sensors as well as databases
• Context (dynamic metadata) and UDDI long-term metadata services
• Services that support streaming real-time data

[Figure: site-specific irregular scalar measurements – ice sheets (Greenland), volcanoes (Long Valley, CA) – and satellite constellations for plate boundary-scale vector measurements – PBO, topography at 1 km, stress change (Northridge, CA), earthquakes (Hector Mine, CA)]

Some Grid Concepts I
Services are "just" (distributed) programs sending and receiving messages with well-defined syntax
Interfaces (input-output) must be open; innards can be open source (allowing you to modify) or proprietary
• Services can be in any language – Fortran, shell scripts, C, C#, C++, Java, Python, Perl – your choice!
• Web Services are supported by all vendors (IBM, Microsoft …)
Service overhead will be just a few milliseconds (more now), which is less than typical network transit time
• Any program that is distributed can be a Web service
• Any program taking execution time ≥ 20 ms can be an efficient Web service

Web Services
[Diagram: humans, devices, databases, programs and computational resources exchanging SOAP messages through a message-processing layer, with service logic in BPEL, Java, .NET]
Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on SOA (service-oriented architecture) principles. Web Services interact by exchanging messages in SOAP format. The contracts for the message exchanges that implement those interactions are described via WSDL interfaces. A SOAP message has the skeleton:

<env:Envelope>
  <env:Header> ... </env:Header>
  <env:Body> ... </env:Body>
</env:Envelope>
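As a hedged illustration of the exchange just described (the endpoint URL and the runJob operation are invented for this example, not an actual SERVOGrid interface), a client fills in the envelope and POSTs it to the service; the WSDL would be the formal contract for this exchange.

    # Hedged sketch of a SOAP 1.1 call; endpoint and operation are hypothetical.
    import urllib.request

    ENVELOPE = """<?xml version="1.0"?>
    <env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
      <env:Header/>
      <env:Body>
        <runJob xmlns="http://example.org/jobservice">
          <jobName>test</jobName>
        </runJob>
      </env:Body>
    </env:Envelope>"""

    request = urllib.request.Request(
        "http://example.org/services/JobService",   # hypothetical endpoint
        data=ENVELOPE.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": '"runJob"'},
    )
    with urllib.request.urlopen(request) as response:
        print(response.read().decode())             # the SOAP response envelope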
A Typical Web Service
In principle, services can be in any language (Fortran … Java … Perl … Python) and the interfaces can be method calls, Java RMI messages, CGI Web invocations, or totally compiled away (inlining)
The simplest implementations involve XML messages (SOAP) and programs written in net-friendly languages like Java and Python
[Diagram: a portal invoking Web Services – Payment (Credit Card), Catalog, Warehouse, Shipping – through WSDL interfaces, with security and control]

Some Grid Concepts II
Systems are built from contributions from many different groups – you do not need one "vendor" for all components, as Web services allow interoperability between components
• One reason DoD likes Grids (called Net-Centric computing)
Grids are distributed in services and data, allowing anybody to store their data and to produce "their" view
• Some think that the university library of the future will curate/store the data of its faculty
"2-level programming model": classic programming of services, with services composed using workflow consistent with industry standards (BPEL)
Grid of Grids (System of Systems): realistically, Grid-like systems will be built using multiple technologies and "standards" – integrate separate Grids for sensors, GIS, visualization, computing etc. with an OGSA (Open Grid Service Architecture, from OGF) system Grid (security, registry) into a single Grid
Existing codes are left UNCHANGED; wrap them as services with metadata (see the sketch below)
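A minimal sketch of that last point, assuming hypothetical descriptor fields rather than the actual SERVO application descriptor schema: the executable is untouched, and a small metadata record captures where it lives and how to invoke it, which is all a generic job management service needs to know.

    # Hedged sketch: wrap an unchanged code with invocation metadata.
    # The descriptor fields and paths are illustrative assumptions.
    import subprocess

    descriptor = {
        "name": "GeoFEST",                          # application name
        "host": "gridnode.example.org",             # hypothetical host
        "executable": "/opt/geofest/bin/geofest",   # hypothetical install path
        "arguments": ["-steps", "1000"],
        "input_files": ["mesh.dat", "faults.dat"],  # staged before the run
    }

    def invoke(desc):
        """Launch the wrapped code exactly as its metadata specifies."""
        command = [desc["executable"]] + desc["arguments"]
        return subprocess.run(command, capture_output=True, text=True)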
TeraGrid User Portal
[Screenshot: the TeraGrid user portal]

LEAD Gateway Portal
NSF Large ITR and TeraGrid Gateway
- Adaptive response to mesoscale weather events
- Supports data exploration, Grid workflow

Grid Workflow Data Assimilation in Earth Science
Grid services, triggered by abnormal events and controlled by workflow, process real-time data from radar and high-resolution simulations for tornado forecasts
Uses a portlet-based user portal to access and control services and workflow

SERVOGrid Has a Portal
The portal is built from portlets – user interface fragments for each service that are composed into the full interface
It uses OGCE technology, as does the planetary science VLAB portal with the University of Minnesota

GIS and Sensor Grids
OGC has defined a suite of data structures and services to support Geographical Information Systems and sensors
GML, the Geography Markup Language, defines the specification of geo-referenced data
SensorML and O&M (Observations and Measurements) define metadata and data structures for sensors
Services like the Web Map Service, Web Feature Service and Sensor Collection Service define service interfaces for accessing GIS and sensor information
Grid workflow links services that are designed to support streaming input and output messages
We built Grid (Web) service implementations of these specifications for NASA's SERVOGrid
We use Google Maps as a front end to the WMS and WFS (see the request sketch below)
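Concretely, a Web Map Service is driven by HTTP requests such as GetMap. The sketch below (the server URL and layer name are hypothetical) fetches a rendered map image that a portal, or a Google Maps front end, could overlay; the parameter set follows the OGC WMS 1.1.1 specification.

    # Hedged sketch of an OGC WMS 1.1.1 GetMap request; the server URL and
    # layer name are made up, not the actual SERVOGrid WMS.
    import urllib.parse, urllib.request

    params = {
        "SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
        "LAYERS": "california_faults",   # hypothetical layer
        "STYLES": "",
        "SRS": "EPSG:4326",              # longitude/latitude coordinates
        "BBOX": "-125,32,-114,42",       # roughly California
        "WIDTH": "600", "HEIGHT": "600",
        "FORMAT": "image/png",
    }
    url = "http://example.org/wms?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        open("map.png", "wb").write(response.read())  # save the rendered map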
Grid Workflow Datamining in Earth Science
NASA-funded GPS work with the Scripps Institution of Oceanography
Grid services controlled by workflow process real-time data from ~70 GPS sensors in Southern California for earthquake studies
[Workflow: streaming data support → transformations → data checking → hidden Markov datamining (JPL) → display (GIS)]

Earth/Atmosphere Grids Built as Grids of (Library) Grids
[Diagram: application Grids – an Earthquake SERVOGrid (earthquake data, filters and simulation services), a Tornado Grid, and an Ice Sheet PolarGrid (ice sheet sensors, SAR, filters, EM and glacier simulations) – composed from library Grids (Collaboration Grid, Sensor Grid, GIS Grid, Visualization Grid, Compute Grid, Data Access/Storage, Registry, Portals, Metadata) over core Grid services (security, notification, workflow, messaging) and the physical network]

CReSIS PolarGrid
Important CReSIS-specific Cyberinfrastructure components include:
• Managed data from sensors and satellites
• Data analysis such as SAR processing – possibly with parallel algorithms
• Electromagnetic simulations (currently commercial codes) to design instrument antennas
• 3D simulations of ice sheets (glaciers) with non-uniform meshes
• GIS (Geographical Information Systems)
Also needed are capabilities present in many Grids:
• A portal, i.e. a Science Gateway
• Submitting multiple sequential or parallel jobs

What Should We Do?
Identify existing programs that should be wrapped as Grid services
• One can do this even for commercial codes, as one keeps the existing codes (Fortran, C++) unchanged and constructs a "metadata" wrapper defining where a program and its data are located and how to invoke it
Identify where parallel versions are needed and whether help is needed in creating them
• Parallel codes can be Grid services
• The electromagnetic codes are commercial – in principle parallel
• Ice sheet models can be parallelized for high-resolution simulations
Scope out the system: computational needs – identify the value of TeraGrid, data storage needs and network requirements
Examine the data model and produce a data Grid architecture
• Use databases? Distributed? Metadata? Files? What are the key performance issues?
Examine integration of GIS with Grid services
Design and implement a Science Gateway
Are there important visualization requirements outside GIS?
Are there key issues from security?
Bring up core services such as registries
Need infrastructure to run services (Linux PCs)

Benefits of CReSIS PolarGrid
Shared resources support collaboration among CReSIS scientists
Integration of polar-related data with appropriate compute resources enables research on specific topics and studies across topics
A Polar Science Gateway accesses common services (programs) and data and their integration as workflow
Access to TeraGrid with the same interface for large-scale simulations
Common capabilities (SAR analysis, GIS) can be shared with related Grids such as SERVOGrid, GEON, LEAD etc.
Modular Grid services allow exchange of new capabilities while preserving the system
• e.g. changing the EM simulation service
Management of dynamic heterogeneous data

SERVO/QuakeSim Services Eye Chart

Job Management: SERVO wraps Apache Ant as a Web service and uses it to launch jobs. For a particular application, we design a build.xml template; the interface is simply a string array of the build properties called for by the template. We have also built a simple generic "template engine" version of this. Specific applications – Virtual California, GeoFEST, Park, RDAHMM … – can all be launched by a single job management service or by custom instances of it with metadata preset to a particular application. (See the sketch after this chart.)

Context Data Service: We store information gathered from users' interactions with the portal interface in a generic, recursively defined XML data structure. Typically we store input parameters and choices made by the user so that we can recover and reload them later. We also use this for monitoring remote workflows. We have devoted considerable effort to developing WS-Context to support the generalization of this initial simple service.

Application and Host Metadata Service: We have Application and Host Descriptor services based on XML schema descriptors. Portlet interfaces allow code administrators to make applications available through the browser.

File Services: We built a file Web service that can do uploads, downloads, and crossloads between different services. This supports specific operations such as file browsing, creation, deletion and copying.

Portal: We use an OGCE-based portal built on the portlet architecture.

Authentication and Authorization: These use capabilities built into the portal. Note that simulations are typically performed on machines where the user has accounts, while data services are shared for read access.

Information Service: We have built data model extensions to UDDI to support XPath queries over Geographical Information System capability.xml files. This is designed to replace the OGC (Open Geospatial Consortium) Web Registry Service.

Web Map Service: We built a Web Service version of this Open Geospatial Consortium specification. The WMS constructs images out of abstract feature descriptions.

Web Feature Service: We built a Web Service version of this OGC standard and extended it to support data streaming for increased performance.
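The Job Management entry can be illustrated with a hedged sketch (the build file, target and property names are invented, not the actual SERVO templates): the service's string array of build properties maps directly onto Ant's -D command-line property definitions.

    # Hedged sketch of launching a job through an Ant build template; the
    # build file, 'execute' target and property names are hypothetical.
    import subprocess

    def launch_job(build_file, properties):
        """Run the 'execute' target of an Ant template with given properties."""
        command = ["ant", "-f", build_file, "execute"]
        command += [f"-D{name}={value}" for name, value in properties.items()]
        return subprocess.run(command, capture_output=True, text=True)

    # A Web service front end would receive these as its string-array interface.
    result = launch_job("virtual_california.xml", {
        "input.file": "faults.dat",
        "output.dir": "/tmp/vc_run",
        "num.steps": "10000",
    })
    print(result.stdout)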
Service Eye Chart, Continued

Workflow/Monitoring/Management Services: The HPSearch project uses HPSearch Web Services to execute JavaScript workflow descriptions. It has more recently been revised to support WS-Management and to support both workflow (where there are many alternatives) and system management (where there is less work). Management functions include the life cycle of services and QoS for inter-service links.

Sensor Grid Services: We are developing infrastructure to support streaming GPS signals and their successive filtering into different formats. This is built over NaradaBrokering (see the Messaging Service). It does not use Web Services as such at present, but the filters can be controlled by HPSearch services.

Messaging Service: This is used to stream data in workflows fed by real-time sources. It is based on NaradaBrokering, which can also be used in cases involving only archival data.

Notification Service: This supplies alerts to users when filters (data mining) detect features of interest.

QuakeTables Database Services: The USC QuakeTables fault database project includes a Web service that allows you to search for earthquake faults.

Scientific Plotting Services: We are developing Dislin-based scientific plotting services as a variation of our Web Map Service: for a given input service, we can generate a raster image (like a contour plot) which can be integrated with other scientific and GIS map plot images.

Data Tables Web Service: We are developing a Web Service based on the National Virtual Observatory's VOTables XML format for tabular data. We see this as a useful general format for the ASCII data produced by various application codes in SERVO and other projects.

Key interfaces/standards/software used: GML, WFS, WMS, WSDL, XML Schema with the XPP pull parser, SOAP with Axis 1.x, UDDI, WS-Context, JSR-168, JDBC, Servlets, WS-Management; VOTables in research.

Key interfaces/standards/software NOT used (often just for historical reasons, as the project predated the standard): WS-Security, JSDL, WSRF, BPEL, OGSA-DAI.

Key GIS and Related Services

HPSearch: Supports streaming data between services; supports scriptable workflows, so it is not limited to DAGs; an implementation of WS-Distributed Management.

WS-Context: Contexts can be used to hold arbitrary content (XML, URIs, name-value pairs); can be used to support distributed session state as well as persistent data; currently researching scalability.

Web Feature Service: Supports both streaming and non-streaming returns of query results.

Web Map Service: Supports integration of local and remote map services; treats Google Maps as an OGC-compliant map server.

Sensor Grid: A publish/subscribe system allows data streams to be reorganized using topics. (See the sketch below.)
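To illustrate the last entry, here is a generic publish/subscribe sketch, not the NaradaBrokering API (the topic names are invented): a broker routes each published sensor reading to the subscribers of its topic, so raw streams can be reorganized into whatever filtered streams consumers need.

    # Generic publish/subscribe sketch of topic-based stream reorganization;
    # this is NOT the NaradaBrokering API, and the topic names are made up.
    from collections import defaultdict

    class Broker:
        def __init__(self):
            self.subscribers = defaultdict(list)   # topic -> list of callbacks

        def subscribe(self, topic, callback):
            self.subscribers[topic].append(callback)

        def publish(self, topic, message):
            for callback in self.subscribers[topic]:
                callback(message)

    broker = Broker()
    # A filter service subscribes to raw readings from one GPS station ...
    broker.subscribe("gps/raw/station42", lambda m: print("filter got", m))
    # ... while a GIS display subscribes to a filtered, regional stream.
    broker.subscribe("gps/filtered/socal", lambda m: print("display got", m))
    broker.publish("gps/raw/station42", {"lat": 34.05, "lon": -118.25})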