Computational Infrastructure for Policy Informatics Policy Informatics in an Interdependent World Workshop Washington DC September 13 2007 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University.
Bloomington IN 47401
http://grids.ucs.indiana.edu/ptliupages/presentations/
http://www.infomall.org

e-moreorlessanything
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ (from its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology)
e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research
Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
This generalizes to e-moreorlessanything including presumably ePolicyinformatics
A deluge of data of unprecedented and inevitable size must be managed and understood.
People (see Web 2.0), computers, data and instruments must be linked.
On demand assignment of experts, computers, networks and storage resources must be supported

Role of Cyberinfrastructure
Cyberinfrastructure is infrastructure that supports distributed science (e-Science): data, people, computers
Exploits Internet technology (Web 2.0) adding (via Grid technology) management, security, supercomputers etc.
It has two aspects: parallel (low latency, microseconds, between nodes) and distributed (highish latency, milliseconds, between nodes)
Parallel needed to get high performance on individual large simulations, data analysis etc.; must decompose problem
Distributed aspect integrates already distinct components; especially natural for data
Cyberinfrastructure is in general a distributed collection of parallel systems
Cyberinfrastructure is made of services (originally Web services) that are “just” programs or data sources packaged for distributed access

Structure of Cyberinfrastructure
Distributed software systems are being “revolutionized” by developments from e-commerce, e-Science and the consumer Internet. There is rapid progress in technology families termed “Web services”, “Grids” and “Web 2.0”
The emerging distributed system picture is of distributed services with advertised interfaces but opaque implementations communicating by streams of messages over a variety of protocols
  • Complete systems are built by combining either services or predefined/pre-existing collections of services together to achieve new capabilities
As well as Internet/Communication revolutions (distributed systems), multicore chips will likely be hugely important (parallel systems)
Industry not academia is leading innovation in these technologies

Policy Informatics Infrastructure
The Party Line approach is clear: one creates a Cyberinfrastructure consisting of distributed services accessed by portals/gadgets/gateways/RSS feeds
Services include:
  • “original data”
  • Transformations or filters implementing DIKW (Data Information Knowledge Wisdom) pipeline
  • Final “Decision Support” step converting wisdom into action
  • Generic services such as security, profiles etc.
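The service types listed above can be pictured as a chain of small filters. The following is only a sketch under stated assumptions: the `Message` class, the function names and the sample numbers are hypothetical, and in-process Python functions stand in for what would really be distributed services exchanging messages.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable


@dataclass
class Message:
    level: str       # "data", "information", "knowledge", "wisdom" or "decision"
    payload: object  # content carried at that level


def original_data() -> Message:
    # "Original data" service: no explicit input, only output (made-up readings).
    return Message("data", [3.1, 2.9, 3.4, 8.7, 3.0])


def summarize(msg: Message) -> Message:
    # Filter: data -> information (summary statistics).
    values = msg.payload
    return Message("information", {"mean": mean(values), "max": max(values)})


def detect_outlier(msg: Message) -> Message:
    # Filter: information -> knowledge (is the peak unusually high?).
    info = msg.payload
    return Message("knowledge", {"outlier": info["max"] > 2 * info["mean"]})


def recommend(msg: Message) -> Message:
    # Filter: knowledge -> wisdom (what the finding suggests).
    return Message("wisdom", "investigate" if msg.payload["outlier"] else "no action")


def decide(msg: Message) -> Message:
    # Final "Decision Support" step: wisdom -> action.
    return Message("decision", {"action": msg.payload})


def run_pipeline(source: Callable[[], Message],
                 filters: Iterable[Callable[[Message], Message]]) -> Message:
    # Each filter accepts one form of DIKW and produces another.
    msg = source()
    for apply_filter in filters:
        msg = apply_filter(msg)
    return msg


result = run_pipeline(original_data, [summarize, detect_outlier, recommend, decide])
print(result)
```

Any filter here could be swapped for a large simulation or a data-mining job without changing the others, since each one only sees the message it receives.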
Some filters could correspond to large simulations
Infrastructure will be set up as a System of Systems (Grids of Grids)
  • Services and/or Grids just accept some form of DIKW and produce another form of DIKW
  • “Original data” has no explicit input; just output

[Figure: layered diagram. Legend: SS = Sensor Service, FS = Filter Service, OS = Other Service, MD = MetaData, plus Database, Another Service and Another Grid. Layers: Raw Data, Data, Information, Knowledge, Wisdom, Decisions.]

Information Management/Processing
The diagram describes e-Science, Military Command and Control and perhaps Policy Informatics
(SOAP or just RSS) messages transport information expressed in a semantically rich fashion between sources and services that enhance and transform information so that the complete system provides the Data Information Knowledge Wisdom transformation
  • Semantic Web technologies like RDF and OWL might help us to have rich expressivity but they might be too complicated
We are meant to build application specific information management/transformation systems for each domain
  • Each domain has specific services/standards (for APIs and Information) and will use generic services (like R for datamining) and standards (RDF, WSDL)
  • What is PIML, the Policy Informatics Markup Language?
  • Standards made before consensus or not observant of technology progress are dubious (cf. HLA in simulation or many grid standards)

Too much Computing?
Historically one has tried to increase computing capabilities by
  • Optimizing performance of codes
  • Exploiting all possible CPUs such as Graphics co-processors and “idle cycles”
  • Making central computers available such as NSF/DoE/DoD supercomputer networks
Next Crisis in technology area will be the opposite problem: commodity chips will be 32-128 way parallel in 5 years time and we currently have no idea how to use them, especially on clients
  • Only 2 releases of standard software (e.g. Office) in this time span
Gaming and Generalized decision support (data mining) are two obvious ways of using these cycles
  • Intel RMS analysis
  • Note even cell phones will be multicore
“Too much data” matched to “Too much computing” but implications involved rather different

Intel’s Projection

RMS: Recognition Mining Synthesis
Recognition: What is …? Model. Today: Model-less. Tomorrow: Model-based multimodal recognition.
Mining: Is it …? Find a model instance. Today: Real-time streaming and transactions on static – structured datasets. Tomorrow: Real-time analytics on dynamic, unstructured, multimodal datasets.
Synthesis: What if …? Create a model instance. Today: Very limited realism. Tomorrow: Photo-realism and physics-based animation.
(Pradeep K. Dubey)

Recognition: What is a tumor?
Mining: Is there a tumor here?
Synthesis: What if the tumor progresses?
It is all about dealing efficiently with complex multimodal datasets
Images courtesy: http://splweb.bwh.harvard.edu:8000/pages/images_movies.html
(Pradeep K. Dubey)

Intel’s Application Stack

What should we do?
There will be high quality parallel data mining algorithms
  • Speech Recognition, Text and multimedia search and browsers
  • New generation of desktop aides
  • What are synergies to “Personal aides in an information rich world” (future of PC?) and Policy Informatics?
What filters (data mining) does policy informatics need?
As computing is free, focus on identifying information/knowledge/wisdom needed (there is probably too much data but not so much wisdom in DIKW pipeline)
  • We should use supercomputer/computer services but Information services are more important and less “controversial”
Identify standards for data and data-mining APIs
Set up distributed Policy Informatics Services
Use Web 2.0 (as it makes things easier) not current Grids (which make things harder)
  • Build a “Programmable Policy Informatics Web”
  • Emphasize Simplicity
  • Is “Secrecy” important and in fact viable? Should we care just about “original data” or also about the whole pipeline DIKW?

Web 2.0 Mashups and APIs
http://www.programmableweb.com/apis has (Sept 12 2007) 2312 Mashups and 511 Web 2.0 APIs, with GoogleMaps the most often used in Mashups
Mashups are called workflow in Grid arena

The List of Web 2.0 APIs
Each site has API and its features
Divided into broad categories
Only a few used a lot (49 APIs used in 10 or more mashups)
RSS feed of new APIs
Amazon S3 growing in popularity

Spare Slides

Grid Service Philosophy I
Services receive data in SOAP messages, manipulate it and produce transformed data as further messages
Knowledge is created from information by services
  • Information is created from data by services
Semantic Grid comes from building metadata rich systems of services
Meta-data is carried in SOAP messages
The Grid enhances Web services with semantically rich system and application specific management
One must exploit and work around the different approaches to meta-data (state) and their manipulation in Web Services

Grid Service Philosophy II
There is a horde of support services supplying security, collaboration, database access, user interfaces
The support services are either associated with system or application where the former are WS-* and GS-* which implicitly or explicitly define many support services
There are generalized filter services which are applications that accept messages and produce new messages with some data derived from that in input
  • Simulations (including PDEs and reactive systems)
  • Data-mining
  • Transformations
  • Agents
  • Reasoning
  • Decision making
These tools are all termed filters here
Agent Systems are a special case of Grids
Peer-to-peer systems can be built as a Grid with particular discovery and messaging strategies

Grid Service Philosophy III
Filters can be a workflow which means they are “just collections of other simpler services”
Grids are distributed systems that accept distributed messages and produce distributed result messages
A service or a workflow is a special case of a Grid
A collection of services on a multi-core chip is a Grid
Sensors or Instruments are “managed” by services; they may accept non SOAP control messages and produce data as messages (that are not usually SOAP)

Virtual Observatory Astronomy Grid: Integrate Experiments
[Figure panels: Radio, Far-Infrared, Visible, Dust Map, Visible + X-ray, Galaxy Density Map]

Service or Web service Approach
One uses GML, CML etc. to define the data in a system and one uses services to capture “methods” or “programs”
In eScience, important services fall in three classes
  • Simulations
  • Data access, storage, federation, discovery
  • Filters for data mining and manipulation
Services use something like WSDL (Web Service Definition Language) to define interoperable interfaces (see OPAL talk!)
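As a loose analogy for an interface that is defined separately from any implementation, the sketch below uses Python's `typing.Protocol` in place of a WSDL document. This is an assumption made purely for illustration: real WSDL is an XML description, and the names `FilterService`, `LocalWordCount`, `CachedWordCount` and `client` are hypothetical.

```python
from typing import Protocol


class FilterService(Protocol):
    """The advertised interface: a message in, a message out."""

    def process(self, message: dict) -> dict: ...


class LocalWordCount:
    # One implementation: recomputes on every call.
    def process(self, message: dict) -> dict:
        return {"words": len(message["text"].split())}


class CachedWordCount:
    # A different implementation behind the same interface: remembers results.
    def __init__(self) -> None:
        self._cache: dict = {}

    def process(self, message: dict) -> dict:
        text = message["text"]
        if text not in self._cache:
            self._cache[text] = len(text.split())
        return {"words": self._cache[text]}


def client(service: FilterService, text: str) -> int:
    # The client depends only on the interface, never on the implementation.
    return service.process({"text": text})["words"]


print(client(LocalWordCount(), "policy informatics web"))
print(client(CachedWordCount(), "policy informatics web"))
```

The client code is identical for both implementations, which is the property a published interface description is meant to give distributed services.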
WSDL establishes a “contract” independent of implementation between two services or a service and a client
Services should be loosely coupled which normally means they are coarse grain
Services will be composed (linked together) by mashups (typically scripts) or workflow (often XML – BPEL)
Software Engineering and Interoperability/Standards are closely related

Philosophy of Web Service Grids
Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines
This leads to the distributed object (DO) model represented by Java and CORBA
  • RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java
Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly
  • Distributed Objects Replaced by Services
Note CORBA was considered too complicated in both organization and proposed infrastructure
  • and Java was considered as “tightly coupled to Sun”
  • So there were other reasons to discard
Thus replace distributed objects by services connected by “one-way” messages and not by request-response messages

Web services
Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles.
Web Services interact by exchanging messages in SOAP format
The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.
[Figure labels: resources (Humans, Databases, Programs, Computational resources, Devices); service logic (BPEL, Java, .NET); message processing (SOAP and WSDL)]
SOAP messages:
  <env:Envelope>
    <env:Header> ... </env:Header>
    <env:Body> ... </env:Body>
  </env:Envelope>

A typical Web Service
In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI Messages, CGI Web invocations, totally compiled away (inlining)
The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python
[Figure labels: Portal Service, Security, WSDL interfaces, Web Services, Payment Credit Card, Catalog, Warehouse Shipping control]

The Grid and Web Service Institutional Hierarchy
4: Application or Community of Interest (CoI) Specific Services such as “Map Services”, “Run BLAST” or “Simulate a Missile” (XBML, XTCE, VOTABLE, CML, CellML)
3: Generally Useful Services and Features (OGSA and other GGF, W3C) such as “Collaborate”, “Access a Database” or “Submit a Job” (OGSA GS-* and some WS-*, GGF/W3C/…., XGSP (Collab))
2: System Services and Features (WS-* from OASIS/W3C/Industry): Handlers like WS-RM, Security, UDDI Registry
1: Container and Run Time (Hosting) Environment (Apache Axis, .NET etc.)
Must set standards to get interoperability
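The Envelope/Header/Body outline on the Web services slide can be made concrete with the Python standard library. This is a minimal sketch, not a full SOAP stack: the SOAP 1.2 envelope namespace is the standard one, but the `MessageId` and `Query` element names and the helper functions are hypothetical, and real services would normally namespace their body elements.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.2 envelope namespace, bound to the "env" prefix used on the slide.
SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"
ET.register_namespace("env", SOAP_ENV)


def build_envelope(header_fields: dict, body_fields: dict) -> str:
    # Build <env:Envelope> containing <env:Header> and <env:Body>.
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")
    for name, value in header_fields.items():
        ET.SubElement(header, name).text = value   # metadata travels in the header
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    for name, value in body_fields.items():
        ET.SubElement(body, name).text = value     # application data travels in the body
    return ET.tostring(envelope, encoding="unicode")


def read_body(xml_text: str) -> dict:
    # What a receiving service would do: parse the message and pull out the body.
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    return {child.tag: child.text for child in body}


message = build_envelope({"MessageId": "m-1"}, {"Query": "unemployment rate"})
print(message)
print(read_body(message))
```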
The Ten areas covered by the 60 core WS-* Specifications
WS-* Specification Area | Examples
1: Core Service Model | XML, WSDL, SOAP
2: Service Internet | WS-Addressing, WS-MessageDelivery; Reliable Messaging WSRM; Efficient Messaging MTOM
3: Notification | WS-Notification, WS-Eventing (Publish-Subscribe)
4: Workflow and Transactions | BPEL, WS-Choreography, WS-Coordination
5: Security | WS-Security, WS-Trust, WS-Federation, SAML, WS-SecureConversation
6: Service Discovery | UDDI, WS-Discovery
7: System Metadata and State | WSRF, WS-MetadataExchange, WS-Context
8: Management | WSDM, WS-Management, WS-Transfer
9: Policy and Agreements | WS-Policy, WS-Agreement
10: Portals and User Interfaces | WSRP (Remote Portlets)

Activities in Global Grid Forum Working Groups
GGF Area | GS-* and OGSA Standards Activities
1: Architecture | High Level Resource/Service Naming (level 2 of slide 6), Integrated Grid Architecture
2: Applications | Software Interfaces to Grid, Grid Remote Procedure Call, Checkpointing and Recovery, Interoperability to Job Submittal services, Information Retrieval
3: Compute | Job Submission, Basic Execution Services, Service Level Agreements for Resource use and reservation, Distributed Scheduling
4: Data | Database and File Grid access, Grid FTP, Storage Management, Data replication, Binary data specification and interface, High-level publish/subscribe, Transaction management
5: Infrastructure | Network measurements, Role of IPv6 and high performance networking, Data transport
6: Management | Resource/Service configuration, deployment and lifetime, Usage records and access, Grid economy model
7: Security | Authorization, P2P and Firewall Issues, Trusted Computing

Two-level Programming I
• The Web Service (Grid) paradigm implicitly assumes a two-level Programming Model
• We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies
  – C++, Java or Fortran Monte Carlo module
  – Data streaming from a sensor or Satellite
  – Specialized (JDBC) database access
• Such services accept and produce data from users, files and databases
• The Grid is built by coordinating such services assuming we have solved the problem of programming the service
[Figure labels: Service, Data]

Two-level Programming II
The Grid is discussing the composition of distributed services with the runtime interfaces to Grid as opposed to UNIX pipes/data streams
Familiar from use of UNIX Shell, PERL or Python scripts to produce real applications from core programs
Such interpretative environments are the single processor analog of Grid Programming
Some projects like GrADS from Rice University are looking at integration between service and composition levels but dominant effort looks at each level separately
[Figure labels: Service1, Service2, Service3, Service4]

Grid Workflow Data Assimilation in Earth Science
Grid services triggered by abnormal events and controlled by workflow process real time data from radar and high resolution simulations for tornado forecasts
[Figure: Typical graphical interface to service composition]
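The two-level model from the Two-level Programming slides can be sketched on a single machine. Level one is a set of ordinary functions standing in for services; level two is a small workflow description that only says which service consumes which output. Everything here is an assumption for illustration: the service names, the sample values and the use of Python's `graphlib` as the composition engine (a real Grid would use a workflow system such as BPEL and remote services).

```python
from graphlib import TopologicalSorter

# Level 1: the services, written with conventional technology (plain functions here).
def source(inputs: dict) -> list:
    return [2, 4, 6, 12]            # stands in for a sensor or database service

def average(inputs: dict) -> float:
    values = inputs["source"]
    return sum(values) / len(values)

def peak(inputs: dict) -> int:
    return max(inputs["source"])

def report(inputs: dict) -> dict:
    return {"mean": inputs["average"], "peak": inputs["peak"]}

SERVICES = {"source": source, "average": average, "peak": peak, "report": report}

# Level 2: the workflow, mapping each service to the services whose output it needs.
WORKFLOW = {
    "source": set(),
    "average": {"source"},
    "peak": {"source"},
    "report": {"average", "peak"},
}

def run_workflow(workflow: dict, services: dict) -> dict:
    # Run every service after its dependencies, passing results along as messages.
    results = {}
    for name in TopologicalSorter(workflow).static_order():
        inputs = {dep: results[dep] for dep in workflow[name]}
        results[name] = services[name](inputs)
    return results

results = run_workflow(WORKFLOW, SERVICES)
print(results["report"])
```

The workflow dictionary can be rewired without touching any service body, which is the separation of the two levels that the slides describe.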