Architecting the Virtual Organisation
Rob Gill, Biology Domain, MDR-IT
March 2007

Drug Discovery Process
GSK is a research-based pharmaceutical company: new products come from the R&D pipeline.
[Diagram: pipeline from Gene ID and Disease Association through Target Selection, Screening, Hit to Lead, Preclinical Candidate, PoC, File, Clinical and Launch, with a "Cost" label]
– Gene-to-Target: identify gene or target protein important for disease mechanism
– Target to Lead: identify and optimise inhibitors / modulators
– Candidate Selection: evaluate
  • Safety
  • Dose
  • Toxicity
– FTIM: test hypothesis in man
  • Efficacy

Informatics links the information used to find and validate new targets
[Chart: TARGETS, SCREENS, CANDIDATES, DRUGS; axis from 0 to 700,000]
BUT… all these tools are built and integrated within GSK

R&D Strategy: Virtualisation
Total investment in drug research:
– GSK R&D: ~15,000 scientists, >$4B R&D spend
– PhRMA member companies¥ (2005): >$51B R&D spend
– Biotechs (2006): >$25B R&D spend
R&D strategy is to tap into external knowledge and expertise through a network of external alliances, sharing the risk, reward and control.
Centre of Excellence for External Drug Discovery (CEEDD) launched in 2005
¥ Includes GSK

R&D Strategy: Globalisation
The rate of growth of scientific and technical graduates in Asia is outpacing the US and Europe.
China is one of the fastest growing countries:
– well over a million scientists and engineers are graduating each year (National Bureau of Statistics, 2005)
– by 2010 it is predicted that China will surpass the United States in the number of science and engineering PhDs conferred (National Bureau of Economic Research)
R&D strategy is to be able to tap into this wealth of knowledge along with other developments in the global market.

GSK IT Definition of Virtualisation
“Virtualisation will allow GSK to operate seamlessly across organisational, process and technology boundaries enabling GSK to: exploit process expertise, partner effectively and flexibly outsource”
Here’s why…
– Development of Alliances
– Access to Service Providers (CROs)
– Integrated Outsource Partners
– Better access for key opinion leaders
– Acquisition
– Globalised distributed workforce

Challenge for IT in GSK
– Business trend is away from the “fortress” towards an “ecosystem”
– However, the GSK IT environment was not designed for this
– Significant overhead is required to be integrated into R&D/GSK processes
– Most solutions are “bolted on” to our current infrastructure
– Virtualisation is one of the key streams of GSK strategy

IT Techniques needed to support this transition
Technical Architectures
– Speed vs Flexibility
– Need to simplify our environment
– Necessity to support remote sites
– Reduce costs
  • Development
  • Support
Information Architecture (IA)
– Capture processes across Biology
– Publish and maintain models
– Use to direct the generation of data services
– Creation/Use of Ontologies
– Identify “integration points”
– Design out overlaps
– Drive GSK Integration

Business Process Modelling & Model Driven Design
If we are to fully embrace the Virtualisation challenge we need to look again at how we capture and implement business processes.
– Process Definition and Information Standards: architects and business analysts, in conjunction with business process owners, define the process model and the information passing between process tasks using a business process modelling tool
– Implementation: software developers implement services and define business rules
– Process Optimisation and Monitoring: business process owners can then monitor metrics to analyse and optimise the process
Better modelled processes and SOA approaches then open up GSK systems to a number of delivery approaches:
– Enterprise Service Bus (ESB)
– Workflow Tools (InforSense, Taverna …)

Business Process Management Architecture
Bespoke point-to-point interfaces between systems and ad hoc communication do not allow business processes to be easily reconfigured, outsourced, or tracked.
Workflow is either hard-coded into the application or handled by ad hoc communication.
[Diagram: Inventory System, Request System, QC System and Delivery System linked by email and phone]
Workflow is externalised and visible, and point-to-point interfaces are eliminated. This allows business processes to be easily reconfigured and outsourced, and metrics to be tracked.
[Diagram: Process Server (Request Product, Build Product, QC Product, Deliver Product) with a Task queue and a Process Monitor (captures metrics), connected to Request System, Product system, QC System and Delivery System]

Use of Information Architecture
– Business Process Management takes us closer to the Virtualisation goals
– However, building this on top of an unmanaged data architecture still leaves us unable to fully benefit from the distributed approach
– To properly Virtualise we need both to understand business process and to have a managed data architecture

Historical Ad-Hoc Integration (As Required)
[Diagram: Networks View, Reagents View, Gene View, Platform View; “Connection Spaghetti”]

Move to “services” (But no IA)
[Diagram: Networks View, Reagents View, Gene View and Platform View, each over Services; “Hidden Spaghetti”]

Modelled data and business process
[Diagram: Business BRAD Perspective, Genomics Perspective, Discovery Perspective and Genetics Perspective over Services and Data Services; data: Gene Data, Protein Data, Gene, Omics Results, Platform, Networks Data, Network]
[Diagram: Workflow, Analysis, Visualise; services by data type:
– Gene Data, Protein Data, Gene: Get Gene Sequence, Get Protein Sequence, Get Gene Annotation
– Omics Results, Platform: Get platform results, Get all results, Filter By Technology
– Networks Data, Network: Get closest neighbours, Filter by Species, Get All Pathways]

Why Workflow?
– Drives modularisation and aligns with model driven approach
  • Good for prototyping and process capture
  • Very fast response time for development
– Captures scientific knowledge as part of the workflow design process
– Built-in flexibility for a rapidly changing environment
– Supports Integrity and retention initiatives
– Lowers the barrier of software development

Target Application Architecture
[Diagram: Visualization & Analysis Tools; Workflow Analytics; Sequencing LIMS, microArray LIMS, Proteomics LIMS, Cellular LIMS, Metabolomics LIMS; Bio Catalog; Sample Management; Biological Inventory; Experimental & Analysis Results; Reference Biology (Sequences, Genes, Proteins, Biological Networks); Data Integration; Study Design & Sample ID Generation; Request Management; Data Marts]

Biology Target Architecture Goals
– All Biology applications made accessible to vendors and partners by hosting in an open environment.
– Remove all major business workflow from individual applications and implement workflow control through business process management with well-defined Service Data Objects.
– Look to standardise data formats and participate in semantic GRID computing.

Demonstrations of Workflow and Grid usage at GSK
– POCs in the Biology Domain
– Demonstrators in the portfolio space
Workflow tools have been heavily used across both the Biology and Chemistry space
– Some moving to production status
– Still some challenges to overcome (discussed later)
– Primary use in process management and reporting
Portfolio space has some novel and potentially valuable opportunities.
– Using Workflow from the “top down” has the opportunity to cause integration across different scientific disciplines
– Portfolio view will need to span the whole GSK space as we move to the virtualised environment

Portfolio support using Workflow and Semantic Grid
Target is an overloaded term.
– Confusion over meaning and identification
– Controlled vocabulary, standards and inappropriate identification have led to a complex mix of data types which are difficult to traverse.
A GSK “target” is actually a scientifically generated concept linking a disease to a set of entities and relationships needing to be tested within models.
[Diagram: Concept Capture; “Gene” Data, “Disease” Data and “Compound” Data feed a Concept; Concept ID links to Sequence db (Sequence ID), Reagent db (Reagent ID), Assay db (Assay ID) and Assay Results]

Target Management
– Once captured as a network it is possible to query and search across “Target space” in a far more flexible and intuitive way.
– Standards can be enforced and aligned with GSK requirements.
– Automated analysis can be used to answer numerous questions about the inferences and results generated within a portfolio program.
– Using services generated in the Biology and Chemistry space it is possible to annotate the portfolio in new ways and to answer fundamental questions around portfolio progression.
– This process uses simple independent services on top of available GSK data and is reusable.

Drug Discovery Process (again)
GSK is a research-based pharmaceutical company: new “Concepts” for the R&D pipeline
[Diagram: pipeline stages Gene ID, Disease Association, Target Selection, Screening, Hit to Lead, Preclinical Candidate, PoC, File, Clinical, Launch; overlaid with Request System, Inventory System, QC System and Delivery System, with the email and phone links between them crossed out]

1. Workflow in reagent tracking POC
Using Workflow to manage reagent processing
Numerous questions can be asked:
– Where is / how many Target(s)?
– Programs using a molecular target
– Availability of reagents for a program
– Lead series generated from a program
– Mapping status of bio-reagents to Molecular Target
– Which bio-reagents are used within a specific screen
Very process driven approach focussed on GSK data.
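The questions listed for the reagent tracking POC can be read as simple lookups over a concept network. The following is a minimal sketch of that idea, not the GSK implementation: the edge list, relation names (`uses_target`, `has_reagent`, `uses_reagent`, `maps_to`) and all identifiers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical edges of a small "concept" network: (subject, relation, object).
# None of these identifiers come from a real GSK system.
EDGES = [
    ("program:P1", "uses_target", "target:ProtA"),
    ("program:P2", "uses_target", "target:ProtA"),
    ("program:P1", "has_reagent", "reagent:R-001"),
    ("reagent:R-001", "maps_to", "target:ProtA"),
    ("reagent:R-002", "maps_to", "target:ProtB"),
    ("screen:S1", "uses_reagent", "reagent:R-001"),
]


def build_index(edges):
    """Index edges in both directions so each question is a single lookup."""
    forward = defaultdict(set)   # (subject, relation) -> objects
    backward = defaultdict(set)  # (relation, object) -> subjects
    for subject, relation, obj in edges:
        forward[(subject, relation)].add(obj)
        backward[(relation, obj)].add(subject)
    return forward, backward


FORWARD, BACKWARD = build_index(EDGES)


def programs_using_target(target):
    """'Programs using a molecular target'."""
    return sorted(BACKWARD.get(("uses_target", target), set()))


def reagents_for_program(program):
    """'Availability of reagents for a program'."""
    return sorted(FORWARD.get((program, "has_reagent"), set()))


def reagents_in_screen(screen):
    """'Which bio-reagents are used within a specific screen'."""
    return sorted(FORWARD.get((screen, "uses_reagent"), set()))


def reagents_mapped_to_target(target):
    """'Mapping status of bio-reagents to Molecular Target'."""
    return sorted(BACKWARD.get(("maps_to", target), set()))


if __name__ == "__main__":
    print(programs_using_target("target:ProtA"))
    print(reagents_for_program("program:P1"))
    print(reagents_in_screen("screen:S1"))
```

Keeping each question as its own small function mirrors the "simple independent services" idea from the Target Management slide: each one could be exposed separately and composed in a workflow tool.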
[Diagram: Target Identification and Assay Development steps issuing Gene Info Query, Assay Info Query, Chemical Query and Bioreagent Info Query; entities ProtA, ProtB, Compound 123, Multimeric protein, SoC]

Workflow challenges for Production Status
– Service availability: using services in this manner requires a high availability of services
– Service orchestration: need to ensure that once started a workflow runs to completion
– Service support: can this be managed internally and externally?

2. Semantic Grid for Portfolio POC
Create an environment to query “concept” data across GSK and externally
Access novel algorithms / analysis for use on concept networks to expand knowledge and identify project “risks”.
– Data expansion (e.g. platform analysis): used in Validation / Reagent generation
– Network expansion (looking for connections): used by Discovery groups / Informatics
[Diagram: Assay Validation Analysis, Assay Development, Target Identification, Bioreagent Analysis and Transcript Analysis on the GRID; “Concept Novelty”; GeneA with transcripts TransA_1, TransA_2 and TransA_3; GSK00000 with an “X” against one transcript]
Assay may fail to identify this variant.

Business Value ??
Methodology opens up Target space for broader analysis
– Supports MDR requirements for target tracking and analysis
– Linkage of Biology and Chemistry domains for better tracking
– Drives standards and vocabulary down to the business
– Removes need to redundantly identify “Molecular Target” throughout process
Methods can be used to spot issues with programs and allow “kill early” decisions
– Transcript analysis and assays
– Bio-reagent availability (inc. ABs / Crystallography etc.)
Methods and services can be reused across the organisation outside of the portfolio area

GRID Challenges in Industry
Service Productionisation
– Availability, Security, Granularity
– Relationship management
  • Define the negotiation paradigm
  • Licensed service model
  • Implement local scoped catalogue
– Service Orchestration
How to manage skews in Ontologies & technology
– Vendors
– Academia
– Knowledge Engineers ??
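The availability and orchestration points raised for workflows, and again for Grid services, come down to what happens when one service in a chain is temporarily unreachable. Below is a minimal sketch of a runner that retries each step so that a started workflow has a chance to run to completion. It is an illustration under assumptions, not how InforSense, Taverna or any GSK system behaves: `run_workflow`, `ServiceUnavailable`, `make_flaky` and the two query steps are hypothetical names.

```python
import time


class ServiceUnavailable(Exception):
    """Raised by a step when the (hypothetical) service behind it is unreachable."""


def run_workflow(steps, context, max_attempts=3, delay_s=0.0):
    """Run named steps in order, retrying a step when its service is unavailable.

    Each step is a function taking and returning a state dict.
    Returns the final state and a log of what happened.
    """
    state = dict(context)
    log = []
    for name, step in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                state = step(state)
                log.append(f"{name}: ok (attempt {attempt})")
                break
            except ServiceUnavailable:
                log.append(f"{name}: unavailable (attempt {attempt})")
                if attempt < max_attempts:
                    time.sleep(delay_s)  # wait before trying the service again
        else:
            # Every attempt failed: stop rather than continue with partial data.
            raise RuntimeError(f"workflow stopped at step {name!r}")
    return state, log


def make_flaky(step, failures):
    """Wrap a step so its first `failures` calls raise ServiceUnavailable."""
    remaining = {"n": failures}

    def wrapped(state):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise ServiceUnavailable()
        return step(state)

    return wrapped


# Illustrative steps only; the returned values are made up.
def gene_info_query(state):
    return {**state, "gene": "GeneA"}


def bioreagent_info_query(state):
    return {**state, "reagents": ["R-001"]}


if __name__ == "__main__":
    steps = [
        ("gene_info", make_flaky(gene_info_query, failures=1)),
        ("bioreagent_info", bioreagent_info_query),
    ]
    final_state, run_log = run_workflow(steps, {"program": "P1"})
    print(final_state)
    print("\n".join(run_log))
```

Retrying in the runner is only one option; the log it produces is also the kind of per-step record a process monitor would need in order to capture metrics. Whether retries, queuing or compensation is appropriate would depend on the service agreements discussed under relationship management.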
What is SIMDAT?
GSK is looking at developing standards. One such group is SIMDAT.
Objectives:
– Develop federated versions of problem solving environments
– Support of distributed product and process development
– Test and enhance grid technology for access to distributed databases
– Tools for semantic transformation between these databases
– Grid support for knowledge discovery
– Promote de facto standards
– Raise awareness in important industrial sectors

SIMDAT - Overview
The solution of industrially relevant complex problems using data-centric Grid technology.
Four sectors of international economic importance:
– Automotive
– Pharmaceutical
– Aerospace
– Meteorology
Seven Grid-technology development areas:
– Grid infrastructure
– Distributed Data Access
– VO Administration
– Workflows
– Ontologies
– Analysis Services
– Knowledge Services

Drivers for joining SIMDAT
SIMDAT provides a platform to look at all the following requirements and develop novel strategies to deliver them:
– IT/IX tasked with delivering cutting edge analysis and data management systems
– Need to develop an environment to drive integration and introduce flexibility
– Buy not build when technology or tools are available and appropriate
– Lower “Activation Energy” around external collaborations
– Look to Academia for “cutting edge” research
– Look to Vendors for solid support / continuity & value
– Ensure Pharma discovery process continues as a production system with inbuilt flexibility
– Look for opportunities to Virtualise GSK infrastructure
– Support the needs for Offshoring and Outsourcing

GSK architectural Direction
[Diagram labels: SIMDAT, Applications, Solution Vendors, Security, PSTUD. B*, Service Enactment, GATE, Gene, GSK, Sample, Microarray, Payment, Continuity, Value, Relationships, several “??” markers]
[Diagram label: Other Pharma / Biotech]

First B2B Scenario
[Diagram: GSK GnG database; Integrate and Output; Inpharmatica via GRIA]

SIMDAT gives us
Framework for Service Consumption
– Specialised Software Granularity
– Data and Algorithm Services
– Accounting and Consumption Model
– Stratified Information
– Open Service Market
– Choreography to Orchestration
Relationship Management
– Specialised Services
– Focused Licensing
– Zero Setup
– Negotiated Catalogue
– Dynamic Fee Dimensioning

Implemented Components of the B2B Scenario
[Diagram labels: Licensed Services, Job execution, Annotate project portal, User Workflow, Semantic Broker, Knowledge Portal, InforSense KDE, Updating SB, E2E, BIOCLIP Service, GRIA wrapper, Inpharmatica GRIA, Chemistry Services, Service Updater]

Final Scenario: B2A
[Diagram: Industrial Service Provider (Web Portal, GRIA layer, Local applications) connected over the Internet to Academic Service Provider and Academic Consumer (Web Portal, GRIA layer, Local applications)]

In Conclusion
– Virtualisation & Globalisation are key objectives for GSK
– IT must move to support these initiatives
– Workflow and semantic directions can support this goal but MUST move into production
– Consolidation of the various efforts in Grid and Ontology would make this move simpler

Acknowledgements
MDR-IT: Richard Ashe, Simon Dear, Mike Moore, John Armstrong, Bart Ailey
Roles as listed on the slide: Biology technical architect, Biology information architect, Biology POC, Biology Business Analysis, SIMDAT Contractor

Thank you!