Information Technology Leadership & Strategies in the Life Sciences
Craig A. Stewart
Indiana University
2 December 2004
Kelley School of Business
© Trustees of Indiana University.

License Terms
• Please cite this presentation as: Stewart, C.A. Information Technology Leadership & Strategies in the Life Sciences. 2004. Presentation. (Kelley School of Business, Bloomington, IN, 2 Dec 2004). Available from: http://hdl.handle.net/2022/14781
• Portions of this document that originated from sources outside IU are shown here and used by permission or under licenses indicated within this document.
• Items indicated with a © or denoted with a source url are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.
• Except where otherwise noted, the contents of this presentation are copyright 2004 by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share (to copy, distribute and transmit the work) and to remix (to adapt the work) under the following conditions: attribution: you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.

Outline
• IT Governance in the Life Sciences
• What’s different in the life sciences
  – A quick primer on life sciences
  – A sample bioinformatics application: BLAST
• Life science strategies
  – Health care
  – Biomedical research and development
  – Creation of new markets
  – IU’s strategies
• High performance computing challenges in life sciences
• The response of the IT market to the life sciences
• Predictions about the future
NB: Most of the slides presented herein were generated at Indiana University. Some slides were graciously provided by colleagues at other institutions, and sources are indicated on those slides.

IT governance in the life sciences
Archetypes from Weill & Ross
• Monarchy (business or IT)
• Feudal
• Federal
• Anarchy
IT strategies in the life sciences as effective outcomes
• Intentional mediocrity
• Catch as catch can
• Commitment to excellence with centralized control
  – and provision
  – and management
• IT for IT’s sake

Subdomains within the Life Sciences
• Health care
• Biomedical research and development
  – Precompetitive
  – Competitive
• Drug development
• Drug testing
• Creation of new markets

What’s different in the life sciences?
• Computing in life sciences is not new
• What is new: high-throughput sequencing & the possibility of going from a knowledge of the DNA sequence to an understanding of diseases and health
• The elusive electronic personal medical record!
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html

Genome Projects Timeline
• 1978: First virus (SV40) sequenced (5224 base pairs)
• 1986: DOE announces Human Genome Initiative
• 1994: First complete map of all human chromosomes
• 1995: First living organism sequenced (H. influenzae), 2 Mb
• 1996: Yeast (S. cerevisiae), 12 Mb
• 1997: Intestinal bacterium (E. coli), 5 Mb
• 1998: Nematode worm (C. elegans), 100 Mb
• 1998: Celera announcement; public effort regroups
• 1999: Human Chromosome 22, 34 Mb
• 2000: Joint announcement by NHGRI and Celera
• 2003: “As good as it gets” human genome
This slide based on slide by Manfred D. Zorn

Definitions
• Computational Biology: any use of advanced information technology in the study of biological problems.
• “Bioinformatics applies the principles of information sciences and technologies to make the vast, diverse and complex life sciences data more understandable and useful” (NIH BISTIC Committee, grants1.nih.gov/grants/bistic/CompuBioDef.pdf)
• Genomics: study of genomes and gene function
• Proteomics: study of proteins and protein function
• ___omics

Complexity of life sciences
• Chip design
  – All components known
  – Device physics for individual components known
  – Itanium has 3 x 10^8 connections and 2 x 10^8 devices
  – Unified basic currency (electrons)
  – Computer program required to understand (e.g. SPICE)
• Cells
  – Components not known
  – Function of individual components not known
  – # components ~10^13
  – No unified basic currency

Why is it important to know some biology?
[Image: Anopheles gambiae. From www.sciencemag.org/feature/data/mosquito/mtm/index.html. Source Library: Centers for Disease Control. Photo Credit: Jim Gathany]
• Would you invest in the stock market without knowing how to calculate a P/E ratio?
• Much current biological knowledge is very specific to particular organisms, genes, or diseases
• If you just wade into the available data (or hyperbole) online you can do some very silly things.

Central dogma of biology
• The central dogma of biology is that genes act to create phenotypes through a flow of information from DNA to RNA to proteins, to interactions among proteins (regulatory circuits and metabolic pathways), and ultimately to phenotypes.
  Collections of individual phenotypes constitute a population. (The central dogma was first put forward by Crick in 1958.)
http://www.ncbi.nlm.nih.gov/About/primer/genetics_cell.html

Four (or Five) Bases
• DNA consists of four nucleotides: Cytosine, Thymine, Adenine, and Guanine.
• In the double helix, A always pairs with T, and C always pairs with G
• RNA consists of four nucleotides as well: Cytosine, Uracil, Adenine, and Guanine
• RNA is generally single-stranded: it may loop back on itself, but it does not form a two-stranded double helix as DNA does
www.ornl.gov/TechResources/Human_Genome/graphics/slides/images/structur.gif

Genetic Code
[Figure: the genetic code table. http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/geneticcode.html]
Amino acid abbreviations: Ala Alanine; Arg Arginine; Asn Asparagine; Asp Aspartic acid; Cys Cysteine; Glu Glutamic acid; Gln Glutamine; Gly Glycine; His Histidine; Ile Isoleucine; Leu Leucine; Lys Lysine; Met Methionine; Phe Phenylalanine; Pro Proline; Ser Serine; Thr Threonine; Trp Tryptophan; Tyr Tyrosine; Val Valine

Transcribing DNA to RNA and Translating RNA to Proteins
• DNA: AAAAAGGAGCAAATT
• RNA: UUUUUCCUCGUUUAA
• One possible amino acid string: Phe Asn Asp Ala

Alternate splicing
http://www.blc.arizona.edu/marty/411/Modules/altsplice.html
http://www.ornl.gov/TechResources/Human_Genome/graphics/slides/images/98-647.jpg

Sickle Cell
Normal RBC
• GAG codes for glutamic acid
• Disc-shaped, soft
• Flows easily through small blood vessels
• Lives for 120 days
Sickle RBC
• GTG codes for valine
• Sickle-shaped, hard
• Often gets stuck in small blood vessels
• Lives for 20 days or less
Malaria vs. Anaemia!
http://www.nlm.nih.gov/medlineplus/ency/imagepages/1223.htm

Biomedical information online
http://www.nlm.nih.gov/
• Abstracts of the biomedical literature are largely available online
• Text processing itself is an interesting problem
• U.S. National Library of Medicine: NLM Medline http://www.nlm.nih.gov/
• ~12 million references on life sciences/biomedicine
• Actual sequence data available online at http://www.ncbi.nlm.nih.gov and other places!
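
The transcription, translation, and sickle-cell slides can be made concrete with a short sketch. It assumes the DNA on the transcription slide is the template strand (which is what turns AAAAAGGAGCAAATT into UUUUUCCUCGUUUAA), that the sickle-cell codons GAG and GTG are written on the coding strand, that reading starts at the first base unless another frame is chosen, and it ignores 5'/3' strand orientation. The function names are illustrative only. The slide's own amino acid string came from a figure whose numbered steps did not survive, so the sketch does not try to reproduce it. Changing the reading frame changes the result, which is one reason a DNA string has more than one "possible" protein reading.

```python
from itertools import product

# Standard genetic code. Codons are enumerated with U, C, A, G at each
# position (first base varying slowest), the usual textbook table order.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# One-letter to three-letter codes, matching the slide's abbreviation list.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Base pairing used to build RNA from a template DNA strand.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_dna: str) -> str:
    """Return the RNA that base-pairs with a template DNA strand."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_dna)


def coding_dna_to_rna(coding_dna: str) -> str:
    """Coding (sense) strand to RNA: same sequence with U in place of T."""
    return coding_dna.replace("T", "U")


def translate(rna: str, frame: int = 0) -> list[str]:
    """Read codons starting at `frame` until a stop codon or the end of the RNA."""
    protein = []
    for i in range(frame, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon (UAA, UAG, UGA)
            break
        protein.append(THREE_LETTER[amino_acid])
    return protein


if __name__ == "__main__":
    rna = transcribe("AAAAAGGAGCAAATT")
    print(rna)                                   # UUUUUCCUCGUUUAA
    print(translate(rna))                        # ['Phe', 'Phe', 'Leu', 'Val']
    print(translate(coding_dna_to_rna("GAG")))   # ['Glu'] (normal)
    print(translate(coding_dna_to_rna("GTG")))   # ['Val'] (sickle)
```

A single base change (A to T in the middle of GAG) is enough to swap glutamic acid for valine, which is the whole molecular story of the sickle-cell slide.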

Why pattern matching (and what are the problems)… and US!
[Image: Bonobo. http://www.sandiegozoo.org/special/zoo-featured/pygmy_chimps.html]

Alignments
• Matches are good: they get a positive value
• Mismatches are bad: they get a negative value
• Gaps are bad: they get a negative value
  – Gap opening penalty
  – Gap extension penalty
  – $\text{Score} = \sum \text{matches} - \sum \text{mismatches} - \sum_{\text{gaps}} \left( \text{gap opening penalty} + \text{length} \times \text{gap extension penalty} \right)$
Example alignments (“.” marks a gap):
  CGTACCGTTAATAT
  CGTTCCG...ATAT

  CGTACCGTTAATAT
  CGT.C.GTT.ATAT

BLAST Algorithm
• BLAST is a heuristic local alignment and search tool
• Given a search sequence, e.g. ACGTAGGCATGAA
• BLAST first makes a list of all “words” of a given length that would score at least T against some word of the search string.
• BLAST then tries to extend the matches as far as possible
• BLAST reports a list of the top-scoring matches
• Aniridia, cancer
• BLAST can eat you alive!
• Secrecy, BLAST, and public data sources
• Note: there are other algorithms, including algorithms based on dynamic programming

High Performance Computing
• HPC, Supercomputers, Clusters
  – “The Mythical Man Month” by Frederick Brooks
  – Amdahl’s Law: $\text{Speedup} = \frac{N}{S \cdot N + (1 - S)}$, where N is the number of processors and S is the fraction of the program that must run serially
• Grid Computing
• Massive Data Storage Systems
• Visualization
• What does it take to fold a protein?

IT Strategies in Health Care
• Production and quality (six sigma) environments
  – Real time and transaction style environments
• Regulatory issues
  – HIPAA (Indianapolis VA hospital example)
  – FDA approval
• Standards
  – HL7
  – LOINC
  – SNOMED CT

Biomedical Research & Development
• Precompetitive
  – Basic science research
  – Target identification
• Competitive
  – Target validation
  – Early phase drug development
  – Clinical trials
  – Goal: fail early, fail cheap

The biotech/biomedical industries
• Predator/Prey?
• Parasite/Host?
• Symbiotic?
• IP protection, profit margins, drugs and devices
IT may be strategic and yet incremental, not fundamental!
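
The scoring rule on the Alignments slide can be checked directly on its two example alignments. This sketch assumes illustrative values that the slide does not give (match +1, mismatch penalty 1, gap opening penalty 2, gap extension penalty 1) and writes the slide's “.” gaps as “-”. The function name is hypothetical.

```python
def alignment_score(top: str, bottom: str, match: int = 1, mismatch: int = 1,
                    gap_open: int = 2, gap_extend: int = 1, gap: str = "-") -> int:
    """Score a pairwise alignment with the slide's scheme:
    +match per identical column, -mismatch per substitution, and
    -(gap_open + length * gap_extend) for each run of gap characters."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    open_gap_row = None  # row (0 = top, 1 = bottom) holding the current gap run
    for a, b in zip(top, bottom):
        if a == gap and b == gap:
            raise ValueError("a column cannot be a gap in both rows")
        if a == gap or b == gap:
            row = 0 if a == gap else 1
            if row != open_gap_row:
                score -= gap_open      # a new gap run starts here
            score -= gap_extend        # each gapped column adds to the gap length
            open_gap_row = row
        else:
            score += match if a == b else -mismatch
            open_gap_row = None        # any aligned column ends the gap run
    return score


if __name__ == "__main__":
    # The slide's two example alignments, with '.' gaps rewritten as '-'.
    print(alignment_score("CGTACCGTTAATAT", "CGTTCCG---ATAT"))  # 4
    print(alignment_score("CGTACCGTTAATAT", "CGT-C-GTT-ATAT"))  # 2
```

Under these assumed values the first alignment (10 matches, 1 mismatch, one gap of length 3) outscores the second (11 matches, three gaps of length 1), because each separate gap pays the opening penalty again. That is exactly the behaviour a distinct opening penalty is meant to encourage.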
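
The seed-and-extend idea on the BLAST slide can be illustrated with a toy version. This is not NCBI BLAST. It assumes DNA, a +1/-1 scoring scheme, word length k = 5, threshold T = 3 (so a seed word may differ from a query word in one position), an X-drop cutoff of 2, and ungapped extension only. Real BLAST adds substitution matrices, gapped extension, and statistical significance estimates. All function names here are hypothetical.

```python
from itertools import product


def word_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Ungapped score of two equal-length strings (assumed +1/-1 scheme)."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def neighborhood_words(query: str, k: int, threshold: int, alphabet: str = "ACGT"):
    """Map every k-letter word scoring >= threshold against some query word
    to the list of query offsets it can seed."""
    words: dict[str, list[int]] = {}
    for i in range(len(query) - k + 1):
        query_word = query[i:i + k]
        for letters in product(alphabet, repeat=k):
            word = "".join(letters)
            if word_score(query_word, word) >= threshold:
                words.setdefault(word, []).append(i)
    return words


def extend_hit(query, subject, qi, si, k, x_drop):
    """Ungapped extension of a seed: grow right, then left, keeping the best
    span and abandoning a direction once the score falls x_drop below the best."""
    best = word_score(query[qi:qi + k], subject[si:si + k])
    running, right, step = best, 0, 0
    while qi + k + step < len(query) and si + k + step < len(subject):
        running += word_score(query[qi + k + step], subject[si + k + step])
        step += 1
        if running > best:
            best, right = running, step
        elif best - running >= x_drop:
            break
    running, left, step = best, 0, 0
    while qi - step - 1 >= 0 and si - step - 1 >= 0:
        step += 1
        running += word_score(query[qi - step], subject[si - step])
        if running > best:
            best, left = running, step
        elif best - running >= x_drop:
            break
    return best, qi - left, si - left, k + left + right


def seed_and_extend(query, subject, k=5, threshold=3, x_drop=2):
    """Return distinct (score, query_start, subject_start, length) hits, best first."""
    words = neighborhood_words(query, k, threshold)
    hits = set()
    for sj in range(len(subject) - k + 1):
        for qi in words.get(subject[sj:sj + k], []):
            hits.add(extend_hit(query, subject, qi, sj, k, x_drop))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    hits = seed_and_extend("ACGTAGGCATGAA", "TTTTACGTAGGCATGAATTTT")
    print(hits[0])  # (13, 0, 4, 13): the whole query matches at subject offset 4
```

The neighborhood step enumerates every possible k-letter word for each query position, which is cheap for DNA at small k but grows as 4^k; the heuristic trades completeness for speed by only extending around these seeds.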
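
Amdahl’s Law from the HPC slide is easy to explore numerically. This sketch uses the slide’s form of the law and assumes a hypothetical program whose serial fraction S is 5%; the function name is illustrative.

```python
def amdahl_speedup(n_processors: int, serial_fraction: float) -> float:
    """Amdahl's Law in the slide's form: Speedup = N / (S*N + (1 - S))."""
    if n_processors < 1:
        raise ValueError("need at least one processor")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial fraction must lie in [0, 1]")
    return n_processors / (serial_fraction * n_processors + (1.0 - serial_fraction))


if __name__ == "__main__":
    # Hypothetical code in which 5% of the work cannot be parallelized.
    for n in (1, 8, 64, 512):
        print(n, round(amdahl_speedup(n, 0.05), 2))
    # 1 1.0
    # 8 5.93
    # 64 15.42
    # 512 19.28
```

As N grows the speedup approaches 1/S, here 20, no matter how many processors are added. This is why the serial portion of a workflow, not the processor count, often sets the practical limit.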
• Challenges:
  – Protein folding
  – In silico predictive biology
  – Personalized medicine
• Counterintuitive effects of personalized medicine!
• Economics of drug development

Creation of new markets
• DoubleTwist (www.bioitworld.com/archive/050702/survivor_sidebar_252.html)
• Lion SRS
• Rosetta
• Beyond Genomics (“the systems biology company”)
• Analytic platforms or information technology platforms?

IU strategies in IT
• IU’s goal is to be a leader, in absolute terms, in the creation and use of information technology.
• Enable achievement of the goal set by President Herbert: double IU’s research funding by the end of the decade
• Enlightened monarchy with federalized advice and extensive attention to customer engagement
• Transparency and accountability
• IU IT Strategic Plan (support.uits.iu.edu/scripts/ose.cgi?anvz.help&osecat=about)
• How do you measure value in an academic environment?
  – Cost avoidance
  – Enhanced grant competitiveness
  – Tech transfer
  – Rankings

IU strategies in advanced computing and life sciences
• The mission of the Research and Academic Computing (RAC) division of UITS is to provide and support the world-class research computing resources that enable new scientific and artistic breakthroughs at Indiana University. RAC supports IU's researchers, scientists, artists, clinicians, and students; fosters collaborations; and aids innovations that advance information technology at IU and in the state of Indiana. RAC systems and services support all IU campuses.
• Heavily centralized environment
• Low barriers to entry
• Leverage and flexibility

[Diagram: A mission to support researchers and artists in co-creating the future; A foundation of reliable services. This slide from Dr. Bradley C. Wheeler. All computing/research images from Indiana University sites.]

[Diagram: Research & Academic Computing. Our Work: Front Office (Researcher Consulting & Education; Grant Initiation, Collaboration, Fulfillment); Back Office (Systems Administration; Engineering; Computing Frontiers). Our Objective: Reliable Services; Co-Creating the Future. Photo: Dr. Kate Pilachoski, Professor of Astronomy. This slide from Dr. Bradley C. Wheeler.]

[Diagram: “Double External Funding by AY10-11”. Win Grant $$; Deliver Results; Acquire IT & Staff; Develop Competencies; Grant Initiation, Collaboration, Fulfillment. Ever Advancing Frontiers…: High Performance Computing; Mass Research Storage; Visualization; Networks (Telecom); Consulting (Stat, Linux); Digital Libraries. Engineering; Computing Frontiers; Researcher Consulting & Education; Systems Administration. RAC works via relationships & technical and domain competence. This slide from Dr. Bradley C. Wheeler.]

Centralized control & provision

IBM Research SP (Aries/Orion Complex)
• 1.005 TeraFLOPS. First university-owned supercomputer in the US to exceed 1 TFLOPS peak theoretical processing capacity.
• Geographically distributed at IUB and IUPUI
• Initially 50th on Top500 supercomputer list in 2001
Photo: Tyagan Miller. May be reused by IU for noncommercial purposes. To license for commercial use, contact the photographer

AVIDD
• Analysis and Visualization of Instrument-Driven Data
• Distributed Linux cluster. Three locations: IUN, IUPUI, IUB
• 2.164 TFLOPS, 0.5 TB RAM, 10 TB disk
• First distributed Linux cluster to achieve more than 1 TFLOPS on the Linpack benchmark
  – Initially 50th on Top500 list in 2003

Massive Data Storage System
• HPSS (High Performance Storage System)
• Automatic replication of data between Indianapolis and Bloomington, via I-light.
• 180 TB capacity with existing tapes; total capacity of 2.4 PB.
• 100 TB currently in use; >5 TB for biomedical data
• Used to hold data for many studies, including an international study of Fetal Alcohol Spectrum Disorder
Photo: Tyagan Miller. May be reused by IU for noncommercial purposes. To license for commercial use, contact the photographer

[Images: IT 414 High-Resolution Display Wall; IT 403 Reconfigurable Virtual Reality Theater. This slide from Dr. Bradley C. Wheeler.]

ICTC Advanced Visualization Facilities

Applications!
• Commercial
  – Site licenses (e.g. SPSS, Mathematica)
  – Central provisioning of apps that provide differentiation
• Role of Open Source
  – Niche apps vs Sakai
  – Examples:
    • fastDNAml
    • PENELOPE
• Hybrids (e.g. SBML & Mathematica)
• Apps… and support

Centralized coordination & harvesting
• Computation
  – Condor & SMBL
• Coordinated software purchases
• Data federation
  – Federated database approach focuses on establishing glue between existing databases
  – “Private” databases stay where they are, under local control
  – “Public” databases may be replicated locally for performance
[Diagram: Lab Results; DL; Clinical Data; Toxicity Data]

Hereditary Diseases and Family Studies Division, Dept. of Medical and Molecular Genetics, IU School of Medicine. Supported in part by NIH R01 NS37167.

IU Life science strategies
• HPC
  – No distinction between supercomputer users and nonsupercomputer users
  – The Fritos model
  – Engagement
  – Accountability (racinfo.indiana.edu)
  – Not Irish elk, but attention to proofs of excellence
• Storage
  – Unique capabilities and massive capacity
• Visualization
  – Unique capabilities and excellent support

[Chart: Top500 rank (axis 50 to 450) of IU systems (IBM SPa, SPb, SPc, SPd; SGI; Sun; AVIDD) on each list from November 1996 to June 2004. Data from http://top500.org. This slide from Dr. Bradley C. Wheeler.]
“ACTION 29: In order to maintain its position of leadership in the constantly changing field of high performance computing, the University should plan to continuously upgrade and replace its high-performance computing facilities to keep them at a level that satisfies the increasing demand for computational power.” (IT Strategic Plan)

Rating IU’s governance effectiveness
• Questions from Weill & Ross
  – Cost-effective use of IT
  – Effective use of IT for asset utilization
  – Effective use of IT for growth
  – Effective use of IT for flexibility

[Chart: Data from Balanced Scorecard]

IU at SC04

Special purpose Computational Grid: IU/HLRS 2003 HPC Challenge
• Global analysis of Arthropod evolution
• One application: fastDNAml
• 8 types of systems; 641 processors; 6 continents
• 200 trees analyzed

IU and life sciences IT on 7 characteristics from Weill & Ross
• More managers in leadership positions could describe IT governance
• Engage, engage, engage
• More direct involvement of the senior leaders in IT governance
• Clearer business objectives for IT investment
• More differentiated business strategies
• Fewer renegade and more formally approved exceptions
  – Within UITS, use of standard proposal template
• Fewer changes in governance from year to year

A bit about Ohio Supercomputer Center Strategies

A Growing Awareness of HPC’s importance as a competitive tool
July 2004 COC/IDC Survey of 33 CIO/CTOs:
• Over 70% indicated their companies could not function without HPC;
• Over 25% of companies could quantify HPC’s ROI to their businesses:
  – saved millions of dollars, or
  – shortened production development cycles, or
  – provided faster product-to-market timing.
This slide courtesy and © Dr. S. Ahalt, Director, Ohio Supercomputer Center

But industrial adoption of HPC lags…
From the same Council on Competitiveness IDC report:
• 65% of the reporting companies have important, but currently unsolved computational problems;
• 35% need faster computers for their problems
HPC has the potential to impact:
• Workforce productivity
• Engineering design
• Manufacturing
This slide courtesy and © Dr. S. Ahalt, Director, Ohio Supercomputer Center

Tools: The Biggest Barrier
• GUI made desktop computing broadly accessible, and,
• Web browsers made networking popular.
• HPC hardware and software are hard to use, but,
• HPC companies have little reason to forge new tools and utilities, although
• Industry needs to tackle more complex models in a much wider context,
• Cost of developing HPC tools versus other business investments is problematic….
This slide courtesy and © Dr. S. Ahalt, Director, Ohio Supercomputer Center

A “typical” job distribution at OSC
This slide courtesy and © Dr. S. Ahalt, Director, Ohio Supercomputer Center

A Proposed Goal: Full-Spectrum HPC, aka “Blue Collar Computing™”
• Full spectrum focus: from small jobs to large jobs.
• Large jobs of today must become small jobs tomorrow
• Need scalable applications: scale up AND scale down!
• Industrial application focus
• Emphasis on productivity
This slide courtesy and © Dr. S. Ahalt, Director, Ohio Supercomputer Center

Blue-Collar Computing
[Chart: number of applications, tasks, and users plotted against amount of computing power, storage, & capability (# of dollars). The “Current Market for HPC” (Heroes; DoD, NSF, DoE) is contrasted with the “Ideal Market for HPC” and “Blue-Collar HPC” (Easy Pickings; Competitive Necessity; Business ROI; Programmer Productivity), yielding increased productivity gains in industry and engineering and increased gains in scientific discovery.]
This slide courtesy and © Dr. S. Ahalt, Director, Ohio Supercomputer Center

A sampling of challenge areas in the life sciences

Protein structure prediction
• Homology
  – Rosetta
• Ab initio
  – Blue Gene

Gene expression microarrays
• Usual goal: to find groups of genes and/or subjects that behave similarly in an experiment
• Data analysis and storage are a tremendous challenge!

Systems Biology
• Special issue of Science: 295, Mar. 2002
• Special issue of Nature: 420, Nov. 2002
• “Systems biology is a new field in biology that aims at a systems-level understanding of biological systems.”
• Nobody’s quite sure what it is, but it sure is hot!
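
One simple way to find genes that behave similarly, in the spirit of the microarray slide, is to correlate their expression profiles and group genes whose correlation exceeds a cutoff. This sketch assumes Pearson correlation, single-linkage grouping, a hypothetical cutoff of 0.9, and made-up toy data; real microarray pipelines add normalization and typically use hierarchical or other clustering methods. The function names are illustrative.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two expression profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def group_genes(profiles: dict[str, list[float]], min_r: float = 0.9) -> list[list[str]]:
    """Single-linkage grouping: genes end up together if a chain of pairs
    with correlation >= min_r connects them (union-find over all pairs)."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= min_r:
                parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical expression levels across four experimental conditions.
    profiles = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.1, 6.0, 8.2],
        "geneC": [4.0, 3.0, 2.0, 1.0],
        "geneD": [8.0, 6.0, 4.0, 2.0],
    }
    print(group_genes(profiles))  # [['geneA', 'geneB'], ['geneC', 'geneD']]
```

Comparing every pair of genes grows with the square of the number of genes, which is one concrete reason the slide calls data analysis a tremendous challenge.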
http://www.ornl.gov/TechResources/Human_Genome/graphics/slides/images/01-0052_web.gif

Example: MCell
• MCell is a general Monte Carlo simulator of cellular microphysiology. http://www.mcell.cnl.salk.edu/
• MCell focuses on simulations using a Brownian dynamics random walk algorithm.
• MCell’s use to date has been focused on the microphysiology of synaptic transmission.
• Images and MCell-related material courtesy of Joel R. Stiles, Pittsburgh Supercomputing Center and Carnegie Mellon University, and Thomas M. Bartol, Computational Neurobiology Laboratory, The Salk Institute. http://www.mcell.cnl.salk.edu/

Gamma Knife
• Used to treat inoperable tumors
• Treatment methods currently use a standardized head model
• UITS is working with IU School of Medicine to adapt the PENELOPE code to work with a detailed model of an individual patient’s head

Drug Design
• FDA compliance: 21 CFR Part 11
• Target generation: so what?
• Target verification: that’s important!
• Toxicity prediction: VERY important!! (Cholesterol example)
• Coming fuzziness of the boundary between clinical testing and clinical service
• Counterintuitive problem: the more personalized a therapy is, the smaller its target audience!

IT vendors and life sciences
• Everyone’s for it!
• Impact on the market may be strong
• Strategies
  – Hardware vendors establishing competence and proofs of concept: IBM, Sun, Apple
  – Specialized hardware vendors: TimeLogic, Paracel, Peta Computing
  – Software companies making tools: Lion, Avaki
  – IT companies trying to be life science companies
  – New business models
    • Beyond Genomics, Rosetta
    • Entropia, etc.
    • On demand computing
    • Biodefense

Biomedical research, IT & the future
• IT will radically change our understanding of biological function and the way biomedical research & development is done.
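
The core idea on the MCell slide, molecules diffusing by a Brownian-dynamics random walk, can be sketched in a few lines. This is a generic free-diffusion walk, not MCell’s code or input format: it assumes a hypothetical diffusion coefficient D and time step dt, no membranes or other surfaces, and no reactions. The final check uses the standard result that in three dimensions the mean squared displacement grows as 6Dt.

```python
import math
import random


def brownian_walk(n_molecules: int, n_steps: int, diff_coeff: float,
                  dt: float, seed: int = 0) -> list[tuple[float, float, float]]:
    """Free 3-D Brownian-dynamics random walk starting at the origin.
    Each step moves every molecule by an independent Gaussian displacement
    with standard deviation sqrt(2 * D * dt) along each axis."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    sigma = math.sqrt(2.0 * diff_coeff * dt)
    final_positions = []
    for _ in range(n_molecules):
        x = y = z = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sigma)
            y += rng.gauss(0.0, sigma)
            z += rng.gauss(0.0, sigma)
        final_positions.append((x, y, z))
    return final_positions


def mean_squared_displacement(positions) -> float:
    """Average squared distance from the origin over all molecules."""
    return sum(x * x + y * y + z * z for x, y, z in positions) / len(positions)


if __name__ == "__main__":
    D, dt, steps = 1.0, 0.02, 50  # hypothetical units; total time t = 1.0
    positions = brownian_walk(1000, steps, D, dt)
    # The estimate should land close to the theoretical value 6 * D * t = 6.0.
    print(mean_squared_displacement(positions), "vs theory", 6 * D * dt * steps)
```

Tracking each molecule individually is what makes this style of simulation realistic at small scales and also what makes it computationally expensive as molecule counts grow.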
• Life Science IT strategies must take advantage of new capabilities for business advantage
• IT will continue to be strategic yet incremental for a long time to come
• Advanced IT implementations in health care settings will be only partially successful
• The IT companies that prosper will be hardware companies and tool builders, not those that think they are also in the life science business
• A “computer rule” to someday replace the FDA animal rule?
• BLAST will continue to eat IT shops alive

Acknowledgments
• Some of the research described herein was supported by the following:
  – The Indiana Genomics Initiative of Indiana University, supported in part by Lilly Endowment Inc.
  – Shared University Research grants from IBM, Inc. to Indiana University.
  – National Science Foundation under Grant No. 0116050 and Grant No. CDA-9601632. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
• Some of the ideas presented here were developed while the senior author was a visiting scientist at Höchstleistungsrechenzentrum Universität Stuttgart.
• John Herrin, Malinda Lingwall, & W. Lester Teach assisted with graphics

Thank you. Questions?
For further information about the Research & Academic Computing Division of UITS: racinfo.indiana.edu
Several papers of potential interest are linked from www.indiana.edu/~rac/stewart.html
A good source of info about IT in life science industries: www.bioitworld.com