Information Technology Leadership & Strategies in the Life Sciences Craig A. Stewart [email protected] Indiana University 2 December 2004 Kelley School of Business © Trustees of Indiana University.

1
Information Technology Leadership
& Strategies in the Life Sciences
Craig A. Stewart
[email protected]
Indiana University
2 December 2004
Kelley School of Business
© Trustees of Indiana University
License Terms
• Please cite this presentation as: Stewart, C.A. Information Technology Leadership & Strategies in the Life Sciences. 2004. Presentation. (Kelley School of Business, Bloomington, IN, 2 Dec 2004). Available from: http://hdl.handle.net/2022/14781
• Portions of this document that originated from sources outside IU are shown here and used by permission or under licenses indicated within this document.
• Items indicated with a © or denoted with a source URL are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.
• Except where otherwise noted, the contents of this presentation are copyright 2004 by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – to copy, distribute and transmit the work and to remix – to adapt the work under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.
3
Outline
• IT Governance in the Life Sciences
• What’s different in the life sciences
– A quick primer on life sciences
– A sample bioinformatics application: BLAST
• Life science strategies
– Health care
– Biomedical research and development
– Creation of new markets
– IU’s strategies
• High performance computing challenges in life sciences
• The response of the IT market to the life sciences
• Predictions about the future
NB: Most of the slides presented herein were generated at Indiana University. Some slides were graciously provided by colleagues at other institutions, and sources are indicated on those slides.
4
IT governance in the life sciences
Archetypes from Weill & Ross
• Monarchy (business or IT)
• Feudal
• Federal
• Anarchy
IT strategies in the life sciences as effective outcomes
• Intentional mediocrity
• Catch as catch can
• Commitment to excellence with centralized control
– and provision
– and management
• IT for IT’s sake
5
Subdomains within the Life Sciences
• Health care
• Biomedical research and development
– Precompetitive
– Competitive
• Drug development
• Drug testing
• Creation of new markets
6
What’s different in the life sciences?
• Computing in life sciences is not new
• What is new: high-throughput sequencing & the possibility of going from a knowledge of the DNA sequence to an understanding of diseases and health
• The elusive electronic personal medical record!
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
7
Genome Projects Timeline
• 1978: First virus (SV40) sequenced (5224 base pairs)
• 1986: DOE announces Human Genome Initiative
• 1994: First complete map of all human chromosomes
• 1995: First living organism sequenced (H. influenzae), 2 Mb
• 1996: Yeast (S. cerevisiae), 12 Mb
• 1997: Intestinal bacterium (E. coli), 5 Mb
• 1998: Nematode worm (C. elegans), 100 Mb
• 1998: Celera announcement; public effort regroups
• 1999: Human chromosome 22, 34 Mb
• 2000: Joint announcement by NHGRI and Celera
• 2003: “As good as it gets” human genome
This slide based on slide by Manfred D. Zorn
8
Definitions
• Computational Biology: any use of advanced information
technology in the study of biological problems.
• “Bioinformatics applies the principles of information sciences and technologies to make the vast, diverse and complex life sciences data more understandable and useful” (NIH BISTIC Committee grants1.nih.gov/grants/bistic/CompuBioDef.pdf)
• Genomics – study of genomes and gene function
• Proteomics – study of proteins and protein function
• ___omics –
9
Complexity of life sciences
• Chip design
– All components known
– Device physics for individual components known
– Itanium has 3 × 10⁸ connections and 2 × 10⁸ devices
– Unified basic currency (electrons)
– Computer program required to understand (e.g. SPICE)
• Cells
– Components not known
– Function of individual components not known
– # components ~10¹³
– No unified basic currency
Why is it important to know some biology?
Anopheles gambiae
From www.sciencemag.org/feature/data/mosquito/mtm/index.html
Source Library: Centers for Disease Control
Photo Credit: Jim Gathany
10
• Would you invest in the stock market without knowing how to calculate a P/E ratio?
• Much current biological knowledge is very specific to particular organisms, genes, or diseases
• If you just wade into the available data (or hyperbole) online you can do some very silly things.
11
Central dogma of biology
• The central dogma of biology is that genes act to create phenotypes through a flow of information from DNA to RNA to proteins, to interactions among proteins (regulatory circuits and metabolic pathways), and ultimately to phenotypes. Collections of individual phenotypes constitute a population (first put forward by Crick in 1958)
http://www.ncbi.nlm.nih.gov/About/primer/genetics_cell.html
12
Four (or Five) Bases
• DNA consists of four nucleotides: Cytosine, Thymine, Adenine, and Guanine.
• In the double helix, A always pairs with T, and C always pairs with G
• RNA consists of four nucleotides as well: Cytosine, Uracil, Adenine, and Guanine
• RNA may loop back on itself but it does not form a double helix
www.ornl.gov/TechResources/Human_Genome/graphics/slides/images/structur.gif
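The pairing rules on this slide are simple enough to state as code. The minimal Python sketch below computes the partner strand of a DNA sequence and rewrites DNA letters as RNA letters by substituting U for T. The function names are illustrative, not from any particular library, and the sketch assumes plain A/C/G/T input (ambiguity codes such as N are not handled).

```python
# Watson-Crick pairing in DNA: A pairs with T, C pairs with G.
DNA_PAIRS = str.maketrans("ACGT", "TGCA")


def complement(dna: str) -> str:
    """Base-by-base partner of a DNA strand, written in the same direction."""
    return dna.upper().translate(DNA_PAIRS)


def reverse_complement(dna: str) -> str:
    """Partner strand read in its own 5'->3' direction.

    The two strands of the double helix run antiparallel, so the partner
    read 5'->3' is the complement written backwards.
    """
    return complement(dna)[::-1]


def pairs_with(top: str, bottom: str) -> bool:
    """True if `bottom`, aligned under `top`, pairs with it at every position."""
    return len(top) == len(bottom) and complement(top) == bottom.upper()


def dna_to_rna_letters(dna: str) -> str:
    """Same sequence in RNA letters: uracil (U) takes the place of thymine (T)."""
    return dna.upper().replace("T", "U")


print(complement("ACGTTGCA"))        # TGCAACGT
print(reverse_complement("AACG"))    # CGTT
print(dna_to_rna_letters("ACGT"))    # ACGU
```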
13
Genetic Code
Ala Alanine
Arg Arginine
Asn Asparagine
Asp Aspartic acid
Cys Cysteine
Glu Glutamic acid
Gln Glutamine
Gly Glycine
His Histidine
Ile Isoleucine
Leu Leucine
Lys Lysine
Met Methionine
Phe Phenylalanine
Pro Proline
Ser Serine
Thr Threonine
Trp Tryptophan
Tyr Tyrosine
Val Valine
http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/geneticcode.html
Transcribing DNA to RNA and Translating RNA to Proteins
DNA: AAAAAGGAGCAAATT
RNA: UUUUUCCUCGUUUAA
[Figure: one possible amino acid string; residue labels include Phe, Asn, Asp, Ala]
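Both steps can be tried directly on the slide's sequences. The Python sketch below assumes the DNA line is the template strand (consistent with the RNA line being its base-by-base complement) and uses the standard genetic code, stored compactly as a 64-letter string indexed in U, C, A, G order with the first base varying slowest. It reads a single frame and stops at the first stop codon; it illustrates the mechanics and is not a reconstruction of the slide's figure.

```python
from itertools import product

# Standard genetic code. Codons are enumerated with bases in U, C, A, G order
# (first base varies slowest); '*' marks a stop codon.
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def transcribe(template_dna: str) -> str:
    """mRNA complementary to a DNA template strand (A->U, T->A, G->C, C->G)."""
    return template_dna.upper().translate(str.maketrans("ATGC", "UACG"))


def translate(rna: str, frame: int = 0) -> list:
    """Read codons starting at `frame` (0, 1 or 2) until a stop codon or the end."""
    protein = []
    for i in range(frame, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(THREE_LETTER[aa])
    return protein


rna = transcribe("AAAAAGGAGCAAATT")
print(rna)             # UUUUUCCUCGUUUAA
print(translate(rna))  # ['Phe', 'Phe', 'Leu', 'Val']
```

Shifting `frame` to 1 or 2 reads the same RNA as a different string of amino acids, which is one reason a single sequence can be read more than one way.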
14
15
Alternate splicing
http://www.blc.arizona.edu/marty/411/Modules/altsplice.html
16
http://www.ornl.gov/TechResources/Human_Genome/graphics/slides/images/98-647.jpg
17
Sickle Cell
Normal RBC
• GAG codes for Glutamic acid
• disc-shaped, soft
• flows easily through small blood vessels
• lives for 120 days
Sickle RBC
• GTG codes for Valine
• sickle-shaped, hard
• often gets stuck in small blood vessels
• lives for 20 days or less
Malaria vs. Anaemia!
http://www.nlm.nih.gov/medlineplus/ency/imagepages/1223.htm
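The slide's point is that one base change inside one codon swaps one amino acid. The Python sketch below compares two in-frame coding sequences codon by codon using the standard genetic code (DNA codons, one-letter amino acid codes, so E is glutamic acid and V is valine). The two 24-base sequences are hypothetical fragments chosen so that only codon 6 changes, from GAG to GTG; they are illustrative, not a verified reference sequence.

```python
from itertools import product

# Standard genetic code for DNA coding-strand codons, bases in T, C, A, G order
# (first base varies slowest); '*' marks a stop codon.
DNA_BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(DNA_BASES, repeat=3), AMINO_ACIDS)
}


def codon_changes(reference: str, variant: str) -> list:
    """(codon number, ref codon, variant codon, ref aa, variant aa) for every
    codon that differs between two equal-length, in-frame coding sequences."""
    if len(reference) != len(variant) or len(reference) % 3:
        raise ValueError("sequences must have equal length, a multiple of 3")
    changes = []
    for number, i in enumerate(range(0, len(reference), 3), start=1):
        ref, var = reference[i:i + 3], variant[i:i + 3]
        if ref != var:
            changes.append((number, ref, var, CODON_TABLE[ref], CODON_TABLE[var]))
    return changes


# Hypothetical fragments differing by a single A -> T change in codon 6.
normal = "GTGCACCTGACTCCTGAGGAGAAG"
sickle = "GTGCACCTGACTCCTGTGGAGAAG"
print(codon_changes(normal, sickle))  # [(6, 'GAG', 'GTG', 'E', 'V')]
```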
18
Biomedical information online
http://www.nlm.nih.gov/
• Abstracts of biomedical literature largely available online
• Text processing itself is an interesting problem
• U.S. National Library of Medicine – NLM Medline
http://www.nlm.nih.gov/
• ~12 million references on life sciences/biomedicine.
• Actual sequence data available online at http://www.ncbi.nlm.nih.gov and other places!
Why pattern matching (and what are the problems)?
Images: Bonobo, and… US!
http://www.sandiegozoo.org/special/zoo-featured/pygmy_chimps.html
19
20
Alignments
• Matches are good: they get a positive value
• Mismatches are bad: they get a negative value
• Gaps are bad: they get a negative value
– Gap opening penalty
– Gap extension penalty
– $\text{Score} = \text{Matches} - \text{Mismatches} - \sum_{\text{gaps}} \left( \text{gap opening penalty} + \text{length} \times \text{gap extension penalty} \right)$
CGTACCGTTAATAT
CGTTCCG...ATAT

CGTACCGTTAATAT
CGT.C.GTT.ATAT
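The scoring rule can be checked against the two example alignments. In this Python sketch "." marks a gap as on the slide, each maximal run of gap characters in one row counts as a single gap, and the parameter values (match +1, mismatch penalty 1, gap opening penalty 2, gap extension penalty 1) are hypothetical choices, not values from the slide or from any particular tool.

```python
GAP = "."  # gap character, as in the alignments on the slide


def alignment_score(top, bottom, match=1, mismatch=1, gap_open=2, gap_ext=1):
    """Score a pairwise alignment as
    matches - mismatches - sum over gaps of (gap_open + length * gap_ext).

    `mismatch`, `gap_open` and `gap_ext` are penalties: positive numbers
    that are subtracted from the score.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    gap_row = None  # which row held a gap in the previous column, if any
    for x, y in zip(top, bottom):
        if x == GAP and y == GAP:
            raise ValueError("a column cannot be a gap in both rows")
        if GAP in (x, y):
            row = "top" if x == GAP else "bottom"
            if row != gap_row:   # a new gap run starts here: pay the opening penalty
                score -= gap_open
            score -= gap_ext     # every gap position pays the extension penalty
            gap_row = row
        else:
            score += match if x == y else -mismatch
            gap_row = None
    return score


print(alignment_score("CGTACCGTTAATAT", "CGTTCCG...ATAT"))  # 4
print(alignment_score("CGTACCGTTAATAT", "CGT.C.GTT.ATAT"))  # 2
```

Under these values the first alignment (one mismatch, one gap of length 3) beats the second (no mismatches, three gaps of length 1). With `gap_open=0` the ranking flips (6 versus 8), which is why the choice of penalties matters.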
21
BLAST Algorithm
• BLAST is a heuristic local alignment and search tool
• Given a search sequence, e.g. ACGTAGGCATGAA
• BLAST first makes a list of all “words” of a given length that
would possibly have a score of at least T against the search
string.
• BLAST then tries to extend the matches as far as possible
• BLAST reports a list of the top-scoring matches
• Aniridia, cancer
• BLAST can eat you alive!
• Secrecy, BLAST, and public data sources
• Note: there are other algorithms, including algorithms based
on dynamic programming
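The seeding and extension steps above can be mimicked at toy scale. This Python sketch works on DNA with a hypothetical +1/-1 match/mismatch score. It builds the neighborhood of length-k words scoring at least T (`threshold`) against each query word, looks those words up in a subject sequence, extends each hit left and right without gaps until the running score falls `x_drop` below its best, and keeps the best hit per diagonal. The function names and defaults are illustrative assumptions, not NCBI BLAST's.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: nucleotide search


def word_score(a, b, match=1, mismatch=-1):
    """Ungapped score of two equal-length strings under a toy +1/-1 scheme."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def neighborhood(query, k, threshold):
    """Every length-k word scoring >= threshold against some query word,
    mapped to the query offsets it can seed from."""
    words = {}
    for i in range(len(query) - k + 1):
        qword = query[i:i + k]
        for letters in product(ALPHABET, repeat=k):
            word = "".join(letters)
            if word_score(word, qword) >= threshold:
                words.setdefault(word, []).append(i)
    return words


def extend_one_side(q, s, x_drop):
    """Best running score walking along q and s, stopping once the score
    falls more than x_drop below the best seen so far."""
    best = run = 0
    for a, b in zip(q, s):
        run += word_score(a, b)
        if run > best:
            best = run
        elif best - run > x_drop:
            break
    return best


def blast_like(query, subject, k=4, threshold=4, x_drop=2):
    """Seed, extend without gaps, keep the best hit per diagonal, best first.
    Each hit is (score, query offset of seed, subject offset of seed)."""
    words = neighborhood(query, k, threshold)
    best_per_diagonal = {}
    for j in range(len(subject) - k + 1):
        for i in words.get(subject[j:j + k], []):
            score = (word_score(query[i:i + k], subject[j:j + k])
                     + extend_one_side(query[i + k:], subject[j + k:], x_drop)
                     + extend_one_side(query[:i][::-1], subject[:j][::-1], x_drop))
            diagonal = j - i
            if diagonal not in best_per_diagonal or score > best_per_diagonal[diagonal][0]:
                best_per_diagonal[diagonal] = (score, i, j)
    return sorted(best_per_diagonal.values(), reverse=True)


print(blast_like("ACGTAGGCATGAA", "TTTTACGTAGGCATGAATTTT"))  # [(13, 0, 4)]
```

Real BLAST adds much more, including substitution matrices such as BLOSUM62 for protein searches, gapped extension, and E-value statistics. Even this toy shows the cost structure: the neighborhood grows as 4^k per query word, and every subject position must be looked up.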
22
High Performance Computing
• HPC, Supercomputers, Clusters
– “The Mythical Man-Month” by Frederick Brooks
– Amdahl’s Law: $\text{Speedup} = \frac{N}{S \cdot N + (1 - S)}$, where $S$ is the fraction of the work that must run serially and $N$ is the number of processors
• Grid Computing
• Massive Data Storage Systems
• Visualization
• What does it take to fold a protein?
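Plugging numbers into Amdahl’s Law shows why serial code caps the value of a large machine. This Python sketch evaluates the formula exactly as written on the slide; the 5% serial fraction in the example loop is an arbitrary illustration.

```python
def amdahl_speedup(serial_fraction: float, n_processors: int) -> float:
    """Speedup = N / (S*N + (1 - S)), with S the serial fraction of the work."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("need at least one processor")
    s, n = serial_fraction, n_processors
    return n / (s * n + (1 - s))


# Illustrative serial fraction of 5%: doubling processors helps less and less.
for n in (1, 8, 64, 512, 4096):
    print(n, round(amdahl_speedup(0.05, n), 2))
```

As N grows the speedup approaches 1/S, so with S = 0.05 no number of processors can deliver more than a 20-fold speedup.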
23
IT Strategies in Health Care
• Production and quality (Six Sigma) environments
– Real-time and transaction-style environments
• Regulatory issues
– HIPAA (Indianapolis VA hospital example)
– FDA approval
• Standards
– HL7
– LOINC
– SNOMED CT
24
Biomedical Research & Development
• Precompetitive
– Basic science research
– Target identification
• Competitive
– Target validation
– Early phase drug development
– Clinical trials
– Goal: fail early, fail cheap
25
The biotech/biomedical industries
• Predator/Prey? Parasite/Host? Symbiotic?
• IP protection, profit margins, drugs and devices
• IT may be strategic and yet incremental, not fundamental!
• Challenges:
– Protein folding
– In silico predictive biology
– Personalized medicine
• Counterintuitive effects of personalized medicine!
• Economics of drug development
26
Creation of new markets
• Doubletwist
www.bioitworld.com/archive/050702/survivor_sidebar_252.html
• Lion SRS
• Rosetta
• Beyond Genomics (“the systems biology company”)
• Analytic platforms or information technology platforms?
27
IU strategies in IT
• IU’s goal is to be a leader, in absolute terms, in the creation
and use of information technology.
• Enable achievement of the goal set by President Herbert: double IU’s research funding by the end of the decade
• Enlightened monarchy with federalized advice and extensive
attention to customer engagement
• Transparency and accountability
• IU IT Strategic Plan
(support.uits.iu.edu/scripts/ose.cgi?anvz.help&osecat=about)
• How do you measure value in an academic environment?
– Cost avoidance
– Enhanced grant competitiveness
– Tech transfer
– Rankings
IU strategies in advanced computing and
life sciences
• The mission of the Research and Academic Computing (RAC)
division of UITS is to provide and support the world-class
research computing resources that enable new scientific and
artistic breakthroughs at Indiana University. RAC supports
IU's researchers, scientists, artists, clinicians, and students;
fosters collaborations; and aids innovations that advance
information technology at IU and in the state of Indiana. RAC
systems and services support all IU campuses.
• Heavily centralized environment
• Low barriers to entry
• Leverage and flexibility
28
29
A mission to support researchers and artists in co-creating the future
A foundation of reliable services
This slide from Dr. Bradley C. Wheeler. All computing/research images from Indiana University sites
30
Research & Academic Computing
[Diagram labels: Our Work; Our Objective; Front Office; Back Office; Reliable Services; Co-Creating the Future; Researcher Consulting & Education; Grant Initiation, Collaboration, Fulfillment; Systems Administration; Engineering; Computing Frontiers]
This slide from Dr. Bradley C. Wheeler
Photo: Dr. Kate Pilachowski, Professor of Astronomy
31
“Double External Funding by AY10-11”
[Diagram labels: Win Grant $$; Deliver Results; Acquire IT & Staff; Develop Competencies; Grant Initiation, Collaboration, Fulfillment; Researcher Consulting & Education; Systems Administration; Engineering; Computing Frontiers]
Ever Advancing Frontiers…
• High Performance Computing
• Mass Research Storage
• Visualization
• Networks (Telecom)
• Consulting (Stat, Linux)
• Digital Libraries
RAC Works via Relationships & Technical and Domain Competence
This slide from Dr. Bradley C. Wheeler.
32
Centralized control & provision
33
IBM Research SP (Aries/Orion Complex)
• 1.005 TeraFLOPS. First university-owned supercomputer in the US to exceed 1 TFLOPS peak theoretical processing capacity.
• Geographically distributed at IUB and IUPUI
• Initially 50th on the Top500 supercomputer list in 2001
Photo: Tyagan Miller. May be reused by IU for noncommercial
purposes. To license for commercial use, contact the photographer
34
AVIDD
• Analysis and Visualization of Instrument-Driven Data
• Distributed Linux cluster. Three locations: IUN, IUPUI, IUB
• 2.164 TFLOPS, 0.5 TB RAM, 10 TB disk
• First distributed Linux cluster to achieve more than 1 TFLOPS on the Linpack benchmark; initially 50th on the Top500 list in 2003
35
Massive Data Storage System
• HPSS (High Performance Storage System)
• Automatic replication of data between Indianapolis and Bloomington, via I-Light.
• 180 TB capacity with existing tapes; total capacity of 2.4 PB.
• 100 TB currently in use; >5 TB for biomedical data
• Used to hold data for many studies, including an international study of Fetal Alcohol Spectrum Disorder
Photo: Tyagan Miller. May be reused by IU for noncommercial
purposes. To license for commercial use, contact the photographer
IT 414 High-Resolution Display Wall
IT 403 Reconfigurable Virtual Reality Theater
This slide from Dr. Bradley C. Wheeler.
36
ICTC Advanced Visualization Facilities
37
Applications!
• Commercial
– Site licenses (e.g. SPSS, Mathematica)
– Central provisioning of apps that provide differentiation
• Role of Open Source
– Niche apps vs Sakai
– Examples:
• fastDNAml
• PENELOPE
• Hybrids (e.g. SBML & Mathematica)
• Apps… and support
38
Centralized coordination & harvesting
• Computation – Condor & SBML
• Coordinated software purchases
• Data federation
– Federated database approach focuses on establishing glue between existing databases
– “Private” databases stay where they are, under local control
– “Public” databases may be replicated locally for performance
[Diagram labels: Lab Results; DL; Clinical Data; Toxicity Data]
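The slide describes federation as glue between databases that stay where they are. As a rough, single-machine analogy (not the federated middleware the slide refers to), SQLite’s ATTACH DATABASE lets one query join tables that live in two separate databases. In this Python sketch the schema, gene rows and expression values are all made up for illustration; `main` stands in for a private lab database and `public` for a locally replicated public one.

```python
import sqlite3

# Hypothetical stand-ins: "main" plays a lab's private database kept under
# local control; "public" plays a locally replicated copy of a public resource.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS public")

conn.execute("CREATE TABLE main.lab_results (sample_id TEXT, gene TEXT, expression REAL)")
conn.execute("CREATE TABLE public.genes (gene TEXT PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO main.lab_results VALUES (?, ?, ?)",
    [("S1", "HBB", 2.5), ("S2", "HBB", 0.4), ("S1", "PAX6", 1.1)],  # made-up values
)
conn.executemany(
    "INSERT INTO public.genes VALUES (?, ?)",
    [("HBB", "hemoglobin subunit beta"), ("PAX6", "paired box 6")],
)

# The "glue": one query spans both databases; the private rows are read in
# place rather than copied into the public store.
rows = conn.execute(
    """
    SELECT r.sample_id, r.gene, g.description, r.expression
    FROM main.lab_results AS r
    JOIN public.genes AS g ON g.gene = r.gene
    ORDER BY r.sample_id, r.gene
    """
).fetchall()

for row in rows:
    print(row)
```

A real federated system also has to wrap sources with different engines, schemas and access rules, which this one-engine example sidesteps entirely.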
39
40
Hereditary Diseases and Family Studies Division, Dept. of Medical and Molecular
Genetics, IU School of Medicine. Supported in part by NIH R01 NS37167.
41
IU Life science strategies
• HPC
– No distinction between supercomputer users and nonsupercomputer users
– The Fritos model
– Engagement
– Accountability (racinfo.indiana.edu)
– Not Irish elk, but attention to proofs of excellence
• Storage
– Unique capabilities and massive capacity
• Visualization
– Unique capabilities and excellent support
42
[Chart: Top500 rank of IU systems (IBM SPa, SPb, SPc, SPd; SGI; Sun; AVIDD), November 1996 to June 2004; axes: Year, Top500 Rank]
“ACTION 29: In order to maintain its position of leadership in the constantly changing field of high performance computing, the University should plan to continuously upgrade and replace its high-performance computing facilities to keep them at a level that satisfies the increasing demand for computational power.”
(IT Strategic Plan)
This slide from Dr. Bradley C. Wheeler
Data from http://top500.org
43
Rating IU’s governance effectiveness
• Questions from Weill & Ross
– Cost-effective use of IT
– Effective use of IT for asset utilization
– Effective use of IT for growth
– Effective use of IT for flexibility
44
Data from Balanced Scorecard
45
46
IU at SC04
Special-purpose Computational Grid:
IU/HLRS 2003 HPC Challenge
• Global analysis of Arthropod evolution
• One application: fastDNAml
• 8 types of systems; 641 processors; 6 continents
• 200 trees analyzed
47
IU and life sciences IT on 7 characteristics
from Weill & Ross
• More managers in leadership positions could describe IT governance
• Engage, engage, engage
• More direct involvement of the senior leaders in IT governance
• Clearer business objectives for IT investment
• More differentiated business strategies
• Fewer renegade and more formally approved exceptions
– Within UITS, use of standard proposal template
• Fewer changes in governance from year to year
48
49
A bit about Ohio Supercomputer Center Strategies
A Growing Awareness of HPC’s importance as a competitive tool
50
July 2004 COC/IDC Survey of 33 CIO/CTOs:
• Over 70% indicated their companies could not
function without HPC;
• Over 25% of companies could quantify HPC’s
ROI to their businesses:
- saved millions of dollars, or
- shortened production development cycles, or
- provided faster product-to-market timing.
This slide courtesy and © Dr. S. Ahalt, Director, Ohio Supercomputer Center
51
But industrial adoption of HPC lags…
From the same Council on Competitiveness IDC report:
• 65% of the reporting companies have important, but currently
unsolved computational problems;
• 35% need faster computers for their problems
HPC has the potential to impact:
• Workforce productivity
• Engineering design
• Manufacturing
This slide courtesy and © Dr. S. Ahalt, Director, Ohio Supercomputer Center
52
Tools – The Biggest Barrier
• GUI made desktop computing
broadly accessible, and,
• Web browsers made networking
popular.
• HPC hardware and software are hard
to use, but,
• HPC companies have little reason to
forge new tools and utilities, although
• Industry needs to tackle more
complex models in a much wider
context,
• Cost of developing HPC tools versus
other business investments is
problematic….
This slide courtesy and © Dr. S. Ahalt, Director, Ohio Supercomputer Center
53
A “typical” job distribution at OSC
This slide courtesy and © Dr. S. Ahalt, Director, Ohio Supercomputer Center
A Proposed Goal: Full-Spectrum HPC,
aka, “Blue Collar Computing™”
54
• Full spectrum focus – from
small jobs to large jobs.
• Large jobs of today must
become small jobs
tomorrow
• Need scalable applications –
scale up AND scale down!
• Industrial application focus
• Emphasis on productivity
This slide courtesy and © Dr. S. Ahalt, Director, Ohio Supercomputer Center
[Chart: Blue-Collar Computing, the ideal market for HPC; axes: Number of Applications, Number of Tasks, Number of Users]
55
[Chart labels: Blue-Collar HPC; Increased Productivity Gains in Industry and Engineering; Easy Pickings; Competitive Necessity; Business ROI; Programmer Productivity; Increased Gains in Scientific Discovery; Current Market for HPC; Heroes; DoD, NSF, DoE. Axes: Amount of Computing Power, Storage, & Capability; # of Dollars]
This slide courtesy and © Dr. S. Ahalt, Director, Ohio Supercomputer Center
56
A sampling of challenge areas in the
life sciences
57
Protein structure prediction
• Homology
– Rosetta
• Ab initio
– Blue Gene
58
Gene expression microarrays
• The usual goal is to find groups of genes and/or subjects that behave similarly in an experiment
• Data analysis and storage are a tremendous challenge!
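One common way to find genes that behave similarly is to correlate their expression profiles across conditions and group genes whose profiles track each other. The Python sketch below uses Pearson correlation with single-linkage grouping at a hypothetical cutoff of r >= 0.9. The gene names and numbers are invented, and real microarray pipelines add normalization, replicate handling and more robust clustering.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)


def co_expressed_groups(profiles, min_r=0.9):
    """Single-linkage grouping: two genes share a group if a chain of pairs
    with correlation >= min_r connects them (union-find bookkeeping)."""
    genes = list(profiles)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= min_r:
                parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Invented profiles: expression of five genes under four conditions.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
    "geneD": [8, 6, 4, 2],
    "geneE": [1, 3, 1, 3],
}
print(co_expressed_groups(profiles))
```

Correlation rather than raw distance is used here so that genes rising and falling together are grouped even when their absolute expression levels differ, as with geneA and geneB.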
59
Systems Biology
• Special issue of Science: 295, Mar. 2002
• Special issue of Nature: 420, Nov. 2002
• “Systems biology is a new field in biology that aims at a systems-level understanding of biological systems.”
• Nobody’s quite sure what it is, but it sure is hot!
http://www.ornl.gov/TechResources/Human_Genome/graphics/slides/images/01-0052_web.gif
60
Example - MCell
• MCell is a general Monte Carlo simulator of cellular microphysiology. http://www.mcell.cnl.salk.edu/
• MCell focuses on simulations using a Brownian dynamics random walk algorithm.
• MCell's use to date has been focused on the microphysiology of synaptic transmission.
• Images and MCell-related material courtesy of Joel R. Stiles, Pittsburgh Supercomputing Center and Carnegie Mellon University, and Thomas M. Bartol, Computational Neurobiology Laboratory, The Salk Institute. http://www.mcell.cnl.salk.edu/
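The Brownian-dynamics random walk idea behind MCell can be illustrated with free diffusion alone. In this Python sketch each molecule takes Gaussian steps with standard deviation sqrt(2·D·dt) along each axis, and the simulated mean squared displacement is compared with the standard 3-D result <r²> = 6Dt. The parameter values are arbitrary and dimensionless, and the sketch omits everything that makes MCell useful (membrane surfaces, reactions, realistic geometry); it is not MCell’s code.

```python
import math
import random


def brownian_msd(n_molecules, n_steps, D, dt, seed=0):
    """Mean squared displacement of free 3-D Brownian walkers started at the origin.

    Each step moves every coordinate by a Gaussian with standard deviation
    sqrt(2*D*dt), the usual Brownian-dynamics update for diffusion coefficient D.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2 * D * dt)
    total = 0.0
    for _ in range(n_molecules):
        x = y = z = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sigma)
            y += rng.gauss(0.0, sigma)
            z += rng.gauss(0.0, sigma)
        total += x * x + y * y + z * z
    return total / n_molecules


# Arbitrary, dimensionless illustration values.
D, dt, steps = 1.0, 0.01, 100
msd = brownian_msd(2000, steps, D, dt)
expected = 6 * D * dt * steps  # theory for free 3-D diffusion: <r^2> = 6 D t
print(round(msd, 3), expected)
```

Because the answer is a Monte Carlo estimate, its error shrinks only as one over the square root of the number of molecules, which is part of why simulations of this kind call for HPC.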
61
62
Gamma Knife
• Used to treat inoperable tumors
• Treatment methods currently use a standardized head model
• UITS is working with IU School of Medicine to adapt the PENELOPE code to work with a detailed model of an individual patient’s head
63
Drug Design
• FDA compliance – 21 CFR Part 11
• Target generation – so what
• Target verification – that’s important!
• Toxicity prediction – VERY important!!
• (Cholesterol example)
• Coming fuzziness of the boundary between clinical testing and clinical service
• Counterintuitive problem: the more personalized a therapy is, the smaller its target audience!
64
IT vendors and life sciences
• Everyone’s for it!
• Impact on the market may be strong
• Strategies
– Hardware vendors establishing competence and proofs of concept:
IBM, Sun, Apple
– Specialized hardware vendors: TimeLogic, Paracel, Peta Computing
– Software companies making tools: Lion, Avaki
– IT companies trying to be life science companies
– New business models
• Beyond Genomics, Rosetta
• Entropia, etc.
• On demand computing
• Biodefense
65
Biomedical research, IT & the future
• IT will radically change understanding of biological function
and the way biomedical research & development is done.
• Life Science IT strategies must take advantage of new
capabilities for business advantage
• IT will continue to be strategic yet incremental for a long time
to come
• Advanced IT implementations in health care settings will be
only partially successful
• The IT companies that prosper will be hardware companies
and tool builders, not those that think they are also in the life
science business
• A “computer rule” to someday replace the FDA animal rule?
• BLAST will continue to eat IT shops alive
66
Acknowledgments
• Some of the research described herein was supported by the
following:
– The Indiana Genomics Initiative of Indiana University,
supported in part by Lilly Endowment Inc.
– Shared University Research grants from IBM, Inc. to Indiana
University.
– National Science Foundation under Grant No. 0116050 and
Grant No. CDA-9601632. Any opinions, findings and
conclusions or recommendations expressed in this material are
those of the author(s) and do not necessarily reflect the views
of the National Science Foundation.
• Some of the ideas presented here were developed while the senior author was a visiting scientist at Höchstleistungsrechenzentrum Universität Stuttgart. John Herrin, Malinda Lingwall, & W. Lester Teach assisted with graphics.
67
Thank you.
Questions?
For further information about the Research & Academic
Computing Division of UITS:
racinfo.indiana.edu
Several papers of potential interest are linked from
www.indiana.edu/~rac/stewart.html
A good source of info about IT in life science industries:
www.bioitworld.com