Transcript Slide 1

Introduction to Systems Biology
Department of Computer Science and Information Engineering, National Taiwan University
Postdoctoral Researcher 詹鎮熊
What is a system?
Features of a system
 Components
 Interrelated components
 Boundary
 Purpose
 Environment
 Interfaces
 Input
 Output
 Constraints
Examples of Systems
Life's Complexity Pyramid
System
Functional modules
Building blocks
Components
Z. N. Oltvai and A.-L. Barabási, Science 298, 763 (2002)
Biosphere
Ecosystem
Community
Population
Organism
Organ system
Tissue
Cell
Molecule
Atom
Organism – Cell – Organelle – Molecules
The human body is made up of trillions of cells
Each cell has:
46 chromosomes
2 meters of DNA
3 billion bases (A, T, G, C)
20,000 to 30,000 genes
The Central Dogma
Bottom-up
 From genes to phenotypes
 If the genome can be fully sequenced, can we uncover all the secrets hidden in the DNA?
The -omics (-ome) era
Genomics (Genome)
 Human Genome Project
 Other Genome Projects
   Mouse
   Fly
   Dog
   Worm
   Bacteria
   …
   Most recently … Cat
Human genome project
 Sequence the whole genome of several individuals
 Competition between Celera and NIH
 Took over a decade
 Draft in 2000, completed in 2003
The next stage: HapMap
 HapMap is a catalog of common genetic variants that occur in human beings
 It describes:
   what these variants are
   where they occur in our DNA
   how they are distributed among people within populations and among populations in different parts of the world
Single Nucleotide Polymorphism (SNP)
Personalized genome
 James Watson (454 Life Sciences)
 Craig Venter (J. Craig Venter Institute)
 23andMe (backed by Google, focus on social/family relationships)
 Navigenics (focus on medical conditions)
 Personal Genome Project (PGP, Harvard)
Proteomics (Proteome)
 Catalog all proteins (and their relationships) in a temporally and spatially confined system
 Identities of these proteins
 Quantities
 Variants of these proteins
 Alternative splicing forms
 Post-translational modifications (phosphorylation, methylation, ubiquitination, …)
Proteomics
Mass Spectrometry
Fluorescence Resonance Energy Transfer (FRET)
 Co-localization (interaction) between protein-protein and protein-DNA pairs
Transcriptome
 Identify all transcription factors (TFs) functioning in a specific temporally and spatially confined system
 Identify all genes regulated by specific TFs
 ChIP-chip
 TRANSFAC database
Chromatin Immunoprecipitation (ChIP)
 A well-established procedure used to investigate interactions between DNA-binding proteins and DNA in vivo
ChIP-chip
Transcription Factor Binding Motifs
Interactome
 Catalog all interactions (protein-protein or protein-DNA) within an organism
   Yeast Two-Hybrid
   Co-immunoprecipitation (co-IP)
   Mass Spectrometry
   FRET
   …
Yeast Two-hybrid
Metabolomics (Metabolome)
 “Systematic study of the unique chemical fingerprints that specific cellular processes leave behind”
 Collection of all metabolites in a biological organism
Analytical methods for metabolomics
 Separation
   Gas chromatography (GC)
   High-performance liquid chromatography (HPLC)
   Capillary electrophoresis (CE)
 Detection
   Mass spectrometry
   Nuclear magnetic resonance (NMR) spectroscopy
Glycomics
 Oligosaccharide
 Glycoprotein/Proteoglycan
 Proteins attached to oligosaccharides
 Important to cell recognition
 Cancer targeting
 Influenza
Model Organisms
 Yeast (S. cerevisiae)
 Worm (C. elegans)
 Fruit Fly (D. melanogaster)
 Mouse (M. musculus)
Monitoring the System
 High-throughput monitoring of gene expression
 Microarray
 Protein microarray
 GC/HPLC/MS/tandem MS
 Phenotype/Disease
Microarray
Protein Microarray
Phenotypes
 Lethality
 Synthetic lethal
 Developmental
 Morphological
 Behavioral
 Diseases
Genotypes and Phenotypes
genotype + environment → phenotype
genotype + environment + random variation → phenotype
Importance of Computer Models
 Interactions in the cell are too complex to handle with pen and paper
 With high-throughput tools, biology shifts from descriptive to predictive
 Computers are required to store, process, assemble, and model all high-throughput data into networks
Types of Computer Models
 Chemical Kinetic Model
   Defined by concentrations of different molecular species in the cell
   Represented with a number of equations
   Some processes may be stochastic
 Simplified Discrete Circuit
   Network with nodes and arrows
   Nodes represent quantity or other attributes
   Directed edges represent effect of nodes on other nodes
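A simplified discrete circuit can be sketched as a Boolean network, where each node is on or off and each directed edge becomes part of an update rule. The three genes A, B, C and their rules below are hypothetical and chosen only for illustration; synchronous updating is also an assumption, one of several possible update schemes.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]

# Hypothetical circuit: each node's next value is a Boolean function
# of the current state (the directed edges of the network).
RULES: Dict[str, Callable[[State], bool]] = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A and B together activate C
}


def step(state: State) -> State:
    """Synchronously update every node from the current state."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def simulate(state: State, n_steps: int) -> List[State]:
    """Return the trajectory [state_0, state_1, ..., state_n]."""
    trajectory = [dict(state)]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    start = {"A": True, "B": False, "C": False}
    for t, s in enumerate(simulate(start, 5)):
        print(t, s)
```

With these particular rules the negative feedback from C onto A makes the circuit cycle back to its starting state after five steps, which is the kind of qualitative behavior such models are meant to expose.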
Different Mathematical Formulations
 Differential Equations
   Linear (ordinary)
   Partial
   Stochastic
 S-Systems
   Power-law formulation
   Captures complicated dynamics
   Parameter estimation is computationally intensive
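As a rough illustration of the power-law formulation, an S-system is usually written as dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij. The sketch below integrates such a system with a plain forward Euler step. The function names, the two-variable example, and all parameter values are hypothetical; a real model would use a proper ODE solver and parameters estimated from data.

```python
from typing import List, Sequence


def s_system_rhs(x: Sequence[float], alpha: Sequence[float], beta: Sequence[float],
                 g: Sequence[Sequence[float]], h: Sequence[Sequence[float]]) -> List[float]:
    """dX_i/dt = alpha_i * prod_j X_j**g[i][j] - beta_i * prod_j X_j**h[i][j]."""
    n = len(x)
    dxdt = []
    for i in range(n):
        production = alpha[i]
        degradation = beta[i]
        for j in range(n):
            production *= x[j] ** g[i][j]
            degradation *= x[j] ** h[i][j]
        dxdt.append(production - degradation)
    return dxdt


def euler(x0: Sequence[float], alpha, beta, g, h,
          dt: float = 0.01, n_steps: int = 2000) -> List[float]:
    """Forward Euler integration (a deliberately simple choice for this sketch)."""
    x = list(x0)
    for _ in range(n_steps):
        dxdt = s_system_rhs(x, alpha, beta, g, h)
        x = [xi + dt * di for xi, di in zip(x, dxdt)]
    return x


if __name__ == "__main__":
    # Hypothetical pathway: X1 is made at a constant rate, X2 is made from X1.
    alpha, beta = [2.0, 1.0], [1.0, 1.0]
    g = [[0.0, 0.0], [1.0, 0.0]]  # production exponents
    h = [[1.0, 0.0], [0.0, 1.0]]  # degradation exponents
    print(euler([0.5, 0.5], alpha, beta, g, h))
```

In this toy case the equations reduce to dX1/dt = 2 - X1 and dX2/dt = X1 - X2, so both concentrations settle near 2. The exponents g and h are what make parameter estimation expensive in practice, since there are on the order of n squared of them.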
Model details
 Selection of genes, gene products, and other molecules to be included
 Cellular compartments: nucleus, Golgi, or other organelles
 Too much detail may introduce more noise
 A minimal model able to predict system properties (mRNA level, growth rate, etc.) is sufficient
Construct Model from Global Patterns
 Microarray gene expression patterns: up-regulated/down-regulated
 Gene expression profiles under different conditions: tumor/normal, cell cycle, drug treatment, …
 Methods:
   Bayesian inference
   Machine learning (clustering, classification)
   …
Framework for Systems Biology
Tools for Simulation
 E-cell
 Cell Illustrator
 Virtual Cell
 Standardizing efforts:
 BioJake
 SBML (Systems Biology Markup Language)
 Facilitate the exchange of models
E-Cell System
 Software for constructing object models equivalent to a cell system or a part of the cell system
 Employs the Structured Variable-Process model (previously called the Substance-Reactor model, or SRM)
 Objects:
   Variables, Processes, Systems
Cell Illustrator
Computational Databases
 Protein-protein interaction
 DIP, BIND, MIPS, MINT, IntAct, POINT, BioGRID
 Protein-DNA interaction
 TRANSFAC, SCPD
 Metabolic pathways
 KEGG, EcoCyc, WIT, Reactome
 Gene Expression
 GEO, ArrayExpress, GNF, NCI60, commercial
 Gene Ontology
Network Biology
 The entities within a system form intertwined complex networks
   Genes
   Proteins
   Metabolites
   External factors…
Gene (Transcription) Regulatory Network
Protein-Protein Interaction Network
Metabolic Pathways
KEGG metabolic pathway
Gene Ontology
 The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism
 Annotations
   Molecular Function
   Cellular Component
   Biological Process
Challenges of Databases
 Provide information other than simple entries (e.g., PPI with functional annotation or binding strength)
 Data maintenance – update
 Integration with other databases
Applications
Target identification and drug discovery
Disease Gene Identification
 From networks
 From literature
 From microarray
 Quantitative Trait Loci (QTL)
 Genome-Wide Association Study (GWAS)
 Endeavour
 Systems biology (integrated) approaches?
Drug Targets
Gene identification from network
 Nodes
 Hubs
 Edges (interactions)
 Define critical genes from connected edges?
 Shortest path, alternative path?
 Weights
 Metabolic pathways as well
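The notions of hubs and shortest paths can be made concrete on a small undirected interaction network. The sketch below uses a hypothetical edge list (protein names P1 to P6 are invented) and treats all edges as unweighted; handling edge weights would need a different algorithm such as Dijkstra's.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical protein-protein interaction edges, for illustration only.
EDGES: List[Tuple[str, str]] = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5"), ("P5", "P6"),
]


def build_graph(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from an edge list."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hubs(graph: Dict[str, Set[str]], min_degree: int) -> List[str]:
    """Nodes with at least min_degree interaction partners."""
    return sorted(n for n, partners in graph.items() if len(partners) >= min_degree)


def shortest_path(graph: Dict[str, Set[str]], source: str, target: str) -> Optional[List[str]]:
    """Breadth-first search; returns one shortest path or None if unreachable."""
    previous: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(hubs(g, 3))
    print(shortest_path(g, "P2", "P6"))
```

Here P1 is the only node with three partners, and every path from P2 to P6 must pass through P1, P4, and P5, so removing any of them would disconnect the two proteins. That is one simple way to flag candidate critical genes.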
Gene identification from literature
 OMIM (Online Mendelian Inheritance in Man)
 Single gene disease
 Complex disease
 Defects identified, target for drugs and cures
Gene identification from microarray
 Up-regulated genes
 Down-regulated genes
 Too many?
 Cluster of genes
 Regulators (transcription factors) for the important clusters
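One common way to call genes up- or down-regulated is the log2 fold change between two conditions. The sketch below assumes already-normalized, strictly positive intensities; the gene names, values, and the two-fold cutoff (threshold of 1.0 on the log2 scale) are hypothetical, and a real analysis would also test statistical significance across replicates.

```python
from math import log2
from typing import Dict, List, Tuple


def classify_genes(treated: Dict[str, float], control: Dict[str, float],
                   threshold: float = 1.0) -> Tuple[List[str], List[str]]:
    """Split genes into (up, down) by log2(treated / control)."""
    up, down = [], []
    for gene, value in treated.items():
        log_fold_change = log2(value / control[gene])
        if log_fold_change >= threshold:
            up.append(gene)
        elif log_fold_change <= -threshold:
            down.append(gene)
    return sorted(up), sorted(down)


if __name__ == "__main__":
    # Hypothetical normalized intensities.
    tumor = {"geneA": 800.0, "geneB": 100.0, "geneC": 310.0}
    normal = {"geneA": 200.0, "geneB": 400.0, "geneC": 300.0}
    print(classify_genes(tumor, normal))
```

With thousands of genes on an array this filter alone often returns too many candidates, which is why the next step is usually to cluster them and look for shared regulators.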
Quantitative Trait Loci (QTL)
 Region of DNA that is associated with a particular phenotypic trait
 The phenotypic characteristic varies in degree and is attributable to the interaction between two or more genes
 A QTL may not be the gene itself but a DNA sequence closely linked to the target gene
Quantitative Trait Loci
 LOD (logarithm of the odds) score: how much more likely the observed data are if a locus is linked to a specific trait (phenotype) than if it is not
 Expression QTL (eQTL): combine microarray gene expression data with QTL mapping (identify transcription regulatory elements as QTLs)
 cM: centimorgan, a unit of genetic distance; in humans roughly 1,000,000 bases of chromosome on average
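For the simplest textbook case, the LOD score compares the likelihood of the observed recombinants at a recombination fraction theta against free recombination (theta = 0.5). The sketch below assumes phase-known meioses that can each be scored as recombinant or not; the function name and the counts in the example are hypothetical.

```python
from math import log10


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 of L(theta) / L(0.5) for phase-known meioses, with 0 < theta <= 0.5."""
    non_recombinants = meioses - recombinants
    likelihood_linked = theta ** recombinants * (1.0 - theta) ** non_recombinants
    likelihood_unlinked = 0.5 ** meioses
    return log10(likelihood_linked / likelihood_unlinked)


if __name__ == "__main__":
    # Hypothetical family data: 2 recombinants observed in 20 meioses.
    print(round(lod_score(2, 20, 0.1), 2))
```

A LOD score of 3 or more, meaning odds of 1000 to 1 in favor of linkage, is the conventional threshold for declaring linkage; the hypothetical data above land just past it.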
Genome-Wide Association Studies (GWAS)
 Genome-wide association studies (GWAS) rely on newly available research tools and technologies to rapidly and cost-effectively analyze genetic differences between people with specific illnesses, such as diabetes or heart disease, and healthy individuals.
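At a single SNP, the comparison between affected and healthy people can be sketched as a chi-square test on a 2x2 table of allele counts. This is a minimal illustration of one common test, not a full GWAS pipeline: the allele counts are hypothetical, and a real study repeats the test at every SNP and corrects for multiple testing.

```python
from math import erfc, sqrt
from typing import Tuple


def allelic_chi_square(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int) -> Tuple[float, float]:
    """Pearson chi-square (1 degree of freedom) on a 2x2 allele-count table."""
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical allele counts at one SNP.
    chi2, p = allelic_chi_square(case_alt=60, case_ref=40, control_alt=40, control_ref=60)
    print(chi2, p)
```

A p-value around 0.005 would look convincing for one SNP in isolation, but with hundreds of thousands of SNPs tested a much stricter cutoff is needed (5e-8 is a commonly used genome-wide threshold), which is why large sample sizes matter.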
Keys to success of GWAS
 Population Resource
   Large sample size required for significant detection
 SNP Map and Genotyping
   High-throughput genotyping
 IT and Analysis Tool
   Storage and analysis (1000 microarrays for billions of data points)
What have GWAS found?
 Genes associated with risks of:
   Type 2 diabetes
   Parkinson's disease
   Heart disorders
   Obesity
   Prostate cancer
   …
An integrated approach: Endeavour
 Each gene receives several scores reflecting its association with a disease
 These scores include those mentioned above
 The genes are ranked separately according to each score
 Combining the rankings with a consensus scoring scheme (data fusion) can improve prediction accuracy
Aerts, et al. (2006)
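The idea of fusing several rankings into one can be illustrated with a simple mean-rank consensus. This is a deliberately simplified stand-in and not necessarily the scoring scheme used by Endeavour itself; the gene names, data sources, and scores are hypothetical, and every score table is assumed to cover the same genes.

```python
from typing import Dict, List


def rank_genes(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank genes within one data source; rank 1 is the highest score."""
    ordered = sorted(scores, key=lambda gene: scores[gene], reverse=True)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def consensus_ranking(score_tables: List[Dict[str, float]]) -> List[str]:
    """Order genes by their mean rank across all data sources."""
    genes = list(score_tables[0])
    ranks = [rank_genes(table) for table in score_tables]
    mean_rank = {g: sum(r[g] for r in ranks) / len(ranks) for g in genes}
    return sorted(genes, key=lambda g: (mean_rank[g], g))


if __name__ == "__main__":
    # Hypothetical per-source association scores (higher means stronger evidence).
    network = {"G1": 0.9, "G2": 0.5, "G3": 0.1}
    literature = {"G1": 0.2, "G2": 0.8, "G3": 0.1}
    expression = {"G1": 0.7, "G2": 0.6, "G3": 0.9}
    print(consensus_ranking([network, literature, expression]))
```

Working with ranks rather than raw scores sidesteps the problem that scores from networks, literature, and expression data are on incompatible scales.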
Toward personalized medicine
Targeted therapy
 Using antibodies against biomarkers (of cancer or infectious agents)
 Requires prior knowledge of patient response (through lab tests or biochips)
Gene therapy
 Replace or inhibit genes in patients
 Vectors
   Adenovirus, adeno-associated virus (AAV)
 Silencing the disease gene
   RNAi
   microRNA
RNA interference
Putting It All Together
Network of Networks
 Gene regulation (protein-DNA)
 Protein-protein interaction
 Metabolic pathway
 How…?
Questions?