Transcript: Slide 1
Introduction to Systems Biology
Department of Computer Science and Information Engineering, National Taiwan University
Postdoctoral Researcher 詹鎮熊
What is a system?
Features of a system
Components
Interrelated components
Boundary
Purpose
Environment
Interfaces
Input
Output
Constraints
Examples of Systems
Life's Complexity Pyramid
System
Functional modules
Building blocks
Components
Z. N. Oltvai and A.-L. Barabási, Science 298, 763 (2002)
Biosphere
Ecosystem
Community
Population
Organism
Organ system
Tissue
Cell
Molecule
Atom
Organism – Cell – Organelle – Molecule
The human body is made up of trillions of cells
Each cell has:
46 chromosomes
2 meters of DNA
3 billion bases (A, T, G, C)
20,000 to 30,000 genes
The Central Dogma
Bottom-up
From genes to phenotypes
If the genome can be fully
sequenced, can we resolve all the
secrets hidden in the DNA?
The -omics (-ome) era
Genomics (Genome)
Human Genome Project
Other Genome Projects
Mouse
Fly
Dog
Worm
Bacteria
…
Most recently … Cat
Human genome project
Sequence the whole genome
of several individuals
Competition between Celera and NIH
Took over a decade
Draft in 2000, complete in 2003
The next stage: HapMap
HapMap is a catalog of common
genetic variants that occur in human
beings
It describes:
what these variants are
where they occur in our DNA
and how they are distributed among
people within populations and among
populations in different parts of the
world
Single Nucleotide Polymorphism
(SNP)
Personalized genome
James Watson (454 Life Sciences)
Craig Venter (Venter Institute)
23andMe (backed by Google, focus
on social/family relationships)
Navigenics (focus on medical
conditions)
Personal Genome Project (PGP,
Harvard)
Proteomics (Proteome)
Catalog all proteins (and their
relationships) in a temporally and
spatially confined system
Identities of these proteins
Quantities
Variants of these proteins
Alternative splicing forms
Post-translational modifications
(Phosphorylation, Methylation, Ubiquitination, …)
Proteomics
Mass Spectrometry
Fluorescence Resonance Energy
Transfer (FRET)
Co-localization
(interaction)
between protein-protein and protein-DNA pairs
Transcriptome
Identify all transcription factors (TF)
functioning in a specific temporally and spatially confined system
Identify all genes regulated by
specific TFs
ChIP-chip
TransFac database
Chromatin Immuno-Precipitation
(ChIP)
A well-established procedure used to investigate
interactions between DNA-binding proteins
and DNA in vivo
ChIP-chip
Transcription Factor Binding Motifs
Interactome
Catalog all interactions (protein-protein or protein-DNA) within an
organism
Yeast Two-Hybrid
Co-immunoprecipitation (co-IP)
Mass Spectrometry
FRET
…
Yeast Two-hybrid
Metabolomics (Metabolome)
“systematic study of the unique
chemical fingerprints that specific
cellular processes leave behind”
Collection of all metabolites in a
biological organism
Analytical methods for
metabolomics
Separation
Gas Chromatography (GC)
High performance liquid chromatography
(HPLC)
Capillary electrophoresis (CE)
Detection
Mass Spectrometry
Nuclear magnetic resonance (NMR)
spectroscopy
Glycomics
Oligosaccharide
Glycoprotein/Proteoglycan
Proteins attached to oligosaccharides
Important to cell recognition
Cancer targeting
Influenza
Model Organisms
Yeast (S. cerevisiae)
Worm (C. elegans)
Fruit Fly (D. melanogaster)
Mouse (M. musculus)
Monitoring the System
High throughput monitoring of gene
expression
Microarray
Protein microarray
GC/HPLC/MS/Tandem MS
Phenotype/Disease
Microarray
Protein Microarray
Phenotypes
Lethality
Synthetic lethal
Developmental
Morphological
Behavioral
Diseases
Genotypes and Phenotypes
genotype + environment → phenotype
genotype + environment + random variation → phenotype
Importance of Computer
Models
Interactions in a cell are too complex to
handle with pen and paper
With high-throughput tools, biology
shifts from descriptive to predictive
Computers are required to store,
process, assemble, and model all
high-throughput data into networks
Types of Computer Models
Chemical Kinetic Model
Defined by concentrations of different molecular
species in the cell
Represented with a number of equations
Some processes may be stochastic
Simplified Discrete Circuit
Network with nodes and arrows
Nodes represent quantity or other attributes
Directed edges represent effect of nodes on
other nodes
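As a minimal sketch of a chemical kinetic model, the following integrates two rate equations for a hypothetical gene: mRNA is produced at a constant rate and degraded, and protein is translated from mRNA and degraded. The rate constants, the function name, and the fixed-step Euler scheme are illustrative assumptions, not values or methods from the lecture.

```python
def simulate_gene_expression(k_tx=2.0, d_m=0.2, k_tl=1.0, d_p=0.1,
                             t_end=200.0, dt=0.01):
    """Euler integration of
         dm/dt = k_tx - d_m * m       (mRNA concentration)
         dp/dt = k_tl * m - d_p * p   (protein concentration)
    All rate constants are hypothetical and in arbitrary units.
    """
    m, p = 0.0, 0.0                   # start with no mRNA and no protein
    steps = int(round(t_end / dt))
    for _ in range(steps):
        dm = k_tx - d_m * m           # production minus degradation
        dp = k_tl * m - d_p * p       # translation minus degradation
        m += dm * dt
        p += dp * dt
    return m, p


m_final, p_final = simulate_gene_expression()
# Analytical steady state for comparison: m* = k_tx / d_m, p* = k_tl * m* / d_p
print(m_final, p_final)
```

With these constants the simulated concentrations approach the analytical steady state (m* = 10, p* = 100). A stochastic variant would replace the deterministic update with random reaction events.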
Different Mathematical
Formulations
Differential Equations
Ordinary (linear or nonlinear)
Partial
Stochastic
S-Systems
Power-law
formulation
Captures complicated dynamics
Parameter estimation is computationally intensive
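For reference, the power-law (S-system) form writes each rate of change as one production term minus one degradation term. The symbols below follow common usage in the S-system literature and are not taken from the slides.

```latex
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n} X_j^{h_{ij}},
\qquad i = 1, \dots, n
```

Here \alpha_i and \beta_i are non-negative rate constants and g_{ij}, h_{ij} are kinetic orders. A system of n variables has 2n(n+1) parameters, which is one reason parameter estimation is expensive.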
Model details
Selection of genes, gene products,
and other molecules to be included
Cellular compartments: nucleus, Golgi,
or other organelles
Too much detail may introduce more
noise
A minimal model able to predict system
properties (mRNA level, growth rate,
etc.) is sufficient
Construct Model from Global
Patterns
Microarray gene expression patterns:
Up-regulated/down-regulated
Gene expression profiles under
different conditions: Tumor/normal,
cell cycle, drug treatment, …
Methods:
Bayesian inference
Machine learning (clustering,
classification)
…
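To make the clustering idea concrete, here is a small sketch that groups genes whose expression profiles rise and fall together across conditions. The gene names, expression values, correlation threshold, and the greedy grouping rule are all illustrative assumptions; this is not a specific published clustering algorithm.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_by_correlation(profiles, threshold=0.9):
    """Greedy grouping: a gene joins the first cluster whose seed
    (first member) it correlates with at or above the threshold;
    otherwise it starts a new cluster."""
    clusters = []
    for gene, profile in profiles.items():
        for members in clusters:
            if pearson(profile, profiles[members[0]]) >= threshold:
                members.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters


# Hypothetical expression levels under four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.1, 6.0, 4.2, 1.9],
}
clusters = cluster_by_correlation(profiles)
print(clusters)
```

Genes that land in the same cluster are candidates for sharing a regulator, which is the starting point for building a model from global patterns.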
Framework for Systems Biology
Tools for Simulation
E-cell
Cell Illustrator
Virtual Cell
Standardizing efforts:
BioJake
SBML (systems biology markup language)
Facilitate the exchange of models
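As a rough illustration of why a shared format helps model exchange, the sketch below reads species out of a tiny hand-written document in the style of SBML Level 2 Version 4, using only Python's standard XML parser. The model id, species ids, and concentrations are hypothetical, and the fragment has not been validated against the SBML specification; real projects would normally use a dedicated library such as libSBML.

```python
import xml.etree.ElementTree as ET

# Namespace used by SBML Level 2 Version 4 documents.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# Hand-written toy model; all ids and values are hypothetical.
SBML_TEXT = f"""<sbml xmlns="{SBML_NS}" level="2" version="4">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="mRNA" compartment="cell" initialConcentration="0"/>
      <species id="protein" compartment="cell" initialConcentration="5"/>
    </listOfSpecies>
  </model>
</sbml>"""


def list_species(sbml_text):
    """Return {species id: initial concentration} from an SBML-style string."""
    root = ET.fromstring(sbml_text)
    ns = {"sbml": SBML_NS}
    return {
        s.get("id"): float(s.get("initialConcentration"))
        for s in root.findall(".//sbml:species", ns)
    }


species = list_species(SBML_TEXT)
print(species)
```

Because the structure is plain XML, any simulator that understands the format can load the same species and compartments without custom conversion code.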
E-Cell System
Software for constructing object models
equivalent to a cell system or a part
of the cell system
Employs the Structured Variable-Process model (previously called the
Substance-Reactor model, or SRM)
Objects:
Variables, Processes, Systems
Cell Illustrator
Computational Databases
Protein-protein interaction
DIP, BIND, MIPS, MINT, IntAct, POINT, BioGRID
Protein-DNA interaction
TRANSFAC, SCPD
Metabolic pathways
KEGG, EcoCyc, WIT, Reactome
Gene Expression
GEO, ArrayExpress, GNF, NCI60, commercial
Gene Ontology
Network Biology
The entities within a system form
intertwined complex networks
Genes
Proteins
Metabolites
External factors…
Gene (Transcription) Regulatory
Network
Protein-Protein Interaction Network
Metabolic Pathways
KEGG
metabolic
pathway
Gene Ontology
The Gene Ontology project provides a
controlled vocabulary to describe
gene and gene product attributes in
any organism
Annotations
Molecular Function
Cellular Components
Biological Processes
Challenges of Databases
Provide information other than simple
entries (e.g. PPI with functional
annotation or binding strength)
Data maintenance – update
Integration with other databases
Applications
Target identification and drug
discovery
Disease Gene Identification
From networks
From literature
From microarray
Quantitative Trait Loci (QTL)
Genome-Wide Association Study (GWAS)
Endeavour
Systems biology (integrated) approaches?
Drug Targets
Gene identification from
network
Nodes
Hubs
Edges (interactions)
Define critical genes from connected
edges?
Shortest path, alternative path?
Weights
Metabolic pathways as well
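The questions above (which nodes are hubs, what is the shortest path between two genes) can be sketched on a toy undirected, unweighted interaction network. The gene names and edges are hypothetical, and breadth-first search is used because the edges carry no weights; weighted edges would call for a different algorithm such as Dijkstra's.

```python
from collections import deque


def degrees(edges):
    """Count interactions per node in an undirected edge list."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def shortest_path(edges, source, target):
    """Breadth-first search; returns the node list of one shortest path,
    or None if the target is unreachable."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    previous = {source: None}          # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:    # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical protein-protein interaction edges.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G4", "G5"), ("G5", "G6")]
deg = degrees(edges)
hub = max(deg, key=deg.get)            # node with the most interactions
path = shortest_path(edges, "G2", "G6")
print(hub, deg[hub], path)
```

In this toy network G1 is the hub, and every shortest path from G2 to G6 passes through it, which is the kind of evidence used to nominate critical genes.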
Gene identification from literature
OMIM (Online Mendelian Inheritance
in Man)
Single gene disease
Complex disease
Defects identified, target for drugs
and cures
Gene identification from microarray
Up-regulated genes
Down-regulated genes
Too many?
Cluster of genes
Regulator (transcription factors) for
the important clusters
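A first pass at calling genes up-regulated or down-regulated can be sketched with a log2 fold change between two conditions. The gene names, intensities, and the two-fold cutoff are illustrative assumptions; real analyses also use replicates and a statistical test rather than a cutoff alone.

```python
from math import log2


def classify_by_fold_change(tumor, normal, threshold=1.0):
    """Label each gene by log2(tumor / normal).
    threshold=1.0 corresponds to a two-fold change (an assumed cutoff).
    Intensities must be positive."""
    calls = {}
    for gene in tumor:
        lfc = log2(tumor[gene] / normal[gene])
        if lfc >= threshold:
            calls[gene] = "up"
        elif lfc <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical microarray intensities.
tumor = {"geneA": 800.0, "geneB": 100.0, "geneC": 210.0}
normal = {"geneA": 200.0, "geneB": 400.0, "geneC": 200.0}
calls = classify_by_fold_change(tumor, normal)
print(calls)
```

When the resulting list is too long, clustering the changed genes and looking for shared transcription factors narrows it down, as the slide suggests.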
Quantitative Trait Loci (QTL)
Region of DNA that is associated with
a particular phenotypic trait
The phenotypic characteristic varies in
degree and can be attributed to interactions
between two or more genes
A QTL may not be the gene itself but a
DNA sequence that is closely linked
with the target gene
Quantitative Trait Loci
LOD (logarithm of the odds) score: how much
more likely the observed data are if a locus is linked to a
specific trait (phenotype) than if it is not
Expression QTL (e-QTL): combine
microarray for gene expression
(identify transcription regulatory
elements as QTL)
cM: centimorgan, a unit of genetic distance (1% recombination
frequency); in humans roughly 1,000,000 bases on average
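As a worked example of a LOD score, the sketch below uses the classical two-point linkage form, which compares the likelihood of the observed recombinants at a recombination fraction theta with the likelihood under no linkage (theta = 0.5). It assumes phase-known meioses, and the counts are hypothetical. QTL mapping software computes LOD from a different likelihood (QTL present versus absent), so this only illustrates the log-odds idea.

```python
from math import log10


def lod_score(recombinants, meioses, theta):
    """Two-point LOD = log10( L(theta) / L(0.5) ) for phase-known meioses,
    where L(theta) = theta^k * (1 - theta)^(n - k)."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_linked = (recombinants * log10(theta)
                  + non_recombinants * log10(1.0 - theta))
    log_unlinked = meioses * log10(0.5)
    return log_linked - log_unlinked


# Hypothetical data: 1 recombinant observed in 10 meioses.
lod_linked = lod_score(recombinants=1, meioses=10, theta=0.1)
lod_null = lod_score(recombinants=1, meioses=10, theta=0.5)
print(round(lod_linked, 3), lod_null)
```

A LOD of 3 or more, meaning odds of at least 1000 to 1 in favor of linkage, is the conventional threshold for declaring linkage; the toy data here fall short of it.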
Genome-Wide Association Studies
(GWAS)
Genome-wide association studies
(GWAS) rely on newly available
research tools and technologies to
rapidly and cost-effectively analyze
genetic differences between people
with specific illnesses, such as
diabetes or heart disease, compared
to healthy individuals.
Keys to success of GWAS
Population Resource
Large sample size required for significant
detection
SNP Map and Genotyping
High-throughput genotyping
IT and Analysis Tool
Storage and analysis (1000 microarrays
for billions of data points)
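The core test behind a case-control association can be sketched for a single SNP as a chi-square test on allele counts. The counts below are hypothetical, and a real GWAS repeats a test like this for every genotyped SNP and then corrects for the very large number of tests.

```python
def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table
    of allele counts: rows are cases/controls, columns are alt/ref alleles."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical allele counts for one SNP.
chi2 = allelic_chi_square(case_alt=60, case_ref=40,
                          control_alt=40, control_ref=60)
print(chi2)
```

For one test, a statistic above 3.84 corresponds to p < 0.05. Because a GWAS tests a very large number of SNPs, the significance threshold has to be far stricter, which is why large sample sizes are needed.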
What have GWAS found?
Genes associated with risks of:
type 2 diabetes
Parkinson's disease
heart disorders
Obesity
prostate cancer
…
An integrated approach: Endeavour
Each gene receives several scores for
its association with disease
These scores come from the data sources
mentioned above
Genes are ranked separately according
to each score
A consensus scoring scheme (data
fusion) then combines the rankings, which
can improve prediction accuracy
Aerts, et al. (2006)
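The data-fusion step can be sketched as rank aggregation: rank the candidate genes under each evidence source, then combine the ranks. This sketch uses a plain mean of ranks as a simple stand-in and is not the fusion method used by Endeavour; the gene names, scores, and function names are hypothetical.

```python
def ranks(scores):
    """Rank genes from 1 (highest score) downward; ties keep input order."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {gene: position for position, gene in enumerate(ordered, start=1)}


def fuse_by_mean_rank(score_tables):
    """Combine several score tables (same genes in each) into one ordering,
    best candidate first, using the mean of the per-source ranks."""
    per_source = [ranks(table) for table in score_tables]
    genes = list(per_source[0])
    mean_rank = {
        g: sum(r[g] for r in per_source) / len(per_source) for g in genes
    }
    return sorted(genes, key=mean_rank.get)


# Hypothetical scores from three evidence sources.
network = {"G1": 0.9, "G2": 0.4, "G3": 0.7}
literature = {"G1": 0.2, "G2": 0.1, "G3": 0.8}
expression = {"G1": 0.6, "G2": 0.3, "G3": 0.5}
fused = fuse_by_mean_rank([network, literature, expression])
print(fused)
```

A gene that ranks well in several independent sources ends up near the top even if no single source ranks it first, which is the intuition behind combining evidence.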
Toward personalized medicine
Targeted therapy
Using antibodies against biomarkers
(of cancer or infectious agents)
Requires prior knowledge of patient
response (through lab tests or
biochips)
Gene therapy
Replace or inhibit genes in patients
Vectors
Adenovirus, adeno-associated virus (AAV)
Silencing the disease gene
RNAi
microRNA
RNA interference
Putting It All Together
Network of Networks
Gene regulation (protein-DNA)
Protein-protein interaction
Metabolic pathway
How…?
Questions?