Transcript Slide 1

Approach          Component examined    Techniques
Genomics          Genes                 Sequencing programs
Transcriptomics   mRNA                  DNA arrays, GeneChip
Proteomics        Proteins              2D PAGE, MALDI-MS, ESI-MS
Metabolomics      Metabolites           GC-MS
mRNA level ≠ expressed protein level, nor does it indicate the nature of the functional protein product
Genomic Sequence → [Transcriptional Control] → mRNA → [Translational Control] → Protein Product → [Post-Translational Control] → Functional Protein Product
Temporal Changes in mRNA and protein
[Figure: gene expression and protein level plotted against time t]
When you measure expression affects what you find
Does mRNA level correlate with protein level?
[Figure, panel 1: 20 liver proteins and corresponding mRNAs. x-axis: Protein (2D gels); y-axis: mRNA (EST clones); R = 0.48. Anderson & Seilhamer, Electrophoresis 1997 18:533-537]
[Figure, panel 2: Glutathione-S-transferase in 60 human cell lines. x-axis: Protein (Affinity-HPLC); y-axis: mRNA (Northern); axes run from 0.1 to 100; R = 0.43. Cell line types: Lung, Ovarian, CNS, Leukemia, Renal, Melanoma, Breast. From Tew et al 1996; Anderson & Anderson, Electrophoresis 1998 19:1853-1861]
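A correlation coefficient like the R values quoted for these mRNA/protein comparisons can be sketched as a Pearson correlation on log10-transformed abundances. This is an illustration only: the log transform is an assumption suggested by the 0.1 to 100 axis ticks, and the abundance values below are invented, not taken from the cited studies.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def log_log_correlation(protein, mrna):
    """Correlate abundances after a log10 transform (all values must be > 0)."""
    log_protein = [math.log10(p) for p in protein]
    log_mrna = [math.log10(m) for m in mrna]
    return pearson_r(log_protein, log_mrna)


# Hypothetical relative abundances for six gene products (made-up numbers).
protein_level = [0.2, 1.5, 3.0, 12.0, 40.0, 90.0]
mrna_level = [1.0, 0.3, 8.0, 2.0, 30.0, 15.0]

r = log_log_correlation(protein_level, mrna_level)
print(f"R = {r:.2f}")
```

An R well below 1 on such data would mean that mRNA abundance alone is a weak predictor of protein abundance, which is the point the two panels make.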
Challenges of proteins vs DNA
DNA
• Static
• Can be amplified
• Little complexity: single component
• Good solubility characteristics
Protein
• Very dynamic
• Cannot be amplified
• Very complex: post-translational modification
• Variable solubility
Identifying new protein complexes:
Isolation of proteins using:
Classical Purification +1D PAGE
Tag Purification +1D PAGE
Phenotypic Complexity of the Eukaryotic Proteome
Domain Expansion
Evolution
Somatic
Domain Accretion
• Duplication
• Divergence
• Recombination
Recombination
Protein Architecture
Paralogous Expansion
Somatic Rearrangement
Horizontal Transfer
Protein Diversity
Alternative Splicing
Modifications
Functional Diversity
Protein Interactions
de novo
Biological Processes
Systems
Eukaryotic Proteomes
Proteome           Human    Fly      Worm     Yeast   Mustard Weed
Number of Genes    31,778   13,338   18,266   6,144   25,706
% of DB Matches*   51       56       50       50      52
(* Similarity search of protein sequences in the database)
Comparative Analysis of Proteomic Pheno-Complexity
Functional Diversity
Eubacteria
Protein Diversity
Eukarya
Domain Accretion
Archaea
Unicellular Organisms
Invertebrates
Conserved Core Proteins
Vertebrates
Lineage-Specific Proteins
Protein Architecture
Mammals
Vertebrate-Specific Proteins
Human
Protein Sequence Homology
(1) Protein Match with Known or Unknown Function
Query
Match
(2) Domain Match with Known or Unknown Function
Query
Match
Ortholog: An evolutionarily conserved gene that arose during speciation
Paralogs: Genes that arose due to intra-genome duplication in a species
Protein Sequence Comparison
(I) Homology
• > 40 % : Same Function
• 25-40 % : Similar Function
• < 25 % : Different Function
(II) Distance
• Phylogenetic Tree
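The rule-of-thumb thresholds above can be sketched in a few lines. This assumes the percentages refer to percent sequence identity over an alignment that has already been computed (gaps written as "-"); the function names and the toy sequences are illustrative, and the cut-offs are the slide's heuristics rather than a guarantee of function.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def predicted_relationship(identity):
    """Map percent identity onto the three bands listed on the slide."""
    if identity > 40:
        return "same function"
    if identity >= 25:
        return "similar function"
    return "different function"


# Toy aligned peptides (hypothetical, for illustration only).
query = "MKT-AYIAKQR"
match = "MKTQAYLAKNR"
identity = percent_identity(query, match)
print(f"{identity:.0f}% identity: {predicted_relationship(identity)}")
```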
Comparative Proteomics
Domain/Protein*        Yeast   Worm   Fly   Weed   Human
Eukaryote-specific     1       1      1     1      1
Metazoan-specific      0       1      1     1      1
Animal-specific        0       1      1     0      1
Vertebrate-specific    0       0      0     0      1
*: The domain/protein is present (1) or absent (0) in the proteome.
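A presence/absence profile of this kind can be derived by checking each proteome for a domain. The sketch below uses a small toy input (the domain sets are illustrative, and "Hypothetical vertebrate domain" is a made-up name), not a curated dataset.

```python
PROTEOMES = ["Yeast", "Worm", "Fly", "Weed", "Human"]

# Toy input: which domains are found in which proteome (illustrative only).
DOMAINS_BY_PROTEOME = {
    "Yeast": {"C2H2 zinc finger"},
    "Worm": {"C2H2 zinc finger", "Immunoglobulin"},
    "Fly": {"C2H2 zinc finger", "Immunoglobulin"},
    "Weed": {"C2H2 zinc finger"},
    "Human": {
        "C2H2 zinc finger",
        "Immunoglobulin",
        "Hypothetical vertebrate domain",
    },
}


def presence_profile(domain, domains_by_proteome):
    """Return a 0/1 list, one entry per proteome in PROTEOMES order."""
    return [
        1 if domain in domains_by_proteome[name] else 0
        for name in PROTEOMES
    ]


for domain in (
    "C2H2 zinc finger",
    "Immunoglobulin",
    "Hypothetical vertebrate domain",
):
    profile = presence_profile(domain, DOMAINS_BY_PROTEOME)
    print(f"{domain:32s}", *profile)
```

Grouping domains by identical profiles is then one way to separate broadly conserved domains from lineage-restricted ones.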
Eukaryotic Proteomes Shared with Humans
Fly: 61%
Worm: 43%
Yeast: 46%
Conserved Core Groups in Eukaryotes
Conserved Core Proteins in 1,308 Groups
Human: 3,109 Proteins
Fly: 1,445 Proteins
Yeast: 1,441 Proteins
Worm: 1,503 Proteins
Vertebrate-specific Proteins
[Diagram: Unicellular Organisms → Invertebrates → Vertebrates → Mammals → Human]
[Pie chart]
Eukaryote and Prokaryote: 21%
Other Eukaryotes and Animals: 32%
Vertebrates and Other Animals: 24%
Vertebrate-Specific Proteins: 22%
Comparative Pheno-Complexity
Functional Diversity
Bacteria
Protein Diversity
Eukarya
Domain Accretion
Archaea
Unicellular Organisms
Invertebrates
Conserved Core Proteins
Housekeeping Functions
• Energy/Metabolism
• DNA Replication/Repair
• Translation
Vertebrates
Vertebrate-Specific Proteins
Physiological Differences
• Defense & Immunity
• Cell-Cell Communications
• Nervous System
Protein Architecture
Mammals
Human
Lineage-Specific Proteins
Protein Diversity in Eukaryotes
• Horizontal Gene Transfer
• Invention of Protein Domain
• Expansion of Protein/Domain Families
• Evolution of New Protein Architectures
Lateral Gene Transfer
Bacteria
223 Genes
• Hydrolase
• Oxidoreductase
• Dehydrogenase
• Monoamine Oxidase
• Transporter
Human
• Lineage Specific
• Intron Acquisition
Protein Function Assignment
12 Function Categories (Gene Ontology Project)
1. Cellular Processes
2. Metabolism
3. DNA Replication/Modification
4. Transcription/Translation
5. Intracellular Signaling
6. Cell-Cell Communication
7. Protein Folding/Degradation
8. Transport
9. Multifunctional Proteins
10. Cytoskeletal/Structural
11. Defense and Immunity
12. Miscellaneous Function
Classification of Proteome
(1) Functional Categories
(2) Evolutionary Conservation
(3) Structural Classification
Cellular Function
Protein Sequence
Functional Annotation
Domain/Motif Databases: PRINTS, Prosite, Pfam, Prosite Profile
~50% of Eukaryotes
New Proteins and Domains in Vertebrates
[Diagram: Bacteria, Archaea, and Eukarya; Unicellular Organisms (Yeast) → Invertebrates (Worm, Fly) → Vertebrates → Mammals → Human]
Vertebrate-Specific Proteins
94 (7%)/1,262 InterPro Families
70 Proteins
24 Domains
Functions
Physiological Differences
• Defense & Immunity
• Cell-Cell Communications
• Nervous System
• Few new protein domains invented
• Common ancestral domains in animals
Protein Domain
• An evolutionary unit
• The coding sequence can be duplicated and/or recombined
• ~100 to 250 residues
• Found as a small protein on its own or as part of a larger one in a domain family
• Descending from a common ancestor
• Duplication: gives rise to one or more additional domains
• Divergence: generates modified proteins by mutations or insertions/deletions (indels)
• Recombination: produces new domain arrangements
Protein Domain Architecture
(I) Single-domain Protein
(II) Multi-domain Protein
[Diagram: domains A, B, C, D]
• Prokaryotic Proteome: 2/3 of proteins have 2 or more domains
• Eukaryotic Proteome: 4/5 of proteins are multi-domain
Invention of Protein Domain
Number of Proteins
Domain             Yeast   Worm   Fly   Human   Weed
C2H2 zinc finger   48      151    357   706     115
Leu-rich repeats   7       54     115   188     392
EGF-like           0       113    81    222     17
TIR                0       2      8     18      131
Immunoglobulin     0       64     140   765     0
KRAB box           0       0      0     171     0
Q14 repeats        0       0      0     1       0
• Expansion of paralogous proteins in metazoans
• Invention of new domains in eukaryotic genome evolution
Domain Expansion: Duplication
Number of Proteins (Domains)
Domain    Yeast    Worm      Fly        Human      Weed
RasGAP    3        8         5          11         0
RhoGAP    9        20        19         59         8
ArfGAP    6        8         9          16         15
Ig        0        67(323)   125(291)   381(930)   0
PH        24       65(68)    72(78)     193(212)   23
SH3       23(27)   46(61)    55(75)     143(182)   4
Ank       12(20)   75(223)   72(269)    145(404)   66(111)
Domains are expandable in metazoans!
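The expansion claim can be made concrete by computing a fold change in protein counts between two proteomes. The sketch below uses the protein counts (the numbers outside the parentheses) as read from the table above; choosing yeast as the baseline and returning None for a zero baseline are assumptions made for illustration.

```python
# Protein counts per domain family, as read from the table above.
PROTEIN_COUNTS = {
    "RasGAP": {"Yeast": 3, "Worm": 8, "Fly": 5, "Human": 11, "Weed": 0},
    "RhoGAP": {"Yeast": 9, "Worm": 20, "Fly": 19, "Human": 59, "Weed": 8},
    "ArfGAP": {"Yeast": 6, "Worm": 8, "Fly": 9, "Human": 16, "Weed": 15},
    "Ig": {"Yeast": 0, "Worm": 67, "Fly": 125, "Human": 381, "Weed": 0},
    "SH3": {"Yeast": 23, "Worm": 46, "Fly": 55, "Human": 143, "Weed": 4},
    "Ank": {"Yeast": 12, "Worm": 75, "Fly": 72, "Human": 145, "Weed": 66},
}


def fold_expansion(counts, target="Human", baseline="Yeast"):
    """Ratio of protein counts, or None when the baseline count is zero."""
    base = counts[baseline]
    if base == 0:
        return None  # domain absent in the baseline: a ratio is undefined
    return counts[target] / base


for domain, counts in PROTEIN_COUNTS.items():
    fold = fold_expansion(counts)
    label = "absent in yeast" if fold is None else f"{fold:.1f}x"
    print(f"{domain:7s} human vs yeast: {label}")
```

A family with a zero baseline, such as Ig here, is better read as a domain that is absent from the baseline proteome than as an expansion.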
Rosetta Stone
Similarity Search of Protein Databases
Function 1
Protein A
Protein B
Function 2
Protein X
Functions 1 and 2 due to
domain recombination
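The Rosetta Stone idea can be sketched as a search for a fused protein that carries the domains of two otherwise separate proteins, which suggests a functional link between them. The protein names and domain labels below are hypothetical placeholders, and a real analysis would start from sequence similarity searches rather than ready-made domain lists.

```python
from itertools import combinations


def rosetta_stone_links(proteins):
    """Find (protein_a, protein_b, fusion) triples.

    `proteins` maps a protein name to its list of domains. Two proteins
    with non-overlapping domains are linked when a third protein contains
    all the domains of both.
    """
    links = []
    for a, b in combinations(sorted(proteins), 2):
        domains_a = set(proteins[a])
        domains_b = set(proteins[b])
        if not domains_a or not domains_b or domains_a & domains_b:
            continue  # skip empty or overlapping domain sets
        for fusion in sorted(proteins):
            if fusion in (a, b):
                continue
            if domains_a | domains_b <= set(proteins[fusion]):
                links.append((a, b, fusion))
    return links


# Hypothetical domain assignments (illustrative only).
example = {
    "Protein A": ["domain_1"],
    "Protein B": ["domain_2"],
    "Protein C": ["domain_3"],
    "Protein X": ["domain_1", "domain_2"],
}

for a, b, fusion in rosetta_stone_links(example):
    print(f"{a} and {b} are linked through {fusion}")
```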
Domain Accretion: Recombination
Ancestral Domains in Different Proteins
[Diagram: domains A, B, C, D in separate proteins]
Combinatorial Architecture
[Diagram: domains A, B, C, D recombined into multi-domain proteins]
Superdomain:
Domain recombination in sequential order
[Diagram: four multi-domain ArfGAP architectures built from Rho, X, PH, ArfGAP, Ank (two or three repeats), PBS, and SH3 domains]
Classification of Multi-domain ArfGAP Gene Family
[Diagram: four classes of multi-domain ArfGAP architecture built from Rho, X, PH, ArfGAP, Ank (two or three repeats), PBS, and SH3 domains]
Rho: Ras-like GTPases
X: Domain X
PH: Pleckstrin homology domain
ArfGAP: Zinc finger domain
Ank: Ankyrin repeat
PBS: Paxillin-binding subdomain
SH3: Src homology 3 domain
Expression of Variants in Multiple Human Tissues:
KIAA1099.0 and .1
[Figure: exon maps of KIAA1099.0 and KIAA1099.1 (exons 1 to 17, with labels C1 and C6; KIAA1099.1 also carries AW993140 (159)), and expression of both variants in 16 human tissues: Leukocytes, LN, Spleen, Amygdala, Brain, S. Muscle, Heart, S. I., Stomach, M. Gland, Liver, Kidney, Lung, Uterus, Testis, Placenta]
Expression of Variants in Multiple Human Tissues:
KIAA1099.2 and .3
[Figure: exon maps of KIAA1099.2 (with BE780934 (395)) and KIAA1099.3 (with AW993140 (159) and BE780934 (395)), exons 1 to 15, with labels C1 and C5, and expression of both variants in 16 human tissues (lanes 1 to 16): Leukocytes, LN, Spleen, Amygdala, Brain, S. Muscle, Heart, S. I., Stomach, M. Gland, Liver, Kidney, Lung, Uterus, Testis, Placenta]
Expressed Diversities of Functional Domains
Class II ArfGAP: KIAA 1099
Transcription
Alternatively Spliced Variant Transcripts
[Diagram: three variant transcripts, each encoding Rho, X, PH, and ArfGAP domains; two of them also encode two Ank repeats]
• One alternatively spliced transcript lacks ankyrin repeats.
• Other variants have an altered PH domain.
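Domain ablation and alteration can be illustrated by comparing each variant's domain list with a reference architecture. In this sketch the reference order, the variant names, and the trailing "*" used to mark an altered domain are all assumptions made for illustration, not the annotation used for KIAA1099.

```python
from collections import Counter


def compare_variant(reference, variant):
    """Report domains that a variant lacks or carries in altered form.

    A trailing '*' on a domain name marks it as altered (a convention
    invented for this sketch).
    """
    altered = sorted({d.rstrip("*") for d in variant if d.endswith("*")})
    present = Counter(d.rstrip("*") for d in variant)
    missing = Counter(reference) - present  # keeps only positive counts
    ablated = sorted(missing.elements())
    return {"ablated": ablated, "altered": altered}


# Assumed reference architecture for illustration.
REFERENCE = ["Rho", "X", "PH", "ArfGAP", "Ank", "Ank"]

# Hypothetical variants mirroring the two cases in the bullets above.
VARIANTS = {
    "variant without Ank": ["Rho", "X", "PH", "ArfGAP"],
    "variant with altered PH": ["Rho", "X", "PH*", "ArfGAP", "Ank", "Ank"],
}

for name, domains in VARIANTS.items():
    print(name, compare_variant(REFERENCE, domains))
```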
Eukaryotic Protein Diversity
(I) Genome Evolution (Germ-line)
• Lateral Gene Transfer: Bacterial Genes
• Domain Invention: Vertebrate-specific Proteins
• New Architecture: Combinatorial Domain Accretion
• Domain Expansion: Multiple Domains in a Protein
• Paralogous Expansion: Gene Duplication
(II) Gene Expression (Somatic)
• Somatic Rearrangement: Ig & TCR Gene Families
• Alternative Splicing: Protein Isoforms
Alternative Splicing: Domain Ablation or Alteration
Phenotypic Complexity of the Eukaryotic Proteome
Domain Expansion
Evolution
Somatic
Domain Accretion
• Duplication
• Divergence
• Recombination
Recombination
Protein Architecture
Paralogous Expansion
Somatic Rearrangement
Horizontal Transfer
Protein Diversity
Alternative Splicing
Modifications
• Domain ablation
• Domain alteration
Functional Diversity
Protein Interactions
de novo
Biological Processes
Systems
Integrated Life Sciences in the Post-Genomic Era
Genome
Proteome
Protein Diversity
Functional Proteomics
Gene Repertoire
Protein Repertoire
Functional Diversity
Structural Proteomics
Biological Processes
Physiome
Cellome
Metabolome
Patholome
Systems Biology