Transcript Slide 1
Approach          Component examined   Techniques
Genomics          Genes                Sequencing programs
Transcriptomics   mRNA                 DNA arrays, GeneChip
Proteomics        Proteins             2D PAGE, MALDI-MS, ESI-MS
Metabolomics      Metabolites          GC-MS

mRNA level does not equal expressed protein level, nor does it indicate the nature of the functional protein product
Genomic Sequence -> mRNA -> Protein Product -> Functional Protein Product
Control points: Transcriptional Control, Translational Control, Post-Translational Control

Temporal Changes in mRNA and Protein
[Figure: three panels plotted against time t: Gene, Expression, Protein]
When you measure expression affects what you find

Does mRNA level correlate with protein level?
[Figure, panel 1: 20 liver proteins and corresponding mRNAs; mRNA (EST clones) vs. protein (2D gels), log axes; R = 0.48. Anderson & Seilhamer, Electrophoresis 1997, 18:533-537]
[Figure, panel 2: Glutathione-S-transferase in 60 human cell lines; mRNA (Northern) vs. protein (Affinity-HPLC), log axes from 0.1 to 100; R = 0.43; cell lines: Lung, Ovarian, CNS, Leukemia, Renal, Melanoma, Breast. From Tew et al 1996; Anderson & Anderson, Electrophoresis 1998, 19:1853-1861]

Challenges of proteins vs DNA
DNA
• Static
• Can be amplified
• Little complexity: single component
• Good solubility characteristics
Protein
• Very dynamic
• Cannot be amplified
• Very complex: post-translational modification
• Variable solubility

Identifying new protein complexes
Isolation of proteins using:
• Classical Purification + 1D PAGE
• Tag Purification + 1D PAGE

Phenotypic Complexity of the Eukaryotic Proteome
[Diagram labels: Evolution; Somatic; Domain Expansion; Domain Accretion; Duplication; Divergence; Recombination; Protein Architecture; Paralogous Expansion; Somatic Rearrangement; Horizontal Transfer; Protein Diversity; Alternative Splicing; Modifications; Functional Diversity; Protein Interactions; de novo; Biological Processes; Systems]

Eukaryotic Proteomes
Proteome       Number of Genes   % of DB Matches*
Human          31,778            51
Fly            13,338            56
Worm           18,266            50
Yeast          6,144             50
Mustard Weed   25,706            52
(* Similarity search of protein sequences in the database)

Comparative Analysis of Proteomic Pheno-Complexity
[Diagram labels: Eubacteria; Archaea; Eukarya; Unicellular Organisms; Invertebrates; Vertebrates; Mammals; Human; Conserved Core Proteins; Lineage-Specific Proteins; Vertebrate-Specific Proteins; Domain Accretion; Protein Architecture; Protein Diversity; Functional Diversity]

Protein Sequence Homology
(1) Protein Match with Known or Unknown Function (Query, Match)
(2) Domain Match with Known or Unknown Function (Query, Match)
Ortholog: An evolutionarily conserved gene that arose during speciation
Paralogs: Genes that arose due to intra-genome duplication in a species

Protein Sequence Comparison
(I) Homology
• > 40 % : Same Function
• 25-40 % : Similar Function
• < 25 % : Different Function
(II) Distance
• Phylogenetic Tree

Comparative Proteomics
Domain/Protein*       Yeast   Worm   Fly   Weed   Human
Eukaryote-specific    1       1      1     1      1
Metazoan-specific     0       1      1     1      1
Animal-specific       0       1      1     0      1
Vertebrate-specific   0       0      0     0      1
*: The domain/protein is present (1) or absent (0) in the proteome.

Eukaryotic Proteomes Shared with Humans
Fly     61%
Worm    43%
Yeast   46%

Conserved Core Groups in Eukaryotes
Conserved Core Proteins in 1,308 Groups
Human   3,109 Proteins
Fly     1,445 Proteins
Worm    1,503 Proteins
Yeast   1,441 Proteins

Vertebrate-specific Proteins
[Diagram labels: Unicellular Organisms; Invertebrates; Vertebrates; Mammals; Human]
Eukaryote and Prokaryote          21%
Other Eukaryotes and Animals      32%
Vertebrate-Specific Proteins      22%
Vertebrates and Other Animals     24%

Comparative Pheno-Complexity
[Diagram labels: Bacteria; Archaea; Eukarya; Unicellular Organisms; Invertebrates; Vertebrates; Mammals; Human; Lineage-Specific Proteins; Domain Accretion; Protein Architecture; Protein Diversity; Functional Diversity]
Conserved Core Proteins: Housekeeping Functions
• Energy/Metabolism
• DNA Replication/Repair
• Translation
Vertebrate-Specific Proteins: Physiological Differences
• Defense & Immunity
• Cell-Cell Communications
• Nervous System

Protein Diversity in Eukaryotes
• Horizontal Gene Transfer
• Invention of Protein Domain
• Expansion of Protein/Domain Families
• Evolution of New Protein Architectures

Lateral Gene Transfer
Bacteria -> Human: 223 Genes
• Hydrolase
• Oxidoreductase
• Dehydrogenase
• Monoamine Oxidase
• Transporter
Human
• Lineage Specific
• Intron Acquisition

Comparative Pheno-Complexity
[Diagram labels: Bacteria; Archaea; Eukarya; Unicellular Organisms; Invertebrates; Vertebrates; Mammals; Human; Lineage-Specific Proteins; Domain Accretion; Protein Architecture; Protein Diversity; Functional Diversity]
Conserved Core Proteins: Housekeeping Functions
• Energy/Metabolism
• DNA Replication/Repair
• Translation
Vertebrate-Specific Proteins: Physiological Differences
• Defense & Immunity
• Cell-Cell Communications
• Nervous System

Protein Function Assignment
12 Function Categories (Gene Ontology Project)
1. Cellular Processes
2. Metabolism
3. DNA Replication/Modification
4. Transcription/Translation
5. Intracellular Signaling
6. Cell-Cell Communication
7. Protein Folding/Degradation
8. Transport
9. Multifunctional Proteins
10. Cytoskeletal/Structural
11. Defense and Immunity
12. Miscellaneous Function

Classification of Proteome
(1) Functional Categories
(2) Evolutionary Conservation
(3) Structural Classification

Functional Annotation
[Diagram labels: Protein Sequence; Domain/Motif Databases; Functional Annotation; Cellular Function]
Domain/Motif Databases: PRINTS, Prosite, Pfam, Prosite Profile
~50% of Eukaryotes

New Proteins and Domains in Vertebrates
[Diagram labels: Bacteria; Archaea; Eukarya; Unicellular Organisms (Yeast); Invertebrates (Worm, Fly); Vertebrates; Mammals; Human]
Vertebrate-Specific Proteins: 94 (7%) of 1,262 InterPro Families (70 Proteins, 24 Domains)
Functions: Physiological Differences
• Defense & Immunity
• Cell-Cell Communications
• Nervous System
• Few new protein domains invented
• Common ancestral domains in animals

Protein Domain
• An evolutionary unit
• The coding sequence can be duplicated and/or recombined
• ~100 to 250 residues
• In small proteins or parts of large ones in a domain family
• Descending from a common ancestor
• Duplication: to give rise to one or more domains
• Divergence: to generate modified proteins by mutations or In/Del
• Recombination: to produce new domain arrangements

Protein Domain Architecture
(I) Single-domain Protein
(II) Multi-domain Protein (Domains A, B, C, D)
• Prokaryotic Proteome: 2/3 of proteins have > 2 domains
• Eukaryotic Proteome: 4/5 of proteins are multi-domain

Invention of Protein Domain
Number of Proteins
Domain              Yeast   Worm   Fly   Human   Weed
C2H2 zinc finger    48      151    357   706     115
Leu-rich repeats    7       54     115   188     392
EGF-like            0       113    81    222     17
TIR                 0       2      8     18      131
Immunoglobulin      0       64     140   765     0
CRAB box            0       0      0     171     0
Q14 repeats         0       0      0     1       0
• Expansion of paralogous proteins in metazoans
• Invention of new domains in eukaryotic genome evolution

Domain Expansion: Duplication
Number of Proteins (Domains)
Domain    Yeast     Worm      Fly        Human      Weed
RasGAP    3         8         5          11         0
RhoGAP    9         20        19         59         8
ArfGAP    6         8         9          16         15
Ig        0         67(323)   125(291)   381(930)   0
PH        24        65(68)    72(78)     193(212)   23
SH3       23(27)    46(61)    55(75)     143(182)   4
Ank       12(20)    75(223)   72(269)    145(404)   66(111)
Domains are expandable in metazoans!

Rosetta Stone
Similarity Search of Protein Databases
Protein A: Function 1
Protein B: Function 2
Protein X: Functions 1 and 2 due to domain recombination

Domain Accretion: Recombination
Ancestral Domains in Different Proteins: A, B, C, D
Combinatorial Architecture: A C B D ?
Superdomain: Domain recombination in sequential order
[Diagram: ArfGAP family domain architectures built from Rho, X, PH, ArfGAP, Ank, PBS and SH3]

Classification of Multi-domain ArfGAP Gene Family
[Diagram: domain architectures by class, built from Rho, X, PH, ArfGAP, Ank, PBS and SH3]
Rho      Ras-like GTPases
X        Domain X
PH       Pleckstrin homology domain
ArfGAP   Zinc finger domain
Ank      Ankyrin repeat
PBS      Paxillin-binding subdomain
SH3      Src homology domain

Expression of Variants in Multiple Human Tissues: KIAA1099.0 and .1
[Figure: exon maps of KIAA1099.0 and KIAA1099.1 (exons 1 to 17, primers C1 and C6; KIAA1099.1 includes AW993140 (159)), with expression of both variants across tissue lanes 1 to 16]
Lanes: 1 Leukocytes, 2 LN, 3 Spleen, 4 Amygdala, 5 Brain, 6 S. Muscle, 7 Heart, 8 S. I., 9 Stomach, 10 M. Gland, 11 Liver, 12 Kidney, 13 Lung, 14 Uterus, 15 Testis, 16 Placenta

Expression of Variants in Multiple Human Tissues: KIAA1099.2 and .3
[Figure: exon maps of KIAA1099.2 and KIAA1099.3 (exons 1 to 15, primers C1 and C5; KIAA1099.2 includes BE780934 (395); KIAA1099.3 includes AW993140 (159) and BE780934 (395)), with expression of both variants across tissue lanes 1 to 16]
Lanes: 1 Leukocytes, 2 LN, 3 Spleen, 4 Amygdala, 5 Brain, 6 S. Muscle, 7 Heart, 8 S. I., 9 Stomach, 10 M. Gland, 11 Liver, 12 Kidney, 13 Lung, 14 Uterus, 15 Testis, 16 Placenta

Expressed Diversities of Functional Domains
Class II ArfGAP: KIAA1099
Transcription -> Alternatively Spliced Variant Transcripts
[Diagram: variant domain architectures built from Rho, X, PH, ArfGAP and Ank repeats]
• One alternatively spliced transcript lacks ankyrin repeats.
• Other variants have an altered PH domain.

Eukaryotic Protein Diversity
(I) Genome Evolution (Germ-line)
• Lateral Gene Transfer: Bacterial Genes
• Domain Invention: Vertebrate-specific Proteins
• New Architecture: Combinatorial Domain Accretion
• Domain Expansion: Multiple Domains in a Protein
• Paralogous Expansion: Gene Duplication
(II) Gene Expression (Somatic)
• Somatic Rearrangement: Ig & TCR Gene Families
• Alternative Splicing: Protein Isoforms
Alternative Splicing: Domain Ablation or Alteration

Phenotypic Complexity of the Eukaryotic Proteome
[Diagram labels: Evolution; Somatic; Domain Expansion; Domain Accretion; Duplication; Divergence; Recombination; Protein Architecture; Paralogous Expansion; Somatic Rearrangement; Horizontal Transfer; Protein Diversity; Alternative Splicing; Modifications; Functional Diversity; Protein Interactions; de novo; Biological Processes; Systems]
Alternative Splicing
• Domain ablation
• Domain alteration

Integrated Life Sciences in the Post-Genomic Era
[Diagram labels: Genome; Proteome; Gene Repertoire; Protein Repertoire; Protein Diversity; Functional Diversity; Functional Proteomics; Structural Proteomics; Biological Processes; Physiome; Cellome; Metabolome; Patholome; Systems Biology]
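
Worked example: the homology rule of thumb from the Protein Sequence Comparison slide (> 40 % identity: same function; 25-40 %: similar function; < 25 %: different function) can be sketched in Python. This is only an illustration and not part of the original slides. The function names are hypothetical, and computing identity over pre-aligned sequences while skipping gap columns is an assumption, since the slides do not say how identity is measured.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: the two sequences are already aligned (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def functional_relationship(identity_pct: float) -> str:
    """Map percent identity to the slide's three homology bins."""
    if identity_pct > 40:
        return "same function"
    if identity_pct >= 25:  # the 25-40 % band, both ends inclusive
        return "similar function"
    return "different function"


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    pid = percent_identity("MKTAYIAK", "MKTAYLAK")
    print(pid, functional_relationship(pid))
```

The thresholds are a coarse heuristic as stated on the slide; real annotation pipelines usually also weigh alignment length and statistical significance before transferring a function.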