Transcript Document
Basics of protein structure and modeling
Rui Alves

Proteins are the primary functional manifestation of genomes
DNA sequence -> (transcription) -> RNA sequence -> (translation) -> protein sequence -> protein structure -> protein function
DNA sequence:
atgcaaactctttctgaacgcctcaagaagaggcgaattgcgttaaaaatgacgcaaaccgaa
ctggcaaccaaagccggtgttaaacagcaatcaattcaactgattgaagctggagtaaccaa
gcgaccgcgcttcttgtttgagattgctatggcgcttaactgtgatccggtttggttacagtacgg
aactaaacgcggtaaagccgcttaa
RNA sequence:
augcaaacucuuucugaacgccucaagaagaggcgaauugcguuaaaaaugacgcaaacc
gaacuggcaaccaaagccgguguuaaacagcaaucaauucaacugauugaagcuggagua
accaagcgaccgcgcuucuuguuugagauugcuauggcgcuuaacugugauccgguuug
guuacaguacggaacuaaacgcgguaaagccgcuuaa
Protein sequence:
MQTLSERLKKRRIALKMTQTELATKAGVKQQSIQLIEAGVT
KRPRFLFEIAMALNCDPVWLQYGTKRGKAA
Being able to predict the protein sequence from the gene sequence allows us to predict structure, which in turn helps us understand how the protein does what it does.

Outline
• DNA sequence to protein sequence
• From protein sequence to secondary structure
• Protein tertiary structure
• Predicting protein structure

Predicting protein sequence from DNA sequence
• Protein sequence can be predicted by translating the cDNA using the genetic code.

Translating cDNA to protein
ATGTCTCTTATATGA…
Met Ser Leu Ile Ter
No gene!

Translating yeast mitochondrial cDNA into protein sequence
ATGTCTCTTATATGA……… SECIS sequence
Met Ser Thr Met sCys (instead of Met Ser Leu Ile Ter)
There is a gene, with a considerably different protein sequence from the one we would predict from the universal genetic code!

Outline
• DNA sequence to protein sequence
• From protein sequence to secondary structure
• Protein tertiary structure
• Predicting protein structure

Amino acids are the primary building blocks of proteins
• The sequence of AAs is the primary structure of proteins
• Sequence determines structure
• Amino acids don’t fall neatly into classes
• How we casually speak of them can affect the way we think about their behavior.
For example, if you think of Cys as a polar residue, you might be surprised to find it in the hydrophobic core of a protein, unpaired to any other polar group. But this does happen.
• The properties of a residue type can also vary with conditions/environment

Grouping the amino acids by properties
Livingstone & Barton, CABIOS, 9, 745-756, 1993.

Proteins are made by controlled polymerization of amino acids
[Figure: two amino acids, H2N-CH(R1)-COOH and H2N-CH(R2)-COOH, condense to form a dipeptide. A peptide bond is formed between residue 1 and residue 2 and water (HOH) is eliminated. The N (amino) terminus and the C (carboxy) terminus are labeled.]
If there are more residues it becomes a polypeptide. Short polypeptide chains are usually called peptides, while longer ones are called proteins.

Repeating torsion angles (φ/ψ angles) characterize the secondary structure

Secondary structure elements in proteins
Reflect the tendency of the backbone to hydrogen bond with itself in a semi-ordered fashion when compacted
A secondary structure element is a contiguous region of a protein sequence characterized by a repeating pattern of main-chain hydrogen bonds and backbone phi/psi angles
• alpha-helix (local interactions)
• beta-strand (nonlocal interactions)

Principal types of secondary structure found in proteins
Repeating (φ,ψ) values
                                 φ        ψ
α-helix (right-handed) (15)     -63°     -42°
3₁₀ helix (14)                  -57°     -30°
Parallel β-sheet               -119°    +113°
Antiparallel β-sheet           -139°    +135°

The alpha-helix: repeating i,i+4 h-bonds
[Figure: an α-helix with residues 1-12 numbered and a hydrogen bond marked, next to a phi-psi plot (both axes from -180° to 180°) highlighting the right-handed helical region of phi-psi space. α-helix (right-handed): φ = -63°, ψ = -42°.]
By DSSP definitions, which of residues 1-12 are in the helix? Does this coincide with the residues in the helical region of phi-psi space?
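As a rough illustration of the question above, the sketch below assigns a residue to the nearest set of repeating (φ,ψ) values from the table. It is only a toy: the function names and the 40° cutoff are assumptions made for this example, DSSP itself works from hydrogen-bond patterns rather than from φ/ψ angles, and parallel and antiparallel strands cannot be told apart reliably from backbone angles alone.

```python
import math

# Repeating (phi, psi) values in degrees, copied from the table above.
CANONICAL = {
    "alpha-helix": (-63.0, -42.0),
    "3-10 helix": (-57.0, -30.0),
    "parallel beta-sheet": (-119.0, 113.0),
    "antiparallel beta-sheet": (-139.0, 135.0),
}


def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, wrapping at +/-180."""
    return (a - b + 180.0) % 360.0 - 180.0


def nearest_type(phi, psi, max_dist=40.0):
    """Return the closest canonical type, or "other" if none is close.

    max_dist (degrees) is an arbitrary cutoff chosen for illustration.
    """
    best_name, best_dist = "other", float("inf")
    for name, (phi0, psi0) in CANONICAL.items():
        dist = math.hypot(angle_diff(phi, phi0), angle_diff(psi, psi0))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_dist else "other"


if __name__ == "__main__":
    # Hypothetical (phi, psi) pairs, not taken from any real structure.
    for phi, psi in [(-60.0, -45.0), (-120.0, 115.0), (60.0, 45.0)]:
        print(phi, psi, nearest_type(phi, psi))
```

Comparing such a per-residue assignment with the DSSP assignment for the same residues is one way to see where the two definitions of "in the helix" disagree, typically at the ends of an element.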
β strands/sheets
[Figure: a β-sheet with residues 49-57 numbered, next to a phi-psi plot (both axes from -180° to 180°) highlighting the beta-strand region of phi-psi space. Parallel β-sheet: φ = -119°, ψ = +113°.]
Is this a parallel or anti-parallel sheet? By DSSP definitions, which of residues 49-57 are in the sheet? Does this coincide with the residues in the beta-strand region of phi-psi space?

Contact maps of protein structures
• both axes are the sequence of the protein
• map of Cα-Cα distances < 6 Å
• near diagonal: local contacts in the sequence
• off-diagonal: long-range (nonlocal) contacts
[Figure: contact map next to a rainbow ribbon diagram (blue to red: N to C) of 1avg, the structure of triabin.]

What does secondary structure teach us?
• If one can predict secondary structure from the primary structure, then this may help in predicting protein function, via evolutionary relationships with known folds

Outline
• DNA sequence to protein sequence
• From protein sequence to secondary structure
• Protein tertiary structure
• Predicting protein structure

Tertiary structure in proteins
• Single polypeptide chain
• The number and order of secondary structures in the sequence (connectivity) and their arrangement in space define a protein’s fold or topology
• The pattern of contacts between side chains/backbone is also an aspect of tertiary structure
• Outer surface and interior

Obvious interactions in native protein structures
• hydrophobic interactions
• disulfide crosslinks
• polar interactions (hydrogen bond/salt bridge)
[Figure: schematic of side chains R1, R2, R3 showing an S-S disulfide bridge and an NH3/CO2 salt bridge.]

The protein databank
The protein databank is a central repository of protein structures
http://www.rcsb.org/pdb/home/home.do

Major structure classification systems
• SCOP (Structural Classification of Proteins)
• CATH (Class-Architecture-Topology-Homology)
• DALI/FSSP (Fold classification based on Structure-Structure Alignment)
SCOP and CATH are quite similar and generally combine automated and manual aspects. They are both “curated” by human experts.
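Returning to the contact maps slide above, a contact map can be sketched in a few lines once the Cα coordinates are available. The code below is a minimal pure-Python illustration using the 6 Å cutoff from the slide. The function names, the `min_separation` parameter and the toy hairpin coordinates are assumptions made for this example, not data from 1avg.

```python
import math


def contact_map(ca_coords, cutoff=6.0):
    """Boolean matrix: True where the C-alpha/C-alpha distance is below cutoff (in Å)."""
    n = len(ca_coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            close = math.dist(ca_coords[i], ca_coords[j]) < cutoff
            cmap[i][j] = cmap[j][i] = close  # the map is symmetric
    return cmap


def nonlocal_contacts(cmap, min_separation=4):
    """Contacts between residues at least min_separation apart in sequence.

    These are the off-diagonal (long-range) entries of the map.
    """
    n = len(cmap)
    return [(i, j)
            for i in range(n)
            for j in range(i + min_separation, n)
            if cmap[i][j]]


# Toy coordinates (hypothetical): two 4-residue stretches running in
# opposite directions, 5 Å apart, roughly like a flattened hairpin.
strand1 = [(3.8 * i, 0.0, 0.0) for i in range(4)]
strand2 = [(3.8 * (7 - i), 5.0, 0.0) for i in range(4, 8)]
toy = strand1 + strand2

if __name__ == "__main__":
    cmap = contact_map(toy)
    print(nonlocal_contacts(cmap))
```

In this toy example the long-range contacts lie on a line perpendicular to the main diagonal, which is the signature an antiparallel pairing leaves in a contact map.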
Outline
• DNA sequence to protein sequence
• From protein sequence to secondary structure
• Protein tertiary structure
• Predicting protein structure

The nuts and bolts behind fold prediction
[Diagram: a database of known structures (α-helix, coil, β-strand) and the database of corresponding sequences (ACDEFGTYAEE…) are each split into a training set and a test set. From the training set, the probabilities of finding each amino acid, and each amino acid run (A…C…), in α-helix, coil or β-strand are estimated: p(aa1-helix), p(aa1-coil), p(aa1-strand), and so on; the slide shows illustrative values such as 0.23, 0.28 and 0.5 for A. These probabilities are used to predict the secondary structure of the test-set sequences, and the predictions are compared with the known structures.]
• Good predictions: method ready for new sequence and secondary structure prediction
• Bad predictions: reshuffle training set and test set and repeat until predictions are correct

How does a fold prediction server work?
[Diagram: YOUR SEQUENCE is submitted to the server, which draws on a database of known structures, a database of corresponding sequences, a database of probabilities of aa in secondary structure, and a homology-based helix-coil-strand profile folds database.]
• Strong homology: fold prediction
• Weak/no homology: helix-coil-strand profile prediction, then fold prediction

Predicting protein folding

Predicting protein structure
• Homology Modeling
– 3D-JIGSAW, SWISSMODEL
• Ab initio Modeling
– ROBETTA

Predicting protein structure by homology
How does a homology modeling server work?
[Diagram: the query sequence (…YDVRSEQVENCE…) is submitted to the server/program, which searches the database of known structures and the database of corresponding sequences for strong homologues (sequence + structure). The best possible alignment between the query and a homologue is built (…YDVR-SEQVENCE… against …YDVRMSD-VDNCD…), the query sequence is threaded over the known structure according to the alignment to predict the structure, and the model is optimized via energy minimization, etc.]

Predicting protein structure
• Homology Modeling
– 3D-JIGSAW, SWISSMODEL
• Ab initio Modeling
– ROSETTA

Predicting protein structure by ab initio methods
[Diagram: the query sequence (…YDVRSEQVENCE…) is submitted to the server/program, which finds NO homologues in the database of corresponding sequences. Instead, a database of structures for smaller amino acid runs is searched with fragments of the query (…VENCE… against …YDNCD… and …VEQCE…; …YDVR-SEQ against …YDVRMSD-… and …YPVRMSD-…). The fragment structures are assembled, followed by energy minimization & optimization.]

Accuracy of modelling
• Accuracy varies widely.
• The quality of the model is VERY dependent on the quality of the alignment
• Globular proteins are more accurately predicted
• Membrane proteins are still a big problem
• Homology modelling is “bad” if homology (sequence identity to the known structure) < 30%
• CASP is a biennial meeting where the accuracy of the different methods is assessed
– Baker group is usually and consistently more accurate than others
http://www.predictioncenter.org/

Summary
• DNA sequence to protein sequence
• From protein sequence to secondary structure
• Protein tertiary structure
• Predicting protein structure

“Accessible Surface”
• represent atoms as spheres with appropriate radii and eliminate overlapping parts...
• mathematically roll a sphere all around that surface...
• the sphere’s center traces out a surface as it rolls...
Lee & Richards, 1971; Shrake & Rupley, 1973

The outer surface: water in protein structures
Structures of water-soluble proteins determined at reasonably high resolution will be decorated on their outer surfaces with water molecules (cyan balls) with relatively well-defined positions, and waters may also occur internally.
Water is not just surrounding the protein: it is interacting with it.

Water interacts with protein surfaces
• most waters visible in structures make hydrogen bonds to each other and/or to the protein, as donor/acceptor/both
• first shell waters: in contact with/hydrogen bonded to the protein
• second shell waters: only contact other waters

Side chain conformation
• side chains differ in their number of degrees of conformational freedom (some don’t have any, such as Ala and Gly)
• but side chains of very different size can have the same number of χ angles.

Supersecondary structures/structural motifs
• just as there are certain secondary structure elements that are common, there are also particular arrangements of multiple secondary structure elements that are common
• supersecondary structures emphasize the issue of topology in protein structure
[Figure: β-α-β motif; greek key motif.]

Topology: differences in connectivity
• example: a four-stranded antiparallel β sheet can have many different topologies based on the order in which the four β strands are connected: “up-and-down”, “greek key”

Topology: differences in handedness
• example: an extremely common supersecondary structure in proteins is the beta-alpha-beta motif, in which two adjacent beta-strands are arranged in parallel and are separated in the sequence by a helix which packs against them.
• if the two parallel strands are oriented to face toward you, the helix can be either above or below the plane of the strands. There is a huge preference for the right-handed arrangement in proteins.

DIY: The sequence
DIY: The server
DIY: The reply
DIY: fine tuning
DIY: That is it!

The CATH Hierarchy
1. Divide PDB structure entries into domains (using domain recognition algorithms). The domain is the fundamental unit of structure classification.
2. Classify each domain according to a five level hierarchy:
   Class
   Architecture
   Topology
   Homologous Superfamily
   Sequence Family
There is no purely phyletic system of protein classification! (It is also unlikely that there is any common ancestor to all proteins.)
• the top 3 levels of the hierarchy are purely phenetic: based on characteristics of the structure, not on evolutionary relationships
• the bottom two levels include some phyletic classification as well: groupings according to putative common ancestry based on structural similarity, functional similarity, and sequence similarity

SCOP: A different (but similar) taxonomy system
Correspondences between SCOP and CATH hierarchies:
SCOP: class, fold, superfamily, family, domain
CATH: class, architecture, topology, homologous superfamily, sequence family, domain
CATH is more directed toward structural classification, whereas SCOP pays more attention to evolutionary relationships. Both have in common that they have manual aspects and are curated by experts.

Internal interactions in a protein

Amino acids: the building blocks of proteins
[Figure: an amino acid, H2N-CH(R)-COOH, with the alpha carbon, amino group, carboxylic acid group and side chain (R) labeled, and its zwitterionic form, H3N-CH(R)-COO.]
The zwitterionic form is the predominant form at neutral pH.
The alpha carbon is a chiral center: natural proteins are made of L amino acids (shown above) as opposed to D