Protein Analysis Tools
2nd April, 2012
Ansuman Chattopadhyay, PhD
Head, Molecular Biology Information Service
Health Sciences Library System
University of Pittsburgh
[email protected]
http://www.hsls.pitt.edu/guides/genetics
What we’ll do:
Brief overview of CLC Main Workbench
find the genomic context of a protein sequence
search for the presence of conserved domains
create a multiple sequence alignment plot
What we’ll do:
analyze primary structure, such as hydrophobicity, hydrophilicity,
antigenicity, repeat sequence detection, etc.
predict secondary structure
predict post-translational modifications, such as
phosphorylation and glycosylation
search for interacting partners
predict domain driven protein-protein interactions
Workshop Resources
http://www.hsls.pitt.edu/molbio/tutorials
HSLS MolBio Videos
Sequence Analysis Software Suites
Wisconsin GCG
VectorNTI
DNASTAR Lasergene
Geneious
CLC Main
Why CLC Main?
Windows
Mac
Linux
DNA, RNA, Protein,
Microarray Data Analysis
Regular Update
HSLS Licensed
CLC Main Access
HSLS CLC Main Registration
Link: http://www.hsls.pitt.edu/molbio/clcmain
Access via Pitt - Network Connect
Instruction video: http://goo.gl/JNjMt
CLC Main Workbench Overview
Graphical User Interface
Protein sequences Import
Sequence Navigation
CLC Main Graphical User Interface (GUI)
CLC Main
Navigate a protein sequence
Videos
CLC Main - getting started (basic navigation steps):
http://media.hsls.pitt.edu/media/molbiovideos/clc-navigation-ac0312.swf
CLC Main Workbench Walkthrough (Part 1):
http://media.hsls.pitt.edu/media/molbiovideos/clcmain-walkthrough-part1-ac0112.swf
CLC Main Workbench Walkthrough (Part 2):
http://media.hsls.pitt.edu/media/molbiovideos/clcmain-walkthrough-part2-ac0112.swf
Import a Protein Sequence
Protein Sequence
Human PLCg1
RefSeq no.: NP_002651
UniProt Accession Number: P19174
FASTA file
Raw sequence
CLC features:
Search, Import, Create new sequence
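For readers who want to inspect a FASTA file outside CLC Main, the sketch below shows one way to read FASTA-formatted text in plain Python. It is a minimal illustration, not how CLC Main imports sequences; the function name `parse_fasta` is hypothetical, and the example record is the human BCL2 sequence used later in this workshop.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


BCL2_FASTA = """>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRG
YEWDAGDVGAAPPGAAPAPGIFSSQPG
HTPHPAASRDPVARTSPLQTPAAPGAAA
GPALSPVPPVVHLTLRQAGDDFSRRYRR
DFAEMSSQLHLTPFTARGRFATVVEELFRD
GVNWGRIVAFFEFGGVMCVESVNREMS
PLVDNIALWMTEYLNRHLHTWIQDNGG
WDAFVELYGPSMRPLFDFSWLSLKTLLS
LALVGACITLGAYLGHK
"""

if __name__ == "__main__":
    for header, sequence in parse_fasta(BCL2_FASTA):
        print(header)
        print(len(sequence), "residues")
```

A raw sequence (no header line) can be handled by prepending a header such as `>query` before parsing.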
Videos
Import a DNA/Protein sequence into CLC Main (Part 1):
http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part1-ac0112.swf
Import a DNA/Protein sequence into CLC Main (Part 2):
http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part2-ac0112.swf
CLC protein sequence
Protein sequence manipulation
Create a new protein with PLCg1 SH2-SH2-SH3 domains
Sequence Alignment
Pair-wise Alignment
Global
Local
Multiple Sequence Alignment
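The difference between global and local pair-wise alignment can be sketched with a small dynamic-programming example. The code below computes only the optimal score, using a simple match/mismatch/linear-gap scheme chosen for illustration; these scoring values are assumptions, and real tools use substitution matrices such as BLOSUM62 with affine gap penalties. With `local=False` it follows the Needleman-Wunsch recurrence (end-to-end alignment); with `local=True` it follows Smith-Waterman (best matching region).

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Optimal pair-wise alignment score under a simple scoring scheme.

    local=False: global alignment (both sequences aligned end to end).
    local=True:  local alignment (best-scoring pair of subsequences).
    """
    # First row: leading gaps are penalised globally, free locally.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)   # and may end anywhere
            curr.append(cell)
        prev = curr
    return best if local else prev[-1]


if __name__ == "__main__":
    x, y = "TTTACGTTT", "GGGACGGGG"
    print("global:", alignment_score(x, y))
    print("local: ", alignment_score(x, y, local=True))
```

The two example strings share only a short core, so the local score rewards that core while the global score is pulled down by the unrelated flanks.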
Sequence Alignment
Pair-wise Sequence Alignment
Multiple Sequence Alignment
Multiple Sequence Alignment
Tools: ClustalW and T-Coffee
PLCg1 Orthologous sequences
PLCg1:
Mouse: NP_067255
Rat: NP_037319
Cow: NP_776850
Dog: XP_542998
Zebrafish: NP_919388
Human: NP_002651
NP_067255,NP_037319,NP_776850,XP_542998,NP_919388,NP_002651
Videos
Create a multiple sequence alignment plot using CLC (part 1):
http://media.hsls.pitt.edu/media/molbiovideos/msf-clcmain-ac0212part1.swf
Create a multiple sequence alignment plot using CLC (part 2):
http://media.hsls.pitt.edu/media/molbiovideos/msf-clcmain-ac0212part2.swf
Create a multiple sequence alignment plot:
http://media.hsls.pitt.edu/media/clres2705/msa.swf
Compare two peptide sequences:
http://media.hsls.pitt.edu/media/clres2705/blast2.swf
Starting with a short peptide sequence find:
the whole protein sequence
orthologs in other species (nematode)
Tools:
UCSC BLAT
NCBI BLAST against SwissProt
Peptide to whole protein
Peptide seq: SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR
Videos
Place an mRNA or peptide sequence into the human genome (BLAT):
http://www.hsls.pitt.edu/molbio/videos/play?v=12e
Find homologous sequences:
http://media.hsls.pitt.edu/media/clres2705/blast.swf
Find homologous sequence
SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR
Sequence Manipulation & Format Conversion
Sequence Manipulation Suite
http://bioinformatics.org/sms2/
Readseq
http://thr.cit.nih.gov/molbio/readseq/
GenPept
FASTA
Hands-On
Retrieve the amino acid sequence between positions 25 and 45
of Sequence A (MS Word Doc)
Identify the rat gene which encodes this peptide
fragment and retrieve its whole protein sequence
Find the fruit fly homolog of this protein.
What % identity does the fruit fly protein share with its rat
homolog?
Predict potential MAPK phosphorylation sites present in
the fruit fly protein
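Two of the hands-on steps above, extracting a residue range and computing % identity, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: positions are treated as 1-based and inclusive (the usual convention in sequence databases), and % identity is computed as identical columns divided by alignment length, which mirrors how BLAST reports identities but is only one of several definitions in use. The function names and the toy strings are hypothetical.

```python
def subsequence(seq, start, end):
    """Return residues start..end of seq, using 1-based inclusive positions."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("positions must satisfy 1 <= start <= end <= len(seq)")
    return seq[start - 1:end]


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical columns between two rows of a pair-wise alignment.

    Both inputs must already be aligned (same length, gaps written as '-').
    The denominator is the full alignment length, gap columns included.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(a == b and a != gap for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    toy = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
    print(subsequence(toy, 25, 45))              # 21 residues
    print(percent_identity("ACDE-G", "ACDQFG"))  # 4 identical of 6 columns
```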
Protein Domain Search: InterPro Scan
InterPro is a database of protein families, domains,
regions, repeats and sites in which identifiable features
found in known proteins can be applied to new protein
sequences.
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRG
YEWDAGDVGAAPPGAAPAPGIFSSQPG
HTPHPAASRDPVARTSPLQTPAAPGAAA
GPALSPVPPVVHLTLRQAGDDFSRRYRR
DFAEMSSQLHLTPFTARGRFATVVEELFRD
GVNWGRIVAFFEFGGVMCVESVNREMS
PLVDNIALWMTEYLNRHLHTWIQDNGG
WDAFVELYGPSMRPLFDFSWLSLKTLLS
LALVGACITLGAYLGHK
Videos:
Find protein domains, PTM, secondary structure etc.:
http://media.hsls.pitt.edu/media/clres2705/uniprot.swf
Start with a protein pattern and find which proteins possess that domain:
http://media.hsls.pitt.edu/media/clres2705/scanprosite.swf
Search for protein domains, repeats and sites:
http://media.hsls.pitt.edu/media/clres2705/interpro.swf
Protein Domain Search: ScanProsite
Pattern Search
[AC]-x-V-x(4)-{ED}:
This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}
F-[GSTV]-P-R-L-[G>]
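To make the pattern notation concrete, the sketch below converts the two example patterns above into Python regular expressions. It is an illustrative helper, not ScanProsite's own implementation: the name `prosite_to_regex` is hypothetical, and only the common elements are handled (`x`, `[..]`, `{..}`, repetitions such as `x(4)` or `x(2,4)`, the `<` and `>` anchors, and `>` inside square brackets meaning "or end of sequence").

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:                                # x(4) -> .{4}, x(2,4) -> .{2,4}
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."                       # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"   # any residue except these
        elif element.startswith("[") and ">" in element:
            residues = element[1:-1].replace(">", "")
            core = "(?:[" + residues + "]|$)"   # one of these, or sequence end
        else:
            core = element                   # single residue or [..] class
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    for p in ("[AC]-x-V-x(4)-{ED}", "F-[GSTV]-P-R-L-[G>]"):
        print(p, "->", prosite_to_regex(p))
```

The resulting expression can be passed to `re.finditer` to list every match in a protein sequence.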
Pattern Search
Protein Primary Structure Analysis
Tool: ExPASy from SIB
Calculated Mol Wt
Theoretical pI
Extinction coefficients
Estimated half-life
Hydropathicity plot: Kyte & Doolittle
Hydrophilicity plot: Hopp T.P., Woods K.R
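A hydropathicity plot of the Kyte & Doolittle kind is a sliding-window average of per-residue hydropathy values. The sketch below shows the calculation with the published Kyte-Doolittle scale; the window size of 9 is an assumption made for illustration (wider windows, for example 19, are commonly used when looking for membrane-spanning segments), and the function names are hypothetical rather than part of any ExPASy tool.

```python
# Kyte & Doolittle (1982) hydropathy values; positive = hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every window of `window` consecutive residues."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def gravy(seq):
    """Grand average of hydropathy: mean value over the whole sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    # C-terminal stretch of the human BCL2 sequence used in this workshop.
    tail = "FSWLSLKTLLSLALVGACITLGAYLGHK"
    for start, value in enumerate(hydropathy_profile(tail), start=1):
        print(start, round(value, 2))
```

Each value is usually plotted against the position of the window's central residue; sustained positive stretches suggest hydrophobic regions.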
Antigenic Site Prediction
Tool: EMBOSS Antigenic
EMBOSS Antigenic
Antigenic predicts potentially antigenic regions of a protein sequence, using
the method of Kolaskar and Tongaonkar. Analysis of data from experimentally
determined antigenic sites on proteins has revealed that the hydrophobic
residues Cys, Leu and Val, if they occur on the surface of a protein, are more
likely to be part of antigenic sites. Kolaskar and Tongaonkar developed a
semi-empirical method to predict antigenic determinants on proteins; it makes
use of the physicochemical properties of amino acid residues and their
frequencies of occurrence in experimentally known segmental epitopes.
Application of this method to a large number of proteins has shown that it can
predict antigenic determinants with about 75% accuracy, which is better than
most of the known methods. The method is based on a single parameter and is
thus very simple to use.
Transmembrane Region Prediction
Transmembrane Site Prediction
Tool: TMHMM Server
Protein Secondary Structure
Protein-Protein Interactions Prediction
Tool: STRING
Hands-on
Take the human BCL2 protein sequence and
Find its domain architecture
Predict the topology of its transmembrane region
Design a suitable antigenic site for antibody generation
What are its calculated Mol Wt and extinction coefficient?
Predict its secondary structure
What % of this protein possesses alpha helical structure?
Predict its potential interacting partners
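The alpha-helix question above comes down to counting states in the predictor's output. Many secondary structure predictors report one letter per residue, commonly H for helix, E for strand and C for coil. Assuming that three-state convention (it varies between tools), the helical percentage can be computed as sketched below. The function name and the example string are hypothetical, not real prediction output for BCL2.

```python
def helix_percentage(ss_string, helix_symbols="H"):
    """Percentage of residues whose predicted state is helical.

    ss_string: one character per residue, e.g. 'CCHHHHHHCCEEEECC'.
    helix_symbols: characters counted as helix (an assumption; check the
    legend of the prediction tool actually used).
    """
    if not ss_string:
        raise ValueError("empty prediction string")
    helical = sum(state in helix_symbols for state in ss_string)
    return 100.0 * helical / len(ss_string)


if __name__ == "__main__":
    made_up_prediction = "CCHHHHHHHHCCCEEEEECCHHHHHC"
    print(round(helix_percentage(made_up_prediction), 1), "% helix")
```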
Hands-on
Prediction of potential phosphorylation sites
present in a protein sequence.
Sequence: human BCL2
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIF
SSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLR
QAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRD
GVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWI
QDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
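Before submitting the sequence to a prediction server, candidate sites can be listed with a simple motif scan. The sketch below finds every Ser or Thr that is immediately followed by Pro, the minimal motif recognised by proline-directed kinases such as the MAPKs. This is a much cruder approach than the trained predictors named in this workshop (NetPhos, GPS), so the hits are only candidates; the function name is hypothetical.

```python
import re

# Human BCL2 (NP_000624.2), as given in this workshop.
BCL2 = (
    "MAHAGRTGYDNREIVMKYIHYKLSQRG"
    "YEWDAGDVGAAPPGAAPAPGIFSSQPG"
    "HTPHPAASRDPVARTSPLQTPAAPGAAA"
    "GPALSPVPPVVHLTLRQAGDDFSRRYRR"
    "DFAEMSSQLHLTPFTARGRFATVVEELFRD"
    "GVNWGRIVAFFEFGGVMCVESVNREMS"
    "PLVDNIALWMTEYLNRHLHTWIQDNGG"
    "WDAFVELYGPSMRPLFDFSWLSLKTLLS"
    "LALVGACITLGAYLGHK"
)


def proline_directed_sites(seq):
    """List (position, residue) for every S or T directly followed by P.

    Positions are 1-based. The lookahead (?=P) checks the next residue
    without consuming it, so adjacent candidate sites are all reported.
    """
    return [(m.start() + 1, m.group())
            for m in re.finditer(r"[ST](?=P)", seq)]


if __name__ == "__main__":
    for position, residue in proline_directed_sites(BCL2):
        print(f"{residue}{position}")
```

A stricter scan could require the fuller P-x-[ST]-P consensus, at the cost of missing sites that match only the minimal motif.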
Phosphorylation Site Prediction:
Tool: NetPhos
Phosphorylation Site Prediction:
Tool: GPS
Thank you!
Any questions?
Carrie Iwema
[email protected]
412-383-6887
Ansuman Chattopadhyay
[email protected]
412-648-1297
http://www.hsls.pitt.edu/guides/genetics