Statistics of small peptides

This tour guides you through a
computational experiment that you can
perform within BioBIKE.
To get to BioBIKE, go to:
http://ixion.csbc.vcu.edu:8003/biologin
Enter a login name (letters only, no spaces)
No password necessary
This demonstration is best viewed as a slide show,
enabling you to simulate a session and making
changes in cursor position more obvious.
To do this, click Slide Show on the top tool bar, then View show.
Click anywhere to go on to the next slide.
How many types of peptides
are there of each size class?
How many peptides are there with a single
amino acid? In other words, how many ways
can you fill the box below with a different
amino acid?
Amino acid goes here
(how many different amino acids are there?)
How about peptides with two amino acids?
How many ways can you fill the boxes
below with different amino acids?
Amino acids go here
(If you don't see the answer, then simplify
the problem and count by hand)
To verify your answer in BioBIKE…
(though you should be so certain of
your answer that if BioBIKE were to
disagree, you'd think that BioBIKE is
wrong, not you!)
Strategy: Generate all possible
proteins of a given length, then count
them.
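Outside BioBIKE, the same generate-then-count strategy can be sketched in Python. This is only an illustration, not the BioBIKE function itself: the name `all_peptides` is hypothetical, and the alphabet is assumed to be the 20 standard amino acids in one-letter code.

```python
from itertools import product

# Assumption: the 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_peptides(length):
    """Return every peptide sequence of the given length as a list of strings."""
    return ["".join(residues) for residues in product(AMINO_ACIDS, repeat=length)]


# Generate the peptides of a few small lengths, then count them.
for length in (1, 2, 3):
    peptides = all_peptides(length)
    print(length, len(peptides), peptides[:3])
```

Compare the counts it prints with the answers you worked out by hand.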
That gives you all the peptide
sequences of length 1.
Is the list correct?
How many are there?
With this list you can count by hand,
but later this won't be possible. To
automate the process, wrap the
function in COUNT-OF.
That gives you the number of all the
peptide sequences of length 1.
Now for something more interesting.
Change the length from 1 to 5
(remembering to close the entry by
pressing Enter).
Whoops! A problem. BioBIKE is
attempting to save you from doing
something potentially stupid by
accident. You could easily use this
command to ask for more sequences
than there are electrons in the
universe.
But read the advice carefully and note
that there is a way out.
You can go on from there on your
own.
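To see why the warning is reasonable, it may help to count the sequences without generating them. The sketch below assumes 20 amino acids and uses 10^80 as a rough, order-of-magnitude stand-in for the number of electrons in the universe; both function names are hypothetical.

```python
ALPHABET_SIZE = 20               # assumption: the 20 standard amino acids
ELECTRONS_IN_UNIVERSE = 10**80   # assumption: rough order-of-magnitude figure


def peptide_count(length):
    """Number of distinct peptide sequences of the given length."""
    return ALPHABET_SIZE ** length


def first_length_exceeding(limit):
    """Smallest peptide length whose sequence count is greater than limit."""
    length = 1
    while peptide_count(length) <= limit:
        length += 1
    return length


for length in (1, 2, 5, 10):
    print(length, peptide_count(length))
print(first_length_exceeding(ELECTRONS_IN_UNIVERSE))
```

Counting this way costs almost nothing, whereas listing the sequences needs memory for every one of them.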
Identification of a protein
from a peptide sequence
If you were given a peptide sequence, say
"QWER" (glutamine-tryptophan-glutamate-arginine), is this enough information to
identify the protein it came from?
This is sort of like a variation on the birthday
problem: How likely is it that someone in the
room has the same birthday as you do?
It depends on how many people there are in
the room and how many birthdays there are
to choose from.
With 365 people in the room, what would be
your chances? (ignore leap years)
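A minimal sketch of the birthday calculation follows. It assumes that birthdays are independent and equally likely on each of 365 days, and that "people in the room" are counted apart from you; the function names are hypothetical.

```python
def chance_nobody_shares(others, days=365):
    """Probability that none of the other people has your birthday."""
    return ((days - 1) / days) ** others


def chance_someone_shares(others, days=365):
    """Probability that at least one other person has your birthday."""
    return 1.0 - chance_nobody_shares(others, days)


for others in (10, 100, 364, 1000):
    print(others, round(chance_someone_shares(others), 3))
```

Try changing `days` as well as `others` to see how the ratio between the two drives the result.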
Even without doing the calculation, you can
see that only if the number of birthdays is
much greater than the number of people do
you stand a good chance of having a unique
birthday.
So how many possible peptides (analogous
to birthdays) are there? You did this already.
And how many 4-aa peptides are in the
proteins of, say, ss120 (analogous to the
number of people in the room)?
Simplify: How many 4-aa peptides are there
in a single protein? Suppose the protein has
100 amino acids.
Imagine that protein, with 100 amino acids:
aa1- aa2- aa3- aa4- aa5- aa6- …aa95- aa96- aa97- aa98- aa99- aa100
How many 4-aa sequences are there in this
protein?
You might want to simplify. Suppose the
protein were only 4 amino acids in length.
How many would there be? Suppose it were
5 amino acids in length? 10? What's the
rule? If I tell you the length of the protein,
can you tell me the number of 4-aa peptides?
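If you want to check your rule by brute force, one option is to list every 4-aa stretch of a test sequence and count the list. The helper `peptides_in` below is hypothetical, and the test proteins are made up.

```python
def peptides_in(protein, size=4):
    """Return every contiguous stretch of `size` amino acids, left to right."""
    return [protein[start:start + size] for start in range(len(protein) - size + 1)]


# Try proteins of increasing length and look for the pattern in the counts.
for protein in ("QWER", "QWERT", "QWERTYIPAS"):
    print(len(protein), len(peptides_in(protein)), peptides_in(protein))
```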
Now imagine that there are many hundreds of
proteins in an organism (say ss120), with
different lengths. What do you need to know
to calculate the total number of 4-aa
sequences in the proteins of ss120?
You can get all the information you need in
BioBIKE using the functions illustrated on
the following slides.
Assembling these functions should get you
the number of 4-amino acid peptides there
are in ss120 proteins.
How does this number compare with the
number of possible 4-amino acid peptide
sequences you calculated earlier?
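The same comparison can be sketched outside BioBIKE once you have a list of protein lengths. The lengths below are invented for illustration and are not taken from ss120; the function names are also hypothetical.

```python
def windows(length, size=4):
    """Number of contiguous `size`-aa stretches in a protein of this length."""
    return max(length - size + 1, 0)


def total_windows(protein_lengths, size=4):
    """Total number of `size`-aa stretches over a collection of proteins."""
    return sum(windows(length, size) for length in protein_lengths)


# Assumption: made-up protein lengths, standing in for a real proteome.
hypothetical_lengths = [100, 250, 431, 87]
observed = total_windows(hypothetical_lengths)
possible = 20 ** 4  # all 4-aa sequences over 20 amino acids

print(observed, possible, observed / possible)
```

The ratio in the last column plays the role of "people per birthday" in the birthday analogy.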
How much overlap is there in the
molecular weights of different peptides?
There are several problems in attempting to
identify a protein from a single small
peptide. Let's examine one of them.
Mass spectrometry directly gives you not the
sequence of a peptide but rather its
molecular weight. If every peptide has a
different molecular weight, then one can go
directly from molecular weight to sequence.
Is this the case?
Consider the set of 3-amino acid peptides as
an example.
Strategy:
- Calculate the molecular weights of all
3-amino-acid peptides
- Bin (count) each size class
- Write the results to a file
- Download the file
- Upload the file into Excel
- Make a histogram of the results
You'll want to consider the BioBIKE
functions on the following slides.
MW-OF (from the GENES-PROTEIN
menu; Translation submenu)
Use it to get the molecular weights of all
protein sequences of length 3.
Use the SEQUENCE option so that the
function knows enough to interpret a
sequence like "PHE" as "proline-histidine-glutamate", using the one-letter code, rather
than the abbreviation of phenylalanine, using
the three-letter code.
BIN-DATA-OF (you used this in the
previous tour)
Use it to count the instances of each
molecular weight.
The interval should be set to 1 so each size
class is counted individually.
The max should be set to the biggest
molecular weight a 3-amino acid peptide can
have. That would be 3 times the molecular
weight of the biggest amino acid. What's
that?
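As a rough check on the choice of max, the sketch below uses nominal (integer) residue masses, which are an approximation and need not match the values BioBIKE's MW-OF uses. A free amino acid is taken as its residue mass plus one water (18), and a peptide as the sum of its residue masses plus one water.

```python
# Assumption: nominal integer residue masses, one-letter code.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
WATER = 18


def free_amino_acid_mass(amino_acid):
    """Nominal mass of the free amino acid (residue plus one water)."""
    return RESIDUE_MASS[amino_acid] + WATER


def peptide_mass(sequence):
    """Nominal mass of a peptide: its residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


heaviest = max(RESIDUE_MASS, key=RESIDUE_MASS.get)
upper_bound = 3 * free_amino_acid_mass(heaviest)

print(heaviest, upper_bound, peptide_mass(heaviest * 3))
```

Because water is lost at each peptide bond, three times the free amino acid weight is somewhat larger than the heaviest tripeptide, so it is a safe max rather than an exact one.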
WRITE (you used this in the previous tour)
Use it to write the counts of the binned
molecular weights, i.e. the previous result.
(PREVIOUS-RESULT from the OTHER-FUNCTIONS menu may be of use here)
Make up any file name you want, so long as
you put it in quotes.
Select TAB-DELIMITED from the Options
menu, since the file will be uploaded into
Excel.
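For comparison, the calculate, bin, and write steps can be sketched in Python. This is not how BioBIKE implements MW-OF, BIN-DATA-OF, or WRITE: the masses are nominal integer residue masses (an approximation), and the function names and file name are hypothetical.

```python
import csv
from collections import Counter
from itertools import product

# Assumption: nominal integer residue masses, one-letter code.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
WATER = 18


def peptide_mass(sequence):
    """Nominal mass of a peptide: its residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def bin_tripeptide_masses():
    """Count how many 3-aa peptides fall in each 1-unit mass class."""
    return Counter(peptide_mass(p) for p in product(RESIDUE_MASS, repeat=3))


def write_counts(counts, path):
    """Write one tab-delimited row per mass class, including empty classes."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["molecular_weight", "instances"])
        for mass in range(min(counts), max(counts) + 1):
            writer.writerow([mass, counts.get(mass, 0)])


if __name__ == "__main__":
    write_counts(bin_tripeptide_masses(), "tripeptide_mw.txt")
```

The resulting file opens directly in Excel, with one column for the molecular weight and one for the number of peptides in that class.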
You should now be in a position to create a
histogram within Excel. If you do, you'll see
something remarkable, like…
[Figure: Instances of molecular weights amongst 3-aa peptides.
Histogram of Instances (y-axis) against Molecular Weight (x-axis),
with part of the histogram blown up to show detail.]
This is peculiar… The number of instances in
each molecular weight class appears to jump
in discrete units. Why is that?
Let's examine the peptides and their
molecular weights more closely.
Repeat the molecular weight calculation, but
this time label the result (you'll see what
labeling does in a moment).
Execute the resulting function.
Note that each molecular weight now comes
with the peptide that is associated with it.
To compare this result with the histogram,
we need to sort the result by molecular
weight.
Surround the MW-OF function and wrap
SORT around it.
We want to sort by the molecular weight (the
second position), not the peptide (the first
position).
Execute the function and compare the results
closely with your histogram in Excel. What
accounts for the numbers?
Why are molecular weights with only one
peptide so rare? How many are there?
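One way to explore these questions outside BioBIKE is to keep each peptide paired with its weight, sort by the weight, and then group the peptides that share a weight. The sketch below assumes nominal integer residue masses, so its groups need not match BioBIKE's output exactly; all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Assumption: nominal integer residue masses, one-letter code.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
WATER = 18


def peptide_mass(sequence):
    """Nominal mass of a peptide: its residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def labeled_masses(length=3):
    """Return (peptide, mass) pairs sorted by mass, the second position."""
    pairs = [("".join(p), peptide_mass(p)) for p in product(RESIDUE_MASS, repeat=length)]
    return sorted(pairs, key=lambda pair: pair[1])


def group_by_mass(pairs):
    """Map each mass to the list of peptides that have it."""
    groups = defaultdict(list)
    for peptide, mass in pairs:
        groups[mass].append(peptide)
    return groups


pairs = labeled_masses()
groups = group_by_mass(pairs)
singletons = [mass for mass, peptides in groups.items() if len(peptides) == 1]

print(pairs[:3])
print(groups[peptide_mass("GAS")])
print(len(singletons), "molecular weights belong to exactly one peptide")
```

Looking at which peptides end up in the same group, and at how the group sizes relate to one another, suggests where the discrete jumps in the histogram come from.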
In this tour, you've seen:
- How to determine the number of peptides in each
size class.
- Problems related to the identification of proteins
from their peptides.
- The degeneracy of molecular weights in peptides.
- Some causes of this degeneracy.