Multiple Sequence Alignment

Homework
I. Running BLAST with BioPerl
Input:
1) A sequence or an accession number.
2) A threshold (E-value cutoff).
Output:
1) BLAST results: sequence names, alignment scores, E-values.
2) Next to each result, provide a link that redirects to the Pairwise Alignment
page (from the previous exercise). That page should be pre-filled with the two
sequences: first the original query sequence, second the sequence selected
from the BLAST run.
* You should also submit a data-flow diagram with BioPerl class names.
Homework (continued)
•Documentation: BioPerl tutorial, section III.4.1, "Running BLAST remotely (using RemoteBlast.pm)".
•Use the sleep function to pause between checks for the remote BLAST results.
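•Data-flow diagram example for retrieving a sequence:
use Bio::DB::GenBank;
# Connect to GenBank and fetch the record by accession number.
$gb = Bio::DB::GenBank->new();
$seq = $gb->get_Seq_by_acc('AF303112');
# Print the raw sequence string of the retrieved Seq object.
print $seq->seq();
Diagram:
GenBank --get_Seq_by_acc('AF303112')--> Seq --seq()--> string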
Homework (continued)
II. Translate a PROSITE pattern into a Perl regular expression.
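The translation can be sketched as a small function. The version below is written in Python for illustration only; the regular expression it produces uses only character classes, quantifiers, and anchors, which have the same syntax in Perl. The function name `prosite_to_regex` is hypothetical, and the sketch assumes a well-formed pattern. It does not handle special cases such as an anchor inside square brackets.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a well-formed PROSITE pattern into a Perl-compatible regex string."""
    pattern = pattern.strip().rstrip(".")      # a trailing period ends the pattern
    pieces = []
    for element in pattern.split("-"):         # '-' separates pattern elements
        prefix = suffix = ""
        if element.startswith("<"):            # '<' anchors to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):              # '>' anchors to the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as (3) or (2,4).
        core, repeat = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element).groups()
        if core == "x":                        # x: any amino acid
            core = "."
        elif core.startswith("{"):             # {ABC}: anything except A, B, C
            core = "[^" + core[1:-1] + "]"
        # [ABC] and single letters already mean the same thing in a regex.
        if repeat:
            core += "{" + repeat + "}"
        pieces.append(prefix + core + suffix)
    return "".join(pieces)

print(prosite_to_regex("N-{P}-[ST]-{P}."))     # prints N[^P][ST][^P]
```

The resulting string can be used directly in a Perl match such as `$protein =~ /N[^P][ST][^P]/`.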
Profile Analysis
M. Gribskov, D. Eisenberg.
Profile Analysis: detection of distantly related proteins by sequence comparison.
The information from a multiple alignment of related sequences is expressed in a position-specific scoring table (profile).
Profiles
[Figure: four aligned sequences, Seq1 to Seq4, summarized as a profile.]
Profile alignment
•Sequence – Profile Alignment.
•Profile – Profile Alignment.
Both use dynamic programming (the same idea as in Pairwise Sequence Alignment).
[Figure: reminder of Pairwise Sequence Alignment.]
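To make the connection with pairwise alignment concrete, here is a minimal Python sketch of global sequence-to-profile alignment. It is the usual dynamic programming recurrence with the second sequence replaced by profile columns. The function name `align_to_profile`, the linear gap penalty, and the toy consensus-based score are assumptions for illustration, not part of the slides.

```python
def align_to_profile(seq, n_cols, score, gap=-2.0):
    """Globally align seq to a profile with n_cols columns.

    score(x, j) is the score for placing character x in profile column j.
    Returns (best score, list of (character or '-', column index or None)).
    """
    n = len(seq)
    # F[i][j]: best score for seq[:i] against the first j profile columns.
    F = [[0.0] * (n_cols + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, n_cols + 1):
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, n_cols + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(seq[i - 1], j - 1),  # x in column j
                          F[i - 1][j] + gap,    # x opposite a gap in the profile
                          F[i][j - 1] + gap)    # column j opposite a gap in seq
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, n_cols
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(seq[i - 1], j - 1):
            pairs.append((seq[i - 1], j - 1)); i -= 1; j -= 1
        elif j > 0 and (i == 0 or F[i][j] == F[i][j - 1] + gap):
            pairs.append(("-", j - 1)); j -= 1
        else:
            pairs.append((seq[i - 1], None)); i -= 1
    pairs.reverse()
    return F[n][n_cols], pairs

# Toy profile: each column is scored against a single consensus character.
consensus = "ACGT"
toy_score = lambda x, j: 2.0 if x == consensus[j] else -1.0
total, pairs = align_to_profile("ACT", len(consensus), toy_score)
print(total, pairs)
```

In a real profile the `score` callable would be a position-specific table lookup rather than a comparison with a consensus character.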
Sequence-Profile alignment:
S(x,j) – score for aligning character ‘x’ with column ‘j’
S(x,j) = Σ_y σ(x,y) · p(y,j) / p(y)
σ(x,y) – any regular score for Pairwise Alignment (PAM-k, BLOSUM-k, …)
p(y,j) – frequency with which character y appears in column ‘j’ of the multiple alignment
p(y) – frequency with which character y appears anywhere in the sequences of the multiple alignment
The position-specific gap coefficients penalize gaps in conserved regions more heavily than gaps in more variable regions.
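As a worked example of the score S(x,j), the Python sketch below builds p(y,j) and p(y) from a small alignment and evaluates the sum. The toy alignment, the match/mismatch values used for σ, and the choice to ignore gap characters when counting frequencies are all assumptions made for illustration. A real profile would use a PAM or BLOSUM matrix for σ.

```python
from collections import Counter

def column_frequencies(alignment):
    """p(y, j): frequency of each character in each column, ignoring gaps."""
    freqs = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != "-")
        total = sum(counts.values())
        freqs.append({y: k / total for y, k in counts.items()} if total else {})
    return freqs

def background_frequencies(alignment):
    """p(y): frequency of each character over the whole alignment, ignoring gaps."""
    counts = Counter(c for row in alignment for c in row if c != "-")
    total = sum(counts.values())
    return {y: k / total for y, k in counts.items()}

def sigma(x, y):
    """Stand-in pairwise score (assumption): +2 for a match, -1 for a mismatch."""
    return 2.0 if x == y else -1.0

def profile_score(x, j, p_col, p_bg):
    """S(x, j) = sum over y of sigma(x, y) * p(y, j) / p(y)."""
    return sum(sigma(x, y) * p_col[j][y] / p_bg[y] for y in p_col[j])

alignment = ["ACGT", "ACGA", "AC-T", "TCGT"]   # toy multiple alignment
p_col = column_frequencies(alignment)
p_bg = background_frequencies(alignment)
print(profile_score("C", 1, p_col, p_bg))      # column 1 is fully conserved C
print(profile_score("A", 1, p_col, p_bg))      # a mismatch against that column
```

The conserved column rewards its own character strongly and penalizes every other character, which is the position-specific behavior the profile is meant to capture.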
Profiles in GCG
PileUp creates a multiple sequence alignment from a group of
related sequences.
ProfileMake makes a profile from a multiple sequence alignment.
ProfileSearch uses the profile to search a database for sequences
with similarity to the group of aligned sequences.
ProfileSegments displays optimal alignments between each
sequence in the ProfileSearch output list and the group of aligned
sequences (represented by the profile consensus).
ProfileGap makes optimal alignments between one or more
sequences and a group of aligned sequences represented as a
profile.
ProfileScan uses a database of profiles to find structural and
sequence motifs in protein sequences.
Iterative profile pairwise alignment
1. Align some pair.
2. While (not done)
(a) Pick an unaligned string which is "near" some aligned one(s).
(b) Align it with the profile of the previously aligned group.
Resulting new spaces are inserted into all strings in the group.
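The loop above can be sketched in Python as follows. This is a simplified illustration under several assumptions: "near" is measured with `difflib.SequenceMatcher` as a stand-in similarity, a column is scored by averaging a match/mismatch score over its non-gap characters, gaps cost a fixed penalty, and the group starts from the first input string. The names `column_score`, `add_to_group`, and `iterative_alignment` are hypothetical.

```python
from difflib import SequenceMatcher

def column_score(x, column, match=1.0, mismatch=-1.0):
    """Average score of character x against the non-gap characters of a column."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    return sum(match if x == c else mismatch for c in residues) / len(residues)

def add_to_group(group, seq, gap=-1.0):
    """Align seq to the profile of group; new gaps are inserted into all rows."""
    cols = list(zip(*group))
    n, m = len(seq), len(cols)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + column_score(seq[i - 1], cols[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    old_rows, new_row = [[] for _ in group], []
    i, j = n, m
    while i > 0 or j > 0:                      # traceback, built right to left
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + column_score(seq[i - 1], cols[j - 1]):
            chars, x = cols[j - 1], seq[i - 1]; i -= 1; j -= 1
        elif j > 0 and (i == 0 or F[i][j] == F[i][j - 1] + gap):
            chars, x = cols[j - 1], "-"; j -= 1          # gap in the new string
        else:
            chars, x = "-" * len(group), seq[i - 1]; i -= 1  # new space in all rows
        for row, c in zip(old_rows, chars):
            row.append(c)
        new_row.append(x)
    return ["".join(reversed(r)) for r in old_rows + [new_row]]

def iterative_alignment(seqs):
    """Grow the aligned group one string at a time, nearest string first."""
    group, aligned, pending = [seqs[0]], [seqs[0]], list(seqs[1:])
    while pending:
        nearest = max(pending, key=lambda s: max(
            SequenceMatcher(None, s, a).ratio() for a in aligned))
        pending.remove(nearest)
        group = add_to_group(group, nearest)
        aligned.append(nearest)
    return group

for row in iterative_alignment(["ACGT", "ACT", "AGT"]):
    print(row)
```

Rows are returned in the order in which the strings joined the group, so they may not match the input order.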
Progressive Profile Alignment
ClustalW (algorithm of Thompson, Higgins, Gibson 1994)
(the idea is close to Feng-Doolittle 1987, implemented in PileUp,
GCG package)
1. Calculate the pairwise alignment scores, and convert them to distances.
2. Use a neighbor-joining algorithm to build a tree from the distances.
3. Align sequence-sequence, sequence-profile, and profile-profile in decreasing similarity order.
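Step 2 and the ordering used in step 3 can be illustrated with a short Python sketch. It assumes the distances from step 1 are already available, returns only the tree topology (no branch lengths), and derives the merge order by a simple post-order walk of the tree. This approximates the ordering idea and is not ClustalW's own implementation. The function names and the toy distances are hypothetical.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Build a tree topology from pairwise distances by neighbor joining.

    names: list of leaf labels (strings).
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns the tree as nested tuples of labels.
    """
    nodes, d = list(names), dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # r[a]: total distance from a to every other current node.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(a, b) = (n - 2) * d(a, b) - r[a] - r[b].
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        new = (a, b)
        rest = [k for k in nodes if k != a and k != b]
        for k in rest:
            # Distance from the new internal node to each remaining node.
            d[frozenset((new, k))] = (d[frozenset((a, k))] + d[frozenset((b, k))]
                                      - d[frozenset((a, b))]) / 2
        nodes = rest + [new]
    return tuple(nodes)

def merge_order(tree):
    """Post-order list of internal nodes: closest groups are aligned first."""
    if isinstance(tree, str):
        return []
    left, right = tree
    return merge_order(left) + merge_order(right) + [tree]

# Toy distances (assumption): A,B are close, C,D are close, the two pairs are far apart.
names = ["A", "B", "C", "D"]
pairs = {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 4,
         ("B", "C"): 4, ("B", "D"): 4, ("C", "D"): 2}
dist = {frozenset(p): v for p, v in pairs.items()}
tree = neighbor_joining(names, dist)
print(tree)
print(merge_order(tree))
```

For the toy data, the merge order corresponds to two sequence-sequence alignments followed by one profile-profile alignment.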
Alignment tree built by ClustalW