Longest Common Subsequence (LCS) Dr. Nancy Warter-Perez Functions Function definition def adder(a, b, c): return a+b+c Function calls adder(1, 2, 3) -> 6 Introduction to Python –
Download ReportTranscript Longest Common Subsequence (LCS) Dr. Nancy Warter-Perez Functions Function definition def adder(a, b, c): return a+b+c Function calls adder(1, 2, 3) -> 6 Introduction to Python –
Longest Common Subsequence (LCS)
Dr. Nancy Warter-Perez
Functions
Function definition def adder(a, b, c): return Function calls adder(1, 2, 3) -> 6 a+b+c Introduction to Python – Part III 2
Functions – Polymorphism
>>>def fn2(c): … a = c * 3 … >>> print fn2(5) 15 >>> print fn2(1.5) 4.5
>>> print fn2([1,2,3]) [1,2,3,1,2,3,1,2,3] >>> print fn2("Hi") HiHiHi return a Introduction to Python – Part III 3
Functions - Recursion
def fn_Rec(x): if x == []: return fn_Rec(x[1:]) print x[0], y = [1,2,3,4] fn_Rec(y) >>> 4 3 2 1 Introduction to Python – Part III 4
Scopes
Scopes divine the “visibility” of a variable Variables defined outside of a function are visible to all of the functions within a module (file) Variables defined within a function are local to that function To make a variable that is defined within a function global, use the global keyword Ex 1: x = 5 def fnc(): x = 2 print x, fnc() print x >>> 2 5 Developing Pairwise Sequence Alignment Algorithms Ex 2: x = 5 def fnc(): global x x = 2 print x, fnc() print x >>> 2 2 5
Modules
Why use?
Code reuse System namespace partitioning (avoid name clashes) Implementing shared services or data How to structure a Program One top-level file Main control flow of program Zero or more supplemental files known as modules Libraries of tools Developing Pairwise Sequence Alignment Algorithms 6
Modules - Import
Import – used to gain access to tools in modules Ex: contents of file b.py
def spam(text): print text, 'spam' contents of file a.py
import b b.spam('gumby') Developing Pairwise Sequence Alignment Algorithms 7
Dynamic Programming
Applied to optimization problems Useful when Problem can be recursively divided into sub-problems Sub-problems are not independent Examples Needleman-Wunch Global Alignment Algorithm Smith-Waterman Local Alignment Algorithm Developing Pairwise Sequence Alignment Algorithms 8
Longest Common Subsequence (LCS) Problem
Can have insertion and deletions but no substitutions (no mismatches) Ex: V: ATCTGAT W: TGCATA LCS: TCTA Developing Pairwise Sequence Alignment Algorithms 9
LCS Problem (cont.)
Similarity score s i-1,j s i,j = max { s i,j-1 s i-1,j-1 + 1, if vi = wj On board example V: ATCTGAT W:TGCATA Developing Pairwise Sequence Alignment Algorithms 10
Indels – insertions and deletions (e.g., gaps)
alignment of V and W V = rows of similarity matrix (vertical axis) W = columns of similarity matrix (horizontal axis) Space (gap) in W insertion (UP) Space (gap) in V deletion Match (no mismatch in LCS) (LEFT) (DIAG) Developing Pairwise Sequence Alignment Algorithms 11
LCS(V,W) Algorithm
for i = 0 to n si,0 = 0 for j = 1 to m s0,j = 0 for i = 1 to n for j = 1 to m if vi = wj si,j = si-1,j-1 + 1; bi,j = DIAG else if si-1,j >= si,j-1 si,j = si-1,j; bi,j = UP else si,j = si,j-1; bi,j = LEFT Developing Pairwise Sequence Alignment Algorithms 12
Print-LCS(b,V,i,j)
if i == 0 or j == 0 return if bi,j = DIAG PRINT-LCS(b, V, i-1, j-1) print vi else if bi,j = UP PRINT-LCS(b, V, i-1, j) else PRINT-LCS(b, V, i, j-1) Developing Pairwise Sequence Alignment Algorithms Note: b and V could be global variables 13
Programming Workshop and Homework – Implement LCS
Workshop/Homework – Write a Python script to implement LCS (V, W). Prompt the user for 2 sequences (V and W) and display b and s Homework (due Thursday, November 15 th with Automatic Extension to Tuesday, November 27 th ) – Add the Print-LCS(V, i, j) function to your Python script. The script should prompt the user for 2 sequences and print the longest common sequence.
Developing Pairwise Sequence Alignment Algorithms 14
Classic Papers
Needleman, S.B. and Wunsch, C.D. A General Method Applicable to the Search for Similarities in Amino Acid Sequence of Two Proteins.
J. Mol. Biol.
, 48, pp. 443-453, 1970. ( http://www.cs.umd.edu/class/spring2003/cmsc838t/ papers/needlemanandwunsch1970.pdf
) Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences.
J. Mol. Biol.
, 147, pp. 195-197, 1981.( http://www.cmb.usc.edu/papers/msw_papers/ msw-042.pdf
) Developing Pairwise Sequence Alignment Algorithms 15