Longest Common Subsequence (LCS) Dr. Nancy Warter-Perez Functions  Function definition   def adder(a, b, c): return a+b+c Function calls  adder(1, 2, 3) -> 6 Introduction to Python –

Download Report

Transcript Longest Common Subsequence (LCS) Dr. Nancy Warter-Perez Functions  Function definition   def adder(a, b, c): return a+b+c Function calls  adder(1, 2, 3) -> 6 Introduction to Python –

Longest Common Subsequence (LCS)

Dr. Nancy Warter-Perez

Functions

  Function definition  def adder(a, b, c): return Function calls  adder(1, 2, 3) -> 6 a+b+c Introduction to Python – Part III 2

Functions – Polymorphism

>>>def fn2(c): … a = c * 3 … >>> print fn2(5) 15 >>> print fn2(1.5) 4.5

>>> print fn2([1,2,3]) [1,2,3,1,2,3,1,2,3] >>> print fn2("Hi") HiHiHi return a Introduction to Python – Part III 3

Functions - Recursion

def fn_Rec(x): if x == []: return fn_Rec(x[1:]) print x[0], y = [1,2,3,4] fn_Rec(y) >>> 4 3 2 1 Introduction to Python – Part III 4

Scopes

    Scopes divine the “visibility” of a variable Variables defined outside of a function are visible to all of the functions within a module (file) Variables defined within a function are local to that function To make a variable that is defined within a function global, use the global keyword Ex 1: x = 5 def fnc(): x = 2 print x, fnc() print x >>> 2 5 Developing Pairwise Sequence Alignment Algorithms Ex 2: x = 5 def fnc(): global x x = 2 print x, fnc() print x >>> 2 2 5

Modules

  Why use?

  Code reuse System namespace partitioning (avoid name clashes)  Implementing shared services or data How to structure a Program   One top-level file  Main control flow of program Zero or more supplemental files known as modules  Libraries of tools Developing Pairwise Sequence Alignment Algorithms 6

Modules - Import

 Import – used to gain access to tools in modules Ex: contents of file b.py

def spam(text): print text, 'spam' contents of file a.py

import b b.spam('gumby') Developing Pairwise Sequence Alignment Algorithms 7

Dynamic Programming

   Applied to optimization problems Useful when   Problem can be recursively divided into sub-problems Sub-problems are not independent Examples  Needleman-Wunch Global Alignment Algorithm  Smith-Waterman Local Alignment Algorithm Developing Pairwise Sequence Alignment Algorithms 8

Longest Common Subsequence (LCS) Problem

  Can have insertion and deletions but no substitutions (no mismatches) Ex: V: ATCTGAT W: TGCATA LCS: TCTA Developing Pairwise Sequence Alignment Algorithms 9

LCS Problem (cont.)

 Similarity score s i-1,j s i,j = max { s i,j-1 s i-1,j-1 + 1, if vi = wj  On board example  V: ATCTGAT  W:TGCATA Developing Pairwise Sequence Alignment Algorithms 10

Indels – insertions and deletions (e.g., gaps)

 alignment of V and W  V = rows of similarity matrix (vertical axis)     W = columns of similarity matrix (horizontal axis) Space (gap) in W  insertion  (UP) Space (gap) in V  deletion Match (no mismatch in LCS)  (LEFT) (DIAG) Developing Pairwise Sequence Alignment Algorithms 11

LCS(V,W) Algorithm

for i = 0 to n si,0 = 0 for j = 1 to m s0,j = 0 for i = 1 to n for j = 1 to m if vi = wj si,j = si-1,j-1 + 1; bi,j = DIAG else if si-1,j >= si,j-1 si,j = si-1,j; bi,j = UP else si,j = si,j-1; bi,j = LEFT Developing Pairwise Sequence Alignment Algorithms 12

Print-LCS(b,V,i,j)

if i == 0 or j == 0 return if bi,j = DIAG PRINT-LCS(b, V, i-1, j-1) print vi else if bi,j = UP PRINT-LCS(b, V, i-1, j) else PRINT-LCS(b, V, i, j-1) Developing Pairwise Sequence Alignment Algorithms Note: b and V could be global variables 13

Programming Workshop and Homework – Implement LCS

  Workshop/Homework – Write a Python script to implement LCS (V, W). Prompt the user for 2 sequences (V and W) and display b and s Homework (due Thursday, November 15 th with Automatic Extension to Tuesday, November 27 th ) – Add the Print-LCS(V, i, j) function to your Python script. The script should prompt the user for 2 sequences and print the longest common sequence.

Developing Pairwise Sequence Alignment Algorithms 14

Classic Papers

  Needleman, S.B. and Wunsch, C.D. A General Method Applicable to the Search for Similarities in Amino Acid Sequence of Two Proteins.

J. Mol. Biol.

, 48, pp. 443-453, 1970. ( http://www.cs.umd.edu/class/spring2003/cmsc838t/ papers/needlemanandwunsch1970.pdf

) Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences.

J. Mol. Biol.

, 147, pp. 195-197, 1981.( http://www.cmb.usc.edu/papers/msw_papers/ msw-042.pdf

) Developing Pairwise Sequence Alignment Algorithms 15