Cyclic string-to-string correction

Vida Movahedi
Elderlab, October 2009
Contents
• Problem Definition
• Linear string-to-string correction
• Dynamic Programming
• Cyclic strings
• A faster approach
• Application: curve similarity
Problem Definition
• Two strings:
$A = a_1 a_2 \dots a_n$
$B = b_1 b_2 \dots b_m$
mn
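• Edit operation
$s: a \to b$ ($\varepsilon$ denotes the empty string)
insert: $\varepsilon \to b$
delete: $a \to \varepsilon$
change: $a \to b$, with $a, b \neq \varepsilon$
edit sequence: $S = s_1 s_2 \dots s_k$, taking A to B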
Linear string-to-string correction
• Cost of edit
cost of edit operation: $\gamma(s)$
cost of edit sequence: $\gamma(S = s_1 s_2 \dots s_k) = \sum_{i=1}^{k} \gamma(s_i)$
• Example: edit ‘high’ to ‘low’
• Edit sequence: delete ‘h’, change ‘i’ to ‘l’,
delete ‘g’, change ‘h’ to ‘o’, insert ‘w’
$\gamma(S) = \gamma(h \to \varepsilon) + \gamma(i \to l) + \gamma(g \to \varepsilon) + \gamma(h \to o) + \gamma(\varepsilon \to w)$
• Goal: find the edit sequence with minimum cost
$\delta(A, B) := \min_{S} \gamma(S)$
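As a small worked check of the cost definition, the sketch below sums the costs of the ‘high’ to ‘low’ edit sequence listed above. The unit cost function `op_cost` (0 for a match, 1 for any insert, delete, or change) is an assumption for illustration; the slides do not fix particular costs.

```python
# Each edit operation is a pair (a, b); the empty string "" stands for
# the empty symbol, so ("h", "") is a delete and ("", "w") is an insert.

def op_cost(a, b):
    """Assumed unit cost: 0 if nothing changes, 1 otherwise."""
    return 0 if a == b else 1

def sequence_cost(ops):
    """Cost of an edit sequence = sum of the costs of its operations."""
    return sum(op_cost(a, b) for a, b in ops)

# delete 'h', change 'i' to 'l', delete 'g', change 'h' to 'o', insert 'w'
high_to_low = [("h", ""), ("i", "l"), ("g", ""), ("h", "o"), ("", "w")]
print(sequence_cost(high_to_low))  # 5
```

Under these assumed unit costs this particular sequence costs 5, which is not the minimum: changing ‘h’, ‘i’, ‘g’ to ‘l’, ‘o’, ‘w’ and deleting the final ‘h’ costs 4.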
Edit Graph, path and trace
Dynamic Programming
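• Why is dynamic programming an option?
$\delta(i, j) := \delta(A\langle 1, i \rangle, B\langle 1, j \rangle)$, the minimum cost of editing the first i symbols of A into the first j symbols of B
Each $\delta(i, j)$ depends only on $\delta(i-1, j)$, $\delta(i, j-1)$ and $\delta(i-1, j-1)$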
• Complexity: O(nm)
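The recurrence behind the O(nm) bound can be sketched as the standard table-filling dynamic program for linear string-to-string correction. The function names and the default `unit_cost` are assumptions for illustration; any non-negative cost function with the same signature could be passed in.

```python
def unit_cost(x, y):
    """Assumed cost: 0 for a match, 1 for insert, delete, or change.
    The empty string "" stands for the empty symbol."""
    return 0 if x == y else 1

def edit_distance(a, b, cost=unit_cost):
    """Minimum edit cost taking a to b, in O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    # d[i][j] = minimum cost of editing a[:i] into b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + cost(a[i - 1], "")      # delete a[i-1]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + cost("", b[j - 1])      # insert b[j-1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + cost(a[i - 1], ""),            # delete
                d[i][j - 1] + cost("", b[j - 1]),            # insert
                d[i - 1][j - 1] + cost(a[i - 1], b[j - 1]),  # change or match
            )
    return d[n][m]

print(edit_distance("high", "low"))  # 4
```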

Cyclic strings
• Cyclic shifts
$\sigma^k(a_1 a_2 \dots a_n) = a_{k+1} \dots a_n a_1 \dots a_k$, $1 \le k < n$, and $\sigma^0(A) = A$
• Edit cost under cyclic shifts
$\delta([A],[B]) := \min\{\delta(\sigma^k(A), \sigma^l(B)) \mid 0 \le k < n,\ 0 \le l < m\}$
$\delta([A],[B]) = \delta(A,[B])$, so it is enough to shift only one of the two strings
• m possible shifts, m runs of dynamic
programming: $O(nm^2)$
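The brute-force approach described above can be sketched directly: keep A fixed, try every cyclic shift of B, and run the linear dynamic program each time. Unit costs and the function names are assumptions for illustration.

```python
def edit_distance(a, b):
    """Linear edit distance with assumed unit costs, using two rows."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            change = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + change)
        prev = cur
    return prev[len(b)]

def cyclic_edit_distance_bruteforce(a, b):
    """Minimum over all m shifts of b: m runs of an O(nm) program."""
    if not b:
        return edit_distance(a, b)
    return min(edit_distance(a, b[l:] + b[:l]) for l in range(len(b)))

print(cyclic_edit_distance_bruteforce("abcd", "cdab"))  # 0
print(cyclic_edit_distance_bruteforce("abcd", "dabx"))  # 1
```

In the second call the best shift of `"dabx"` is `"abxd"`, which differs from `"abcd"` by a single change.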
A faster approach
• All edit graphs are included in edit graph
of A and BB (let’s call it graph H)
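To illustrate the inclusion, the sketch below computes the cheapest path in H from node (0, l) to node (n, l + m). That path only uses columns l to l + m of BB, which spell out B shifted by l, so its cost should equal the linear edit cost between A and that shift. This is only a demonstration of the containment, not the faster algorithm itself; unit costs and the name `path_cost_in_h` are assumptions.

```python
def unit_cost(x, y):
    """Assumed cost: 0 for a match, 1 otherwise ("" is the empty symbol)."""
    return 0 if x == y else 1

def path_cost_in_h(a, b, l, cost=unit_cost):
    """Cheapest path in the edit graph of a and b+b from (0, l) to (n, l+m)."""
    n, m = len(a), len(b)
    bb = b + b
    # row[j] = cheapest path cost from (0, l) to (i, l + j)
    row = [0] * (m + 1)
    for j in range(1, m + 1):
        row[j] = row[j - 1] + cost("", bb[l + j - 1])
    for i in range(1, n + 1):
        new = [row[0] + cost(a[i - 1], "")] + [0] * m
        for j in range(1, m + 1):
            y = bb[l + j - 1]
            new[j] = min(
                row[j] + cost(a[i - 1], ""),     # vertical edge: delete
                new[j - 1] + cost("", y),        # horizontal edge: insert
                row[j - 1] + cost(a[i - 1], y),  # diagonal edge: change or match
            )
        row = new
    return row[m]

a, b = "abcd", "dabx"
costs = [path_cost_in_h(a, b, l) for l in range(len(b))]
print(min(costs))  # 1
```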
Non-crossing Paths
• Consider shifts j, k, l where
$0 \le j < k < l \le m$
• Traces corresponding to the optimal edit
sequences are non-crossing on graph H:
P(j), P(k), P(l)
• Reducing necessary calculations
Non-crossing paths
$O(nm \log m)$ algorithm
An Application: Curve Similarity
• Two curves as two
strings A and B
• Edit cost: Euclidean
distance
• Minimum edit cost
corresponds to
optimal matching
• Symmetric cost for
each edit operation $\Rightarrow$
symmetric distance
Contour Mapping Distance=7.73
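A minimal sketch of the curve-matching idea, treating each closed curve as a cyclic string of 2D points. The change cost is the Euclidean distance between the two points, as on the slide; the constant `GAP` cost for inserting or deleting a point is an assumption for illustration, as are the function names. The brute-force search over shifts is used here for brevity, and `b` is assumed to be non-empty.

```python
import math

GAP = 1.0  # assumed insert/delete cost per point; not given in the slides

def linear_cost(a, b):
    """Edit cost between two point sequences, without cyclic shifts."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(
                prev[j] + GAP,                               # delete a[i-1]
                cur[j - 1] + GAP,                            # insert b[j-1]
                prev[j - 1] + math.dist(a[i - 1], b[j - 1]),  # match points
            )
        prev = cur
    return prev[len(b)]

def contour_distance(a, b):
    """Minimum edit cost over all starting points of the closed curve b."""
    return min(linear_cost(a, b[l:] + b[:l]) for l in range(len(b)))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
same_square_other_start = [(1, 1), (0, 1), (0, 0), (1, 0)]
triangle = [(0, 0), (1, 0), (0, 1)]

print(contour_distance(square, same_square_other_start))  # 0.0
print(math.isclose(contour_distance(square, triangle),
                   contour_distance(triangle, square)))   # True
```

Because both the Euclidean distance and the assumed gap cost are symmetric, swapping the two curves gives the same distance, which is what the second check illustrates.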
References
Maurice Maes (1990), “On a cyclic string-to-string correction problem”, Information
Processing Letters, vol. 35, pp. 73-78.