Cyclic string-to-string correction
Vida Movahedi
Elderlab, October 2009
Contents
• Problem Definition
• Linear string-to-string correction
• Dynamic Programming
• Cyclic strings
• A faster approach
• Application: curve similarity
Problem Definition
• Two strings:
$A = a_1 a_2 \dots a_n$
$B = b_1 b_2 \dots b_m$
mn
• Edit operation
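$s: a \to b$ (where $\varepsilon$ denotes the empty string)
insert: $\varepsilon \to b$
delete: $a \to \varepsilon$
change: $a \to b$, with $a \neq \varepsilon$, $b \neq \varepsilon$
edit sequence: $S = s_1 s_2 \dots s_k$, taking $A$ to $B$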
Linear string-to-string correction
• Cost of edit
cost of edit operation: $\gamma(s)$
cost of edit sequence: $\gamma(S = s_1 s_2 \dots s_k) = \sum_{i=1}^{k} \gamma(s_i)$
• Example: edit ‘high’ to ‘low’
• Edit sequence: delete ‘h’, change ‘i’ to ‘l’,
delete ‘g’, change ‘h’ to ‘o’, insert ‘w’
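$\gamma(S) = \gamma(\mathrm{h} \to \varepsilon) + \gamma(\mathrm{i} \to \mathrm{l}) + \gamma(\mathrm{g} \to \varepsilon) + \gamma(\mathrm{h} \to \mathrm{o}) + \gamma(\varepsilon \to \mathrm{w})$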
• Goal: find the edit sequence with minimum cost: $\delta(A, B) := \min_{S} \gamma(S)$
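The cost of an edit sequence can be checked with a few lines of Python. This is a minimal sketch, assuming unit costs for every operation (the slides do not fix a cost function); the names `EMPTY`, `unit_cost` and `sequence_cost` are illustrative only.

```python
EMPTY = ""  # stands for the empty string in an edit operation (a, b)

def unit_cost(op):
    """Assumed cost function: every edit operation costs 1."""
    return 1

def sequence_cost(ops, gamma=unit_cost):
    """Cost of an edit sequence: the sum of the costs of its operations."""
    return sum(gamma(op) for op in ops)

# The 'high' -> 'low' sequence from the slide, written as (a, b) pairs.
high_to_low = [
    ("h", EMPTY),  # delete 'h'
    ("i", "l"),    # change 'i' to 'l'
    ("g", EMPTY),  # delete 'g'
    ("h", "o"),    # change 'h' to 'o'
    (EMPTY, "w"),  # insert 'w'
]

print(sequence_cost(high_to_low))  # 5 under unit costs
```

Under unit costs this particular sequence is valid but not minimal: three changes and one deletion already take 'high' to 'low' at cost 4.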
Edit Graph, path and trace
Dynamic Programming
• Why is dynamic programming an option?
$\delta(i, j) := \delta(a_1 \dots a_i,\; b_1 \dots b_j)$
• Complexity: O(nm)
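Dynamic programming works because the cost for two prefixes depends only on the costs for slightly shorter prefixes (one deletion, one insertion, or one change away). The sketch below is the standard table-filling scheme; the default cost functions `sub`, `ins` and `dele` are assumptions (unit costs), not something the slides specify.

```python
def edit_distance(a, b,
                  sub=lambda x, y: 0 if x == y else 1,
                  ins=lambda y: 1,
                  dele=lambda x: 1):
    """Minimum edit cost taking a to b, in O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    # d[i][j] = minimum cost of editing a[:i] into b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + dele(a[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins(b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + dele(a[i - 1]),               # delete a[i-1]
                d[i][j - 1] + ins(b[j - 1]),                # insert b[j-1]
                d[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # change or match
            )
    return d[n][m]

print(edit_distance("high", "low"))  # 4
```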
Cyclic strings
• Cyclic shifts
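$\sigma^k(a_1 a_2 \dots a_n) = a_{k+1} \dots a_n a_1 \dots a_k$, $1 \le k < n$, and $\sigma^0(A) = A$
• Edit cost under cyclic shifts
$\delta([A],[B]) := \min \{ \delta(\sigma^k(A), \sigma^l(B)) \mid 0 \le k < n,\ 0 \le l < m \}$
$\delta([A],[B]) = \delta(A,[B])$
• $m$ possible shifts, $m$ runs of dynamic programming: $O(nm^2)$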
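The brute-force approach above can be sketched as follows: keep A fixed, try every cyclic shift of B, and run the linear dynamic program each time. Unit costs are assumed, and the function names are illustrative.

```python
def rotate(s, k):
    """Cyclic shift: a_{k+1} ... a_n a_1 ... a_k."""
    return s[k:] + s[:k]

def edit_distance(a, b):
    """Linear edit distance with unit costs, keeping only two table rows."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,             # delete x
                           cur[j - 1] + 1,          # insert y
                           prev[j - 1] + (x != y))) # change or match
        prev = cur
    return prev[-1]

def cyclic_edit_distance_naive(a, b):
    """Minimum over all m shifts of b: m runs of an O(nm) dynamic program."""
    if not b:
        return len(a)
    return min(edit_distance(a, rotate(b, l)) for l in range(len(b)))

print(cyclic_edit_distance_naive("abcd", "cdab"))  # 0
```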
A faster approach
• The edit graph of $A$ and every cyclic shift of $B$ is contained in the edit graph of $A$ and $BB$ (call it graph $H$)
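The reason the doubled string works: every cyclic shift of B appears as a length-m window of BB, so the edit graph for shift l corresponds to columns l through l + m of H. A small illustrative check (the helper name `shift_windows` is hypothetical):

```python
def shift_windows(b):
    """All cyclic shifts of b, read off as length-m windows of b + b."""
    m = len(b)
    bb = b + b
    return [bb[l:l + m] for l in range(m)]

print(shift_windows("abc"))  # ['abc', 'bca', 'cab']
```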
Non-crossing Paths
• Consider shifts $j$, $k$, $l$ where $0 \le j < k < l \le m$
• Traces corresponding to the optimal edit
sequences are non-crossing on graph H:
P(j), P(k), P(l)
• Reducing necessary calculations
Non-crossing paths
$O(nm \log m)$ algorithm
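The non-crossing property suggests a divide-and-conquer scheme: compute P(0) and its copy P(m), then compute the middle path only inside the region between two already known paths, and recurse on both halves. The code below is a sketch in the spirit of Maes (1990), not a transcription of the paper; it assumes unit costs, and the names `_bounded_path` and `cyclic_edit_distance` are illustrative.

```python
INF = float("inf")

def _bounded_path(a, bb, start, m, lo, hi):
    """Cheapest path in H from (0, start) to (len(a), start + m) that uses,
    in row i, only columns lo[i]..hi[i]. Unit costs are assumed.
    Returns (cost, leftmost column per row, rightmost column per row)."""
    n = len(a)
    dist = []  # dist[i][j - lo[i]] = cost of reaching vertex (i, j)
    for i in range(n + 1):
        row = []
        for j in range(lo[i], hi[i] + 1):
            best = 0 if (i == 0 and j == start) else INF
            if j > lo[i]:                                  # insert bb[j-1]
                best = min(best, row[-1] + 1)
            if i > 0 and lo[i - 1] <= j <= hi[i - 1]:      # delete a[i-1]
                best = min(best, dist[i - 1][j - lo[i - 1]] + 1)
            if i > 0 and lo[i - 1] <= j - 1 <= hi[i - 1]:  # change or match
                best = min(best, dist[i - 1][j - 1 - lo[i - 1]]
                           + (a[i - 1] != bb[j - 1]))
            row.append(best)
        dist.append(row)

    def get(i, j):
        return dist[i][j - lo[i]] if lo[i] <= j <= hi[i] else INF

    # Trace the path backwards, recording its column range in every row.
    i, j = n, start + m
    left, right = [0] * (n + 1), [0] * (n + 1)
    right[n] = j
    while (i, j) != (0, start):
        d = get(i, j)
        if i > 0 and d == get(i - 1, j - 1) + (a[i - 1] != bb[j - 1]):
            left[i] = j
            i, j = i - 1, j - 1
            right[i] = j
        elif i > 0 and d == get(i - 1, j) + 1:
            left[i] = j
            i -= 1
            right[i] = j
        else:
            j -= 1
    left[0] = start
    return dist[n][start + m - lo[n]], left, right

def cyclic_edit_distance(a, b):
    """Minimum unit-cost edit distance between a and any cyclic shift of b."""
    n, m = len(a), len(b)
    if m == 0:
        return n
    bb = b + b
    cost0, left0, right0 = _bounded_path(a, bb, 0, m, [0] * (n + 1), [m] * (n + 1))
    best = cost0
    right_m = [c + m for c in right0]  # P(m) is P(0) moved m columns right

    def solve(l, left_l, u, right_u):
        nonlocal best
        if u - l < 2:
            return
        mid = (l + u) // 2
        cost, left_mid, right_mid = _bounded_path(a, bb, mid, m, left_l, right_u)
        best = min(best, cost)
        solve(l, left_l, mid, right_mid)
        solve(mid, left_mid, u, right_u)

    solve(0, left0, m, right_m)
    return best

print(cyclic_edit_distance("ab", "ba"))  # 0
```

Each level of the recursion works on regions that only overlap along the shared boundary paths, which is where the saving over m independent runs comes from.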
An Application: Curve Similarity
• Two curves as two
strings A and B
• Edit cost: Euclidean
distance
• Minimum edit cost
corresponds to
optimal matching
• Symmetric cost for
each edit operation
Symmetric distance
Contour Mapping Distance=7.73
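As a rough illustration of the curve application, the sketch below treats each closed curve as a list of 2D points and uses the Euclidean distance between two points as the cost of a change. The slides do not say how insertions and deletions are priced, so the constant `indel` is purely an assumption, as are the function names. The brute-force minimum over shifts is used for brevity. `math.dist` requires Python 3.8 or later.

```python
import math

def curve_edit_cost(A, B, indel=1.0):
    """Linear edit cost between point sequences A and B.
    Change cost: Euclidean distance. Insert/delete cost: assumed constant."""
    prev = [j * indel for j in range(len(B) + 1)]
    for i, p in enumerate(A, 1):
        cur = [i * indel]
        for j, q in enumerate(B, 1):
            cur.append(min(prev[j] + indel,               # delete p
                           cur[j - 1] + indel,            # insert q
                           prev[j - 1] + math.dist(p, q)))  # change p to q
        prev = cur
    return prev[-1]

def cyclic_curve_cost(A, B, indel=1.0):
    """Minimum over all starting points of the closed curve B."""
    if not B:
        return len(A) * indel
    return min(curve_edit_cost(A, B[l:] + B[:l], indel) for l in range(len(B)))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
same_square_other_start = square[2:] + square[:2]
print(cyclic_curve_cost(square, same_square_other_start))  # 0.0
```

Because the change cost is symmetric and insertions and deletions cost the same here, the resulting distance is symmetric in A and B.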
References
Maurice Maes (1990), “On a cyclic string-to-string correction problem”, Information Processing Letters, vol. 35, pp. 73-78.