CS222 Algorithms
First Semester 2003/2004
Dr. Sanath Jayasena
Dept. of Computer Science & Eng., University of Moratuwa
Lecture 7 (28/10/2003)
String Matching Part 2
Greedy Approach
Overview
• Previous lecture: String Matching Part 1
– Naïve Algorithm, Rabin-Karp Algorithm
• This lecture
– String Matching Part 2
  • String Matching using Finite Automata
  • Knuth-Morris-Pratt (KMP) Algorithm
– Greedy Approach to Algorithm Design
October 2003 Sanath Jayasena 7-2
String Matching
PART 2
Finite Automata
• A finite automaton M is a 5-tuple (Q, q_0, A, Σ, δ), where
– Q is a finite set of states
– q_0 ∈ Q is the start state
– A ⊆ Q is a set of accepting states
– Σ is a finite input alphabet
– δ is the transition function that gives the next state for a given current state and input
How a Finite Automaton Works
• The finite automaton M begins in state q_0
• Reads characters from Σ one at a time
• If M is in state q and reads input character a, M moves to state δ(q, a)
• If its current state q is in A, M is said to have accepted the string read so far
• An input string that is not accepted is said to be rejected
Example
• Q = {0, 1}, q_0 = 0, A = {1}, Σ = {a, b}
• δ(q, a) shown in the transition table/diagram
• This accepts strings that end in an odd number of a's; e.g., abbaaa is accepted, aa is rejected

Transition table:
state   input a   input b
0       1         0
1       0         0

[Transition diagram: state 0 moves to state 1 on a and stays in 0 on b; state 1 returns to state 0 on either a or b]
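The two-state example can be checked by simulating it directly. The following is a minimal sketch, assuming the transition table is stored as a Python dictionary; the names `DELTA` and `accepts` are illustrative and not from the lecture.

```python
# Transition table of the example automaton: (state, input) -> next state.
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 0, (1, "b"): 0,
}

def accepts(string, delta=DELTA, start=0, accepting=frozenset({1})):
    """Run the automaton over `string` and report whether it ends in an accepting state."""
    q = start
    for ch in string:
        q = delta[(q, ch)]  # one table lookup per input character
    return q in accepting

print(accepts("abbaaa"))  # the slide's accepted example
print(accepts("aa"))      # the slide's rejected example
```

Each character costs one dictionary lookup, which is the constant-time-per-character behaviour the later slides rely on.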
String-Matching Automata
• Given the pattern P[1..m], build a finite automaton M
– The state set is Q = {0, 1, 2, …, m}
– The start state is 0
– The only accepting state is m
• Time to build M can be large if Σ is large
String-Matching Automata …contd
• Scan the text string T[1..n] to find all occurrences of the pattern P[1..m]
• String matching is efficient: Θ(n)
– Each character is examined exactly once
– Constant time for each character
• But … time to compute δ is O(m|Σ|)
– δ has O(m|Σ|) entries
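To make the size of δ concrete, here is a rough sketch of one way to build it. It assumes the usual definition that δ(q, a) is the length of the longest prefix of P that is a suffix of P_q followed by a. The function name `compute_transition_function` is illustrative, and this straightforward construction is slower than the O(m|Σ|) bound quoted above; it is only meant to show what the table contains.

```python
def compute_transition_function(pattern, alphabet):
    """Build delta as a dict mapping (state, character) -> next state.

    Straightforward (not time-optimal) construction: for each state q and
    character a, find the longest prefix of the pattern that is a suffix
    of pattern[:q] + a.
    """
    m = len(pattern)
    delta = {}
    for q in range(m + 1):
        for a in alphabet:
            seen = pattern[:q] + a
            k = min(m, q + 1)
            # Shrink k until pattern[:k] is a suffix of what has been seen.
            # The loop stops at k = 0 because every string ends with "".
            while not seen.endswith(pattern[:k]):
                k -= 1
            delta[(q, a)] = k
    return delta

delta = compute_transition_function("ab", "ab")
print(len(delta))  # (m + 1) * |alphabet| entries
```

The table has (m + 1)·|Σ| entries, so its size grows with the alphabet even when the pattern is short.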
Algorithm
Input: Text string T[1..n], δ and m
Result: All valid shifts displayed

FINITE-AUTOMATON-MATCHER(T, m, δ)
    n ← length[T]
    q ← 0
    for i ← 1 to n
        q ← δ(q, T[i])
        if q = m
            print “pattern occurs with shift” i − m
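The matcher translates almost line for line into Python. The sketch below assumes δ is given as a dictionary and returns the shifts in a list instead of printing them; the hand-written table `DELTA_AB` for the pattern "ab" over {a, b} is a hypothetical example, not taken from the slides.

```python
def finite_automaton_matcher(text, m, delta):
    """Return every shift s such that the pattern occurs at text[s:s + m]."""
    shifts = []
    q = 0
    # i runs from 1 to n, as in the pseudocode, so the reported shift is i - m.
    for i, ch in enumerate(text, start=1):
        q = delta[(q, ch)]
        if q == m:
            shifts.append(i - m)
    return shifts

# Hand-built transition table for the pattern "ab" (states 0, 1, 2).
DELTA_AB = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}

print(finite_automaton_matcher("abab", 2, DELTA_AB))
```

The text must only contain characters that appear in the table; a character outside the alphabet would raise a `KeyError` in this sketch.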
Knuth-Morris-Pratt (KMP) Method
• Avoids computing δ (transition function)
• Instead computes a prefix function π in O(m) time
– π has only m entries
• Prefix function stores info about how the pattern matches against shifts of itself
– Can avoid testing useless shifts
Terminology/Notations
• String w is a prefix of string x, if x = wy for some string y (e.g., “srilan” of “srilanka”)
• String w is a suffix of string x, if x = yw for some string y (e.g., “anka” of “srilanka”)
• The k-character prefix of the pattern P[1..m] is denoted by P_k
– E.g., P_0 = ε, P_m = P = P[1..m]
Prefix Function for a Pattern
• Given that pattern prefix P[1..q] matches text characters T[(s+1)..(s+q)], what is the least shift s’ > s such that P[1..k] = T[(s’+1)..(s’+k)] where s’+k = s+q?
• At the new shift s’, no need to compare the first k characters of P with corresponding characters of T
– Since we know that they match
Prefix Function: Example 1
[Figure: pattern P = ababaca aligned with text T at shift s, where q = 5 characters match; the pattern is then shown at the next potentially valid shift s’ = s + 2, where its first k = 3 characters are already known to match]
• Compare pattern against itself; the longest prefix of P that is also a suffix of P_5 is P_3; so π[5] = 3
Prefix Function: Example 2
i      1  2  3  4  5  6  7  8  9  10
P[i]   a  b  a  b  a  b  a  b  c  a
π[i]   0  0  1  2  3  4  5  6  0  1
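The table can be reproduced with a short routine. This is a sketch of the standard prefix-function computation, assuming 0-based Python indexing, so that `pi[q - 1]` holds the slide's π[q]; the name `prefix_function` is illustrative.

```python
def prefix_function(pattern):
    """Return pi, where pi[q - 1] is the length of the longest proper prefix
    of the pattern that is also a suffix of its first q characters."""
    m = len(pattern)
    pi = [0] * m
    k = 0  # length of the currently matched prefix
    for q in range(1, m):
        # Fall back to shorter prefixes until the next character extends one.
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi

print(prefix_function("ababababca"))  # the pattern of Example 2
print(prefix_function("ababaca")[4])  # the slide's pi[5] for Example 1
```

Although the inner loop can run several times for one q, k increases by at most one per outer iteration, so the total work is O(m).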
Knuth-Morris-Pratt (KMP) Algorithm
• Information stored in prefix function
– Can speed up both the naïve algorithm and the finite-automaton matcher
• KMP Algorithm on the board
– 2 parts: KMP-MATCHER, PREFIX
• Running time
– PREFIX takes O(m)
– KMP-MATCHER takes O(m+n)
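Since the algorithm itself was presented on the board, the following is only a sketch of the usual textbook formulation of the two parts, not necessarily the exact version given in the lecture. The function names and the test strings are illustrative, and shifts are reported 0-based.

```python
def prefix_function(pattern):
    """PREFIX part: pi[q - 1] is the longest proper prefix of the pattern
    that is also a suffix of its first q characters."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi

def kmp_matcher(text, pattern):
    """KMP-MATCHER part: return all 0-based shifts where pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return []
    pi = prefix_function(pattern)
    shifts = []
    q = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # On a mismatch, reuse the prefix function instead of restarting.
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]
        if pattern[q] == ch:
            q += 1
        if q == m:
            shifts.append(i - m + 1)
            q = pi[q - 1]  # keep going to find overlapping occurrences
    return shifts

print(kmp_matcher("abababcabab", "abab"))
```

The text pointer never moves backwards, which is where the O(m+n) bound comes from: O(m) for the prefix function plus O(n) for the scan.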
Greedy Approach to Algorithm Design
Introduction
• Greedy methods typically apply to optimization problems in which a set of choices must be made to arrive at an optimal solution
• Optimization problem
– There can be many solutions
– Each solution has a value
– We wish to find a solution with the optimal (minimum or maximum) value
Example Optimization Problems
• How to give a balance in minimum number of coins?
• How to allocate resources to maximize profit from your business?
• A thief has a knapsack of capacity c; what items to put in it to maximize profit?
– 0-1 knapsack problem (binary choice)
– Fractional knapsack problem
Greedy Approach
• Make each choice in a locally optimal manner
– Always makes the choice that looks best at the moment
– We hope that this will lead to a globally optimal solution
• Greedy method doesn’t always give optimal solutions, but for many problems it does
Example
• A cashier gives change using coins of Rs. 10, 5, 2 and 1
• Suppose the amount is Rs. 37
• Need to minimize the number of coins
– Try to use the largest coin to cover the remaining balance
– So, we get 10 + 10 + 10 + 5 + 2
– Does this give the optimal solution?
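The cashier's rule can be sketched in a few lines. The function name `greedy_change` is illustrative, and the second coin set {4, 3, 1} is a hypothetical one that does not appear in the lecture; it is included only to show that the greedy rule is not optimal for every set of denominations.

```python
def greedy_change(amount, coins):
    """Repeatedly take the largest coin that still fits in the remaining amount."""
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            result.append(coin)
    if amount != 0:
        raise ValueError("amount cannot be paid exactly with these coins")
    return result

print(greedy_change(37, [10, 5, 2, 1]))  # the slide's example
print(greedy_change(6, [4, 3, 1]))       # hypothetical coin set
```

With the hypothetical coins {4, 3, 1}, the greedy rule pays 6 as 4 + 1 + 1 (three coins), while 3 + 3 needs only two, so whether greedy is optimal depends on the denominations.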
Elements of Greedy Approach
1. Greedy-choice property
– A globally optimal solution can be arrived at by making a locally optimal (greedy) choice
– Proving this may not be trivial
2. Optimal substructure
– Optimal solution to the problem contains within it optimal solutions to subproblems
Applications of Greedy Approach
• Graph algorithms
– Minimum spanning tree
– Shortest path
• Data compression
– Huffman coding
• Activity selection (scheduling) problems
• Fractional knapsack problem
– Not the 0-1 knapsack problem
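As one illustration from this list, the fractional knapsack problem can be solved greedily by taking items in decreasing order of value per unit weight. The sketch below assumes items are given as (value, weight) pairs with positive weights; the function name and the sample items are hypothetical, not from the lecture.

```python
def fractional_knapsack(items, capacity):
    """Return the maximum total value when fractions of items may be taken.

    items: list of (value, weight) pairs, each with weight > 0.
    """
    total = 0.0
    remaining = capacity
    # Greedy choice: best value-to-weight ratio first.
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if remaining <= 0:
            break
        take = min(weight, remaining)  # whole item, or the fraction that fits
        total += value * take / weight
        remaining -= take
    return total

# Hypothetical items: (value, weight), knapsack capacity 50.
print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))
```

For the same hypothetical items under the 0-1 restriction, choosing whole items by ratio would take the first two for a value of 160, whereas taking the second and third gives 220, which is why the greedy rule is listed for the fractional version only.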
Announcements
• Assignment 4
– assigned today
– due next week
• Next 2 lectures
– Topic: Graphs
– By Ms Sudanthi Wijewickrema