CS222 Algorithms Lecture 7 String Matching 2 + Greedy Approach


CS222 Algorithms

First Semester 2003/2004 Dr. Sanath Jayasena Dept. of Computer Science & Eng.

University of Moratuwa Lecture 7 (28/10/2003)

String Matching Part 2 Greedy Approach

Overview

• Previous lecture: String Matching Part 1
– Naïve Algorithm, Rabin-Karp Algorithm
• This lecture
– String Matching Part 2
  • String Matching using Finite Automata
  • Knuth-Morris-Pratt (KMP) Algorithm
– Greedy Approach to Algorithm Design

October 2003 Sanath Jayasena 7-2

String Matching

PART 2

Finite Automata

• A finite automaton M is a 5-tuple (Q, q_0, A, Σ, δ), where
– Q is a finite set of states
– q_0 ∈ Q is the start state
– A ⊆ Q is a set of accepting states
– Σ is a finite input alphabet
– δ is the transition function that gives the next state for a given current state and input

How a Finite Automaton Works

• The finite automaton M begins in state q_0
• Reads the characters of its input string (over Σ) one at a time
• If M is in state q and reads input character a, M moves to state δ(q, a)
• If its current state q is in A, M is said to have accepted the string read so far
• An input string that is not accepted is said to be rejected

Example

• Q = {0, 1}, q_0 = 0, A = {1}, Σ = {a, b}
• δ(q, a) shown in the transition table/diagram
• This accepts strings that end in an odd number of a’s; e.g., abbaaa is accepted, aa is rejected

Transition table:

state   input a   input b
0       1         0
1       0         0

[Transition diagram: state 0 loops to itself on b and moves to state 1 on a; state 1 moves to state 0 on a or b]
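The example automaton can be simulated in a few lines. This is a minimal Python sketch; the names `DELTA`, `START`, `ACCEPTING` and `accepts` are illustrative and do not come from the lecture.

```python
# Transition table from the example: DELTA[(state, symbol)] -> next state
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 0, (1, "b"): 0,
}
START = 0
ACCEPTING = {1}


def accepts(string):
    """Run the automaton over `string`; True if it ends in an accepting state."""
    q = START
    for ch in string:
        q = DELTA[(q, ch)]  # raises KeyError if ch is outside the alphabet {a, b}
    return q in ACCEPTING


print(accepts("abbaaa"))  # True: ends in three a's
print(accepts("aa"))      # False: ends in two a's
```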

String-Matching Automata

• Given the pattern P[1..m], build a finite automaton M
– The state set is Q = {0, 1, 2, …, m}
– The start state is 0
– The only accepting state is m
• Time to build M can be large if Σ is large

String-Matching Automata …contd

• Scan the text string T[1..n] to find all occurrences of the pattern P[1..m]
• String matching is efficient: Θ(n)
– Each character is examined exactly once
– Constant time for each character
• But … time to compute δ is O(m|Σ|)
– δ has O(m|Σ|) entries
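To make the size of δ concrete, the sketch below fills in every entry directly from the usual textbook definition: δ(q, a) is the length of the longest prefix of P that is a suffix of P_q followed by a. That definition is not spelled out on the slide and is assumed here. The function name `build_transition_function` is illustrative, and this direct method is simple rather than fast; it does not reach the O(m|Σ|) bound quoted above.

```python
def build_transition_function(pattern, alphabet):
    """Return delta as a dict: delta[(q, a)] = next state.

    State q means "the last q characters read equal pattern[:q]".
    """
    m = len(pattern)
    delta = {}
    for q in range(m + 1):
        for a in alphabet:
            seen = pattern[:q] + a
            # Largest k such that pattern[:k] is a suffix of seen.
            k = min(m, q + 1)
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            delta[(q, a)] = k
    return delta


delta = build_transition_function("ab", "ab")
print(len(delta))       # 6 entries: (m + 1) * |alphabet|
print(delta[(1, "b")])  # 2: reading b after a completes the pattern
```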

Algorithm

Input: Text string T[1..n], δ and m

Result: All valid shifts displayed

FINITE-AUTOMATON-MATCHER(T, m, δ)
    n ← length[T]
    q ← 0
    for i ← 1 to n
        q ← δ(q, T[i])
        if q = m
            print “pattern occurs with shift” i − m
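The pseudocode translates almost line for line into Python. In this sketch the function returns the shifts instead of printing them, and `delta_ab` is a hand-written transition table for the hypothetical pattern "ab" over the alphabet {a, b}; both choices are assumptions made for illustration.

```python
def finite_automaton_matcher(text, m, delta):
    """Return all shifts s such that the pattern occurs at text[s:s + m]."""
    shifts = []
    q = 0
    for i, ch in enumerate(text, start=1):  # i runs 1..n as in the pseudocode
        q = delta[(q, ch)]
        if q == m:
            shifts.append(i - m)
    return shifts


# Hand-written delta for the hypothetical pattern "ab" over {a, b}.
delta_ab = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
print(finite_automaton_matcher("abab", 2, delta_ab))  # [0, 2]
```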

Knuth-Morris-Pratt (KMP) Method

• Avoids computing δ (transition function)
• Instead computes a prefix function π in O(m) time
– π has only m entries
• Prefix function stores info about how the pattern matches against shifts of itself
– Can avoid testing useless shifts

Terminology/Notations

• String w is a prefix of string x, if x = wy for some string y (e.g., “srilan” of “srilanka”)
• String w is a suffix of string x, if x = yw for some string y (e.g., “anka” of “srilanka”)
• The k-character prefix of the pattern P[1..m] is denoted by P_k
– E.g., P_0 = ε, P_m = P = P[1..m]

Prefix Function for a Pattern

• Given that pattern prefix P[1..q] matches text characters T[(s+1)..(s+q)], what is the least shift s’ > s such that P[1..k] = T[(s’+1)..(s’+k)] where s’+k = s+q?
• At the new shift s’, no need to compare the first k characters of P with corresponding characters of T
– Since we know that they match

Prefix Function: Example 1

[Figure: pattern P = a b a b a c a aligned against text T = b a c b a b a b a a b c b a at shift s, with the first q = 5 pattern characters matching; the pattern is then slid to shift s’, where its first k = 3 characters P_k already match the text.]

Compare pattern against itself; the longest prefix of P that is also a suffix of P_5 is P_3; so π[5] = 3

Prefix Function: Example 2

i      1  2  3  4  5  6  7  8  9  10
P[i]   a  b  a  b  a  b  a  b  c  a
π[i]   0  0  1  2  3  4  5  6  0  1
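The table can be checked against the definition of π: π[q] is the length of the longest prefix of P that is also a proper suffix of P_q. The Python sketch below applies that definition by brute force. It is meant only to verify small examples, it is not the O(m) computation, and the function name is illustrative.

```python
def prefix_function_by_definition(pattern):
    """pi[q] = length of the longest prefix of the pattern that is also
    a proper suffix of pattern[:q], for q = 1..m.

    Returned as a list where index 0 holds pi[1].
    """
    pi = []
    for q in range(1, len(pattern) + 1):
        p_q = pattern[:q]
        k = q - 1  # proper: strictly shorter than P_q
        while k > 0 and not p_q.endswith(pattern[:k]):
            k -= 1
        pi.append(k)
    return pi


print(prefix_function_by_definition("ababababca"))
# [0, 0, 1, 2, 3, 4, 5, 6, 0, 1]
print(prefix_function_by_definition("ababaca")[4])  # 3, i.e. pi[5] = 3
```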

Knuth-Morris-Pratt (KMP) Algorithm

• Information stored in prefix function
– Can speed up both the naïve algorithm and the finite-automaton matcher
• KMP Algorithm on the board
– 2 parts: KMP-MATCHER, PREFIX
• Running time
– PREFIX takes O(m)
– KMP-MATCHER takes O(m+n)
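The two parts were presented on the board and are not reproduced in this transcript. The following is a sketch of the standard textbook formulation in Python, not necessarily the exact version used in the lecture. The function names `compute_prefix` and `kmp_matcher`, the 0-based string indexing, and returning shifts as a list are all assumptions.

```python
def compute_prefix(pattern):
    """PREFIX: return pi with pi[q] for q = 1..m (pi[0] is an unused 0)."""
    m = len(pattern)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        # Fall back to shorter matches until the next character extends one.
        while k > 0 and pattern[k] != pattern[q - 1]:
            k = pi[k]
        if pattern[k] == pattern[q - 1]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text, pattern):
    """KMP-MATCHER: return every shift s with text[s:s + m] == pattern."""
    m = len(pattern)
    if m == 0:
        return []
    pi = compute_prefix(pattern)
    shifts = []
    q = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while q > 0 and pattern[q] != ch:
            q = pi[q]
        if pattern[q] == ch:
            q += 1
        if q == m:
            shifts.append(i - m + 1)
            q = pi[q]  # keep looking for overlapping occurrences
    return shifts


print(compute_prefix("ababababca")[1:])  # [0, 0, 1, 2, 3, 4, 5, 6, 0, 1]
print(kmp_matcher("abababab", "abab"))   # [0, 2, 4]
```

The text index `i` never moves backwards; only `q` falls back through π, which is where the O(m+n) total comes from.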

Greedy Approach to Algorithm Design

Introduction

• Greedy methods typically apply to optimization problems in which a set of choices must be made to arrive at an optimal solution
• Optimization problem
– There can be many solutions
– Each solution has a value
– We wish to find a solution with the optimal (minimum or maximum) value

Example Optimization Problems

• How to give a balance in minimum number of coins?
• How to allocate resources to maximize profit from your business?
• A thief has a knapsack of capacity c; what items to put in it to maximize profit?
– 0-1 knapsack problem (binary choice)
– Fractional knapsack problem

Greedy Approach

• Make each choice in a locally optimal manner
– Always makes the choice that looks best at the moment
– We hope that this will lead to a globally optimal solution
• Greedy method doesn’t always give optimal solutions, but for many problems it does

Example

• A cashier gives change using coins of Rs. 10, 5, 2 and 1
• Suppose the amount is Rs. 37
• Need to minimize the number of coins
– Try to use the largest coin to cover the remaining balance
– So, we get 10 + 10 + 10 + 5 + 2
– Does this give the optimal solution?
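The cashier's rule can be written as a short Python sketch. The function name `greedy_change` is illustrative, and the second coin system (4, 3, 1) is a hypothetical one added only to show a case where the greedy rule is not optimal.

```python
def greedy_change(amount, denominations):
    """Repeatedly take the largest coin that fits the remaining balance."""
    coins = []
    for coin in sorted(denominations, reverse=True):
        while amount >= coin:
            coins.append(coin)
            amount -= coin
    if amount != 0:
        raise ValueError("amount cannot be paid with these denominations")
    return coins


print(greedy_change(37, [10, 5, 2, 1]))  # [10, 10, 10, 5, 2]
# Hypothetical coin system where greedy is not optimal: 3 + 3 uses two coins.
print(greedy_change(6, [4, 3, 1]))       # [4, 1, 1]
```

For Rs. 37 with coins 10, 5, 2 and 1, five coins is in fact the minimum: any four-coin solution would need three 10s, leaving Rs. 7, which is not a single coin.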

Elements of Greedy Approach

1. Greedy-choice property

– A globally optimal solution can be arrived at by making a locally optimal (greedy) choice
– Proving this may not be trivial

2. Optimal substructure

– Optimal solution to the problem contains within it optimal solutions to subproblems

Applications of Greedy Approach

• Graph algorithms
– Minimum spanning tree
– Shortest path
• Data compression
– Huffman coding
• Activity selection (scheduling) problems
• Fractional knapsack problem
– Not the 0-1 knapsack problem
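As one illustration from this list, the fractional knapsack problem can be solved greedily by taking items in decreasing order of value per unit weight. The Python sketch below assumes items are given as (value, weight) pairs; the function name and the sample data are hypothetical.

```python
def fractional_knapsack(items, capacity):
    """items: list of (value, weight) pairs with weight > 0.

    Greedy rule: take items in decreasing value-per-weight order,
    splitting the last item if it does not fit entirely.
    """
    total = 0.0
    remaining = capacity
    for value, weight in sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True):
        if remaining <= 0:
            break
        take = min(weight, remaining)
        total += value * take / weight
        remaining -= take
    return total


items = [(60, 10), (100, 20), (120, 30)]  # hypothetical (value, weight) data
print(fractional_knapsack(items, 50))     # 240.0
```

With the same data and whole items only (the 0-1 version), following the ratio order takes the first two items for a value of 160, while the second and third items together fit and are worth 220, so the greedy rule fails there.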

Announcements

• Assignment 4
– assigned today
– due next week
• Next 2 lectures
– Topic: Graphs
– By Ms Sudanthi Wijewickrema