Horspool Algorithm - National Chi Nan University

Raita Algorithm
Tuning the Boyer-Moore-Horspool string searching algorithm,
Software - Practice & Experience, 22(10):879-884
T. RAITA
Advisor: Prof. R. C. T. Lee
Speaker: H. M. Chen
String Matching Problem
• Given a pattern string P of length m and a text string T
of length n, we would like to know whether there
exists an occurrence of P in T.
[Figure: a pattern aligned beneath a text.]
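As a baseline for the problem statement above, here is a minimal brute-force sketch. The function name `occurs` is hypothetical and not from the slides; it only makes the definition concrete and is not the Raita algorithm.

```python
def occurs(pattern: str, text: str) -> bool:
    """Return True if pattern occurs somewhere in text (brute force)."""
    m, n = len(pattern), len(text)
    # Try every window position 0 .. n - m and compare the whole window.
    for start in range(n - m + 1):
        if text[start:start + m] == pattern:
            return True
    return False


if __name__ == "__main__":
    print(occurs("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG"))
```

This baseline always moves the window by one position; the algorithm in the following slides tries to move it further.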
Basic idea
• For each position of the search window, we compare its last
character (ß) with the last character of the pattern.
• If they match, we compare the first character of the pattern
with that of the window. If they also match, we compare the
middle character of the pattern with that of the window. If all
three match, we compare the remaining characters, scanning
forward from the second character, until we either find the
pattern or fail on a text character.
[Figure: the pattern aligned under a text window ending in ß.
Comparison order: 1 = last character, 2 = first character,
3 = middle character, then forward through the rest.]
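The comparison order for a single window can be sketched as follows. The function name `window_matches` is an assumption made for illustration, and this simple version rechecks the middle character during the final forward scan.

```python
def window_matches(text: str, pattern: str, start: int) -> bool:
    """Check one window of text against pattern in Raita's comparison order."""
    m = len(pattern)
    window = text[start:start + m]
    if m == 0 or len(window) < m:
        return False
    # 1. Last character of the window against the last of the pattern.
    if window[-1] != pattern[-1]:
        return False
    # 2. First character.
    if window[0] != pattern[0]:
        return False
    # 3. Middle character.
    mid = m // 2
    if window[mid] != pattern[mid]:
        return False
    # 4. Remaining characters, scanned forward from the second one
    #    (the middle character is compared again here).
    return all(window[i] == pattern[i] for i in range(1, m - 1))


if __name__ == "__main__":
    text = "GCATCGCAGAGAGTATACAGTACG"
    print(window_matches(text, "GCAGAGAG", 5))
```

Checking the last, first and middle characters first lets most non-matching windows be rejected after one to three comparisons.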
Basic idea
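• Then, whether there was a match or not, we shift the window
according to the rightmost occurrence of the letter ß in the
pattern, not counting the pattern's last position.
[Figure: safe shift. The window moves right until that occurrence
of ß in the pattern lines up with the ß in the text; no ß occurs
in the part of the pattern to the right of it.]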
• After a mismatch or a complete match, the Raita algorithm
computes its shift with the Horspool algorithm's rule.
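The shift rule can be sketched directly, without a precomputed table. The function name `safe_shift` and the parameter name `beta` (standing for ß, the last text character of the window) are assumptions for illustration.

```python
def safe_shift(pattern: str, beta: str) -> int:
    """Shift for a window whose last text character is beta."""
    m = len(pattern)
    # Look for the rightmost beta in pattern[0 .. m-2]; the last pattern
    # position is excluded so that the shift is always at least 1.
    idx = pattern.rfind(beta, 0, m - 1)
    if idx == -1:
        # beta does not occur there: the whole window can be skipped.
        return m
    return m - 1 - idx


if __name__ == "__main__":
    for ch in "ACGT":
        print(ch, safe_shift("GCAGAGAG", ch))
```

Scanning the pattern on every shift costs O(m) per window, which is why the preprocessing phase stores these values in a table instead.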
Preprocessing phase
raBc table
P = p_0 p_1 … p_{m-2} p_{m-1}.
The value of p_{m-2} is set to 1 and the value of p_0 to m-1; the
value increases by one for each step to the left from p_{m-2} to
p_0. When a character occurs more than once, we keep its smallest
value.
Every other character of the alphabet (*) gets the value m.
Example:
T: G C A T C G C A G A G A G T A T A C A G T A C G
P: G C A G A G A G
Value:    7  6  5  4  3  2  1
Pattern:  G  C  A  G  A  G  A  G
a         A  C  G  *
raBc[a]   1  6  2  8
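The preprocessing rule and the example table can be reproduced with a short sketch. The names `build_rabc` and `rabc_lookup` are hypothetical, and a dictionary with a default of m stands in for an array indexed by the whole alphabet.

```python
from typing import Dict


def build_rabc(pattern: str) -> Dict[str, int]:
    """Build the raBc shift values for the characters p_0 .. p_{m-2}."""
    m = len(pattern)
    table: Dict[str, int] = {}
    # Going left to right, p_i gets the value m - 1 - i. A later (smaller)
    # value overwrites an earlier one, so each character keeps its minimum.
    for i in range(m - 1):
        table[pattern[i]] = m - 1 - i
    return table


def rabc_lookup(table: Dict[str, int], m: int, ch: str) -> int:
    """Characters missing from the table (the '*' column) get the value m."""
    return table.get(ch, m)


if __name__ == "__main__":
    table = build_rabc("GCAGAGAG")
    print({ch: rabc_lookup(table, 8, ch) for ch in "ACGT"})
```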
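Example (1/4)
raBc: A = 1, C = 6, G = 2, * = 8
First attempt
Index:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Text:     G  C  A  T  C  G  C  A  G  A  G  A  G  T  A  T  A  C  A  G  T  A  C  G
Order:                         1
Pattern:  G  C  A  G  A  G  A  G
Result: mismatch on comparison 1; shift by raBc[A] = 1.
Second attempt
Index:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Text:     G  C  A  T  C  G  C  A  G  A  G  A  G  T  A  T  A  C  A  G  T  A  C  G
Order:       2                    1
Pattern:     G  C  A  G  A  G  A  G
Result: mismatch on comparison 2; shift by raBc[G] = 2.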
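Example (2/4)
raBc: A = 1, C = 6, G = 2, * = 8
Third attempt
Index:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Text:     G  C  A  T  C  G  C  A  G  A  G  A  G  T  A  T  A  C  A  G  T  A  C  G
Order:             2                    1
Pattern:           G  C  A  G  A  G  A  G
Result: mismatch on comparison 2; shift by raBc[G] = 2.
Fourth attempt
Index:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Text:     G  C  A  T  C  G  C  A  G  A  G  A  G  T  A  T  A  C  A  G  T  A  C  G
Order:                   2  4  5  6  3  7  8  1
Pattern:                 G  C  A  G  A  G  A  G
Result: match at position 5; shift by raBc[G] = 2.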
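Example (3/4)
raBc: A = 1, C = 6, G = 2, * = 8
Fifth attempt
Index:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Text:     G  C  A  T  C  G  C  A  G  A  G  A  G  T  A  T  A  C  A  G  T  A  C  G
Order:                                              1
Pattern:                       G  C  A  G  A  G  A  G
Result: mismatch on comparison 1; shift by raBc[A] = 1.
Sixth attempt
Index:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Text:     G  C  A  T  C  G  C  A  G  A  G  A  G  T  A  T  A  C  A  G  T  A  C  G
Order:                                                 1
Pattern:                          G  C  A  G  A  G  A  G
Result: mismatch on comparison 1; shift by raBc[T] = 8.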
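Example (4/4)
raBc: A = 1, C = 6, G = 2, * = 8
Seventh attempt
Index:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Text:     G  C  A  T  C  G  C  A  G  A  G  A  G  T  A  T  A  C  A  G  T  A  C  G
Order:                                                    2                    1
Pattern:                                                  G  C  A  G  A  G  A  G
Result: mismatch on comparison 2; shift by raBc[G] = 2, which moves the window past the end of the text.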
The Raita algorithm performs 18 character comparisons on the example.
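The seven attempts of the worked example can be reproduced with a compact sketch of the whole search. The function name `raita_search` and its return shape (match positions plus the list of window positions tried) are assumptions made for illustration, not part of the original paper or slides.

```python
from typing import List, Tuple


def raita_search(text: str, pattern: str) -> Tuple[List[int], List[int]]:
    """Return (match positions, window positions tried) for pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], []
    # Preprocessing: Horspool-style shift table; unseen characters shift by m.
    shift = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    first, middle, last = pattern[0], pattern[m // 2], pattern[-1]
    matches: List[int] = []
    windows: List[int] = []
    j = 0
    while j <= n - m:
        windows.append(j)
        c = text[j + m - 1]  # last text character of the window (ß)
        # Compare last, first, middle, then the inner characters.
        if (c == last and text[j] == first
                and text[j + m // 2] == middle
                and text[j + 1:j + m - 1] == pattern[1:m - 1]):
            matches.append(j)
        # Shift the same way whether or not the window matched.
        j += shift.get(c, m)
    return matches, windows


if __name__ == "__main__":
    print(raita_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))
```

On the example text and pattern this visits the windows starting at 0, 1, 3, 5, 7, 8 and 16 and reports the single match at position 5.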
Time complexity
• The preprocessing phase takes O(m+σ) time and O(σ) space.
• The searching phase takes O(mn) time in the worst case.
(σ is the size of the alphabet.)
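To make the O(mn) bound concrete, the sketch below counts character comparisons. It assumes the middle character is compared again during the final forward scan, as a memcmp-style implementation would do; under that assumption the worked example totals 18 comparisons. The function name `count_comparisons` is hypothetical.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Count character comparisons made by a straightforward Raita search."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return 0
    # Horspool-style shift table; later (smaller) values overwrite earlier ones.
    shift = {pattern[i]: m - 1 - i for i in range(m - 1)}
    mid = m // 2
    # Probe order: last, first, middle, then positions 1 .. m-2
    # (assumption: the middle position is probed a second time here).
    order = [m - 1, 0, mid] + list(range(1, m - 1))
    comparisons = 0
    j = 0
    while j <= n - m:
        for i in order:
            comparisons += 1
            if text[j + i] != pattern[i]:
                break
        j += shift.get(text[j + m - 1], m)
    return comparisons


if __name__ == "__main__":
    print(count_comparisons("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))
    print(count_comparisons("a" * 10, "a" * 4))
```

When the text and the pattern are runs of the same character, every window matches and every shift is 1, so this sketch makes (n - m + 1)(m + 1) comparisons: 35 for n = 10 and m = 4. That product is what grows as O(mn).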
References
• RAITA, T., 1992, Tuning the Boyer-Moore-Horspool string
searching algorithm, Software - Practice & Experience,
22(10):879-884.
• SMITH, P.D., 1994, On tuning the Boyer-Moore-Horspool
string searching algorithms, Software - Practice & Experience,
24(4):435-436.