Transcript L3.ppt

Algorithms
and
Efficiency of Algorithms
February 4th
Today's Outline
More algorithms
• Variations of sequential search
Practical applications
• Pattern matching
• Data cleanup
• There are many different algorithms to solve the same problem.
• So, how do we know when we have a good one? I.e., how do we measure the efficiency of an algorithm?
Write algorithms for
• Find all occurrences of target
• Find number of occurrences of target
• Find number of values larger than target
• Find largest
• Find smallest
• Find sum
• Find average
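The variations above can each be sketched as a single left-to-right scan, in the spirit of sequential search. This is a minimal Python sketch, not the course's official solution: the function names are hypothetical, positions are reported 1-based to match the slides, and the list is assumed to be non-empty for largest, smallest, and average.

```python
def find_all(values, target):
    """Return the 1-based positions of every occurrence of target."""
    return [pos for pos, v in enumerate(values, start=1) if v == target]

def count_occurrences(values, target):
    """Count how many values equal target."""
    count = 0
    for v in values:
        if v == target:
            count += 1
    return count

def count_larger(values, target):
    """Count how many values are strictly larger than target."""
    count = 0
    for v in values:
        if v > target:
            count += 1
    return count

def find_largest(values):
    """Return the largest value (assumes a non-empty list)."""
    largest = values[0]
    for v in values[1:]:
        if v > largest:
            largest = v
    return largest

def find_smallest(values):
    """Return the smallest value (assumes a non-empty list)."""
    smallest = values[0]
    for v in values[1:]:
        if v < smallest:
            smallest = v
    return smallest

def find_sum(values):
    """Return the sum of all values."""
    total = 0
    for v in values:
        total += v
    return total

def find_average(values):
    """Return the average (assumes a non-empty list)."""
    return find_sum(values) / len(values)

data = [5, 3, 4, 0, 6, 2, 4, 0]
print(find_all(data, 4))        # [3, 7]
print(find_average(data))       # 3.0
```

Every function visits each value exactly once, so they all do an amount of work proportional to the length of the list.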
A Search Application in
Bioinformatics
• Human genome: sequence of billions of nucleotides
• Gene
– Determines human behavior
– Sequence of tens of thousands of nucleotides {T, C, A, G}
– The sequence is not fully known, only a portion of it
• Problem: How to determine a gene in the human genome?
Genome: …….TCAGGCTAATCGTAGG…….
Gene probe: TAATC
Idea: Find all matches of the probe within the genome and then examine the
nucleotides in that neighborhood
A Search Application in
Bioinformatics
• Problem:
– Suppose we have a text T = TCAGGCTAATCGTAGG and a pattern P =
TA. Design an algorithm that searches T to find the position of every
instance of P that appears in T.
• E.g., for this text, the algorithm should return the answer:
There is a match at position 7
There is a match at position 13
This problem is a variation of the search algorithm, except that at
every possible starting position the characters of P must be compared
with the corresponding characters of T.
Pattern Matching
• Input
– Text of n characters T1, T2, …, Tn
– Pattern of m (m < n) characters P1, P2, …, Pm
• Output:
– Location (index) of every occurrence of pattern within
text
• Algorithm:
– What is the idea?
Pattern Matching
• Algorithm idea:
– Check if pattern matches starting at position 1
– Then check if it matches starting at position 2
– …and so on
• How to check if pattern matches text starting at
position k?
– Check that every character of pattern matches
corresponding character of text
• How many loops will you need?
Pattern Matching
• Algorithm idea
– Get input (text and pattern)
– Set starting location k to 1
– Repeat until the end of the text is reached
• Attempt to match every character in the pattern beginning at
position k in text
• If there was a match, print k
• Add 1 to k
– Stop
• Question: is this an algorithm?
– Yes, at a high level of abstraction
– Now we need to write it in pseudocode
Pattern Matching Algorithm (Fig. 2.12)
Get values for n, m, the text T1T2…Tn and the pattern P1P2…Pm
Set k to 1
Repeat until k > n-m+1
    Set i to 1
    Set Mismatch to NO
    Repeat until either (i > m) or (Mismatch = YES)
        if Pi ≠ Tk+(i-1) then
            Set Mismatch to YES
        else Increment i by 1
    if Mismatch = NO then
        Print the message “There is a match at position” k
    Increment k by 1
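The pseudocode of Fig. 2.12 can be transcribed into Python roughly as follows. This is a sketch under a few assumptions: the function name find_matches is hypothetical, matches are collected in a list instead of being printed, and the 1-based indices k and i of the pseudocode are kept, so 1 is subtracted whenever a Python string is indexed.

```python
def find_matches(text, pattern):
    """Return the 1-based starting position of every match of pattern in text."""
    n, m = len(text), len(pattern)
    matches = []
    k = 1                                   # candidate starting position in text
    while k <= n - m + 1:                   # "Repeat until k > n-m+1"
        i = 1                               # position within the pattern
        mismatch = False
        while i <= m and not mismatch:      # "until (i > m) or (Mismatch = YES)"
            # Compare P_i with T_(k+(i-1)); subtract 1 for 0-based indexing.
            if pattern[i - 1] != text[k + (i - 1) - 1]:
                mismatch = True
            else:
                i += 1
        if not mismatch:
            matches.append(k)
        k += 1
    return matches

for position in find_matches("TCAGGCTAATCGTAGG", "TA"):
    print("There is a match at position", position)
# There is a match at position 7
# There is a match at position 13
```

Two loops are needed: the outer loop moves the starting position k through the text, and the inner loop compares the pattern character by character at that position.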
Variations on the pattern matching algorithm
• Find only the first match for P in T.
• Find only the last match for P in T.
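One possible way to approach these two variations is sketched below. The names matches_at, find_first_match, and find_last_match are hypothetical, and the character-by-character inner loop is replaced by a Python slice comparison for brevity. The first-match version stops at the first position that matches; the last-match version scans the starting positions from right to left and stops at the first match it meets.

```python
def matches_at(text, pattern, k):
    """True if pattern matches text starting at 1-based position k."""
    return text[k - 1 : k - 1 + len(pattern)] == pattern

def find_first_match(text, pattern):
    """Return the 1-based position of the first match, or None if there is none."""
    for k in range(1, len(text) - len(pattern) + 2):
        if matches_at(text, pattern, k):
            return k            # stop as soon as one match is found
    return None

def find_last_match(text, pattern):
    """Return the 1-based position of the last match, or None if there is none."""
    # Scan starting positions from right to left, so the first hit is the last match.
    for k in range(len(text) - len(pattern) + 1, 0, -1):
        if matches_at(text, pattern, k):
            return k
    return None

print(find_first_match("TCAGGCTAATCGTAGG", "TA"))   # 7
print(find_last_match("TCAGGCTAATCGTAGG", "TA"))    # 13
```

Both versions can halt early, so on many inputs they take fewer steps than finding every match.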
Comparing Algorithms
• Algorithm
– Design
– Correctness
– Efficiency
– Also, clarity, elegance, ease of understanding
• There are many ways to solve a problem
– Conceptually
– Also different ways to write pseudocode for the same
conceptual idea
• How to compare algorithms?
Efficiency of Algorithms
• Efficiency: Amount of resources used by an
algorithm
• Space (number of variables)
• Time (number of instructions)
• When designing an algorithm, we must be aware of its use of
resources
• If there is a choice, pick the more efficient algorithm!
Efficiency of Algorithms
Does efficiency matter?
• Computers are so fast these days…
• Yes, efficiency matters a lot!
– There are problems (actually a lot of them) for which
all known algorithms are so inefficient that they are
impractical
– Remember the shortest-path-through-all-cities problem
from Lab1…
Efficiency of Algorithms
How to measure time efficiency?
• Running time: let it run and see how long it takes
– On what machine?
– On what inputs?
Time efficiency depends on input
• Example: the sequential search algorithm
– In the best case, how fast can the algorithm halt?
– In the worst case, how fast can the algorithm halt?
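To make the best-case and worst-case questions concrete, here is a small sketch of sequential search that also counts how many comparisons it makes. The function name and the choice to count only value comparisons are assumptions made for illustration.

```python
def sequential_search(values, target):
    """Return (1-based position or None, number of comparisons made)."""
    comparisons = 0
    for position, v in enumerate(values, start=1):
        comparisons += 1
        if v == target:
            return position, comparisons    # halt as soon as target is found
    return None, comparisons                # target is not in the list

data = [5, 3, 4, 0, 6, 2, 4, 0]
print(sequential_search(data, 5))   # (1, 1)     best case: target is first
print(sequential_search(data, 7))   # (None, 8)  worst case: target is absent
```

In the best case the algorithm halts after one comparison regardless of the list size; in the worst case it compares against all n values.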
Time Efficiency
• We want a measure of time efficiency which is independent
of machine, speed, etc.
– Look at an algorithm pseudocode and estimate its running time
– Look at 2 algorithm pseudocodes and compare them
• Efficiency of an algorithm:
– the number of pseudocode instructions (steps) executed
• Is this accurate?
– Not all instructions take the same amount of time…
– But it is a good approximation of running time in most cases
Data Cleanup Algorithms
What are they?
A systematic strategy for removing errors from data.
Why are they important?
Errors occur in all real computing situations.
How are they related to the search algorithm?
To remove errors from a series of values, each value must
be examined to determine if it is an error.
E.g., suppose we have a list d of data values, from which we
want to remove all the zeroes (they mark errors), and
pack the good values to the left. Legit is the number of
good values remaining when we are done.
 5   3   4   0   6   2   4   0
d1  d2  d3  d4  d5  d6  d7  d8      Legit
Data Cleanup: Copy-Over algorithm
Idea: Scan the list from left to right and copy non-zero values
to a new list
Copy-Over Algorithm (Fig 3.2)
• Get values for n and the list of n values A1, A2, …, An
• Set left to 1
• Set newposition to 1
• While left <= n do
    • If Aleft is non-zero
        • Copy Aleft into Bnewposition
          (copy it into position newposition in the new list)
        • Increase left by 1
        • Increase newposition by 1
    • Else increase left by 1
• Stop
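The copy-over idea can be sketched in Python as follows. This is an illustrative sketch: the function name copy_over is hypothetical, the new list B grows by appending, so no explicit newposition counter is needed, and the number of legitimate values is simply the length of the new list.

```python
def copy_over(a):
    """Return a new list containing the non-zero values of a, in their original order."""
    b = []                      # the new list B
    left = 0                    # 0-based counterpart of "left" in the pseudocode
    while left < len(a):
        if a[left] != 0:
            b.append(a[left])   # copy A_left into the next free position of B
        left += 1               # left advances in both branches
    return b

cleaned = copy_over([5, 3, 4, 0, 6, 2, 4, 0])
print(cleaned)        # [5, 3, 4, 6, 2, 4]
print(len(cleaned))   # 6
```

Copy-over examines each value once, but it needs extra space for the second list.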
Data Cleanup: The Shuffle-Left Algorithm
• Idea:
– Go over the list from left to right. Every time we see a
zero, shift all subsequent elements one position to the
left.
– Keep track of the number of legitimate (non-zero) entries
• How does this work?
• How many loops do we need?
Shuffle-Left Algorithm (Fig 3.1)
1 Get values for n and the list of n values A1, A2, …, An
2 Set legit to n
3 Set left to 1
4 Set right to 2
5 Repeat steps 6-14 until left > legit
6     if Aleft ≠ 0
7         Increase left by 1
8         Increase right by 1
9     else
10        Reduce legit by 1
11        Repeat 12-13 until right > n
12            Copy Aright into Aright-1
13            Increase right by 1
14        Set right to left + 1
15 Stop
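The shuffle-left pseudocode can be transcribed into Python roughly as follows. This is a sketch with some assumptions: the function name shuffle_left is hypothetical, the input is copied so the caller's list is left untouched, and the 1-based variables left, right, and legit are kept, with 1 subtracted when the list is indexed.

```python
def shuffle_left(values):
    """Return (packed non-zero values, legit) using the shuffle-left strategy."""
    a = list(values)            # work on a copy (an illustrative choice)
    n = len(a)
    legit = n
    left = 1
    right = 2
    while left <= legit:                    # "until left > legit"
        if a[left - 1] != 0:
            left += 1
            right += 1
        else:
            legit -= 1
            while right <= n:               # shift everything after left one slot left
                a[right - 2] = a[right - 1]
                right += 1
            right = left + 1                # reset right; left stays to re-check this slot
    return a[:legit], legit

print(shuffle_left([5, 3, 4, 0, 6, 2, 4, 0]))   # ([5, 3, 4, 6, 2, 4], 6)
```

Two loops are needed: the outer one scans for zeros, and the inner one performs the shift. No second list is required, but each zero can trigger many copies.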
Exercising the Shuffle-Left Algorithm
 5   3   4   0   6   2   4   0
d1  d2  d3  d4  d5  d6  d7  d8      legit
Data Cleanup: The Converging-Pointers Algorithm
• Idea:
– One finger moving left to right, one moving
right to left
– Move left finger over non-zero values
– If it encounters a zero value, then
• Copy element at right finger into this position
• Shift right finger to the left
Converging Pointers Algorithm (Fig 3.3)
1 Get values for n and the list of n values A1, A2, …, An
2 Set legit to n
3 Set left to 1
4 Set right to n
5 Repeat steps 6-10 until left ≥ right
6     If the value of Aleft ≠ 0 then increase left by 1
7     Else
8         Reduce legit by 1
9         Copy the value of Aright to Aleft
10        Reduce right by 1
11 If Aleft = 0 then reduce legit by 1.
12 Stop
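The converging-pointers pseudocode can be transcribed into Python roughly as follows. This is a sketch with some assumptions: the function name converging_pointers is hypothetical, the input is copied first, and a guard for an empty list is added because the pseudocode implicitly assumes n is at least 1.

```python
def converging_pointers(values):
    """Return (packed non-zero values, legit) using the converging-pointers strategy."""
    a = list(values)            # work on a copy (an illustrative choice)
    n = len(a)
    legit = n
    left = 1                    # 1-based, as in the pseudocode
    right = n
    while left < right:                     # "until left >= right"
        if a[left - 1] != 0:
            left += 1
        else:
            legit -= 1
            a[left - 1] = a[right - 1]      # overwrite the zero with the rightmost value
            right -= 1
    # Step 11: the slot where the pointers met has not been checked yet.
    # The n > 0 guard is an addition for empty input, not part of the pseudocode.
    if n > 0 and a[left - 1] == 0:
        legit -= 1
    return a[:legit], legit

print(converging_pointers([5, 3, 4, 0, 6, 2, 4, 0]))   # ([5, 3, 4, 4, 6, 2], 6)
```

Unlike copy-over and shuffle-left, this approach does not preserve the original order of the good values, but it needs only one loop and no second list.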
Exercising the Converging Pointers Algorithm
 5   3   4   0   6   2   4   0
d1  d2  d3  d4  d5  d6  d7  d8      legit
Measuring Efficiency by Counting Steps
The efficiency of an algorithm is the number of steps that it
takes to complete its task. Sometimes, this is called the
complexity of an algorithm.
Efficiency depends on the data. E.g., the search algorithm
takes fewer steps to locate a value at the beginning of a
list than to locate a value at the end of the list.
The “worst case” efficiency is the maximum number of steps
that an algorithm can take for any collection of data
values.
If the input has size n, efficiency will be a function of n
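As an illustration of efficiency as a function of n, the sketch below counts the element copies made by the shuffle-left algorithm on a list of n zeros, which is a costly input for that algorithm. The function name shuffle_left_copies and the decision to count only copy steps are assumptions made for this example.

```python
def shuffle_left_copies(n):
    """Count the element copies shuffle-left performs on a list of n zeros."""
    a = [0] * n
    legit, left, right = n, 1, 2
    copies = 0
    while left <= legit:
        if a[left - 1] != 0:
            left += 1
            right += 1
        else:
            legit -= 1
            while right <= n:
                a[right - 2] = a[right - 1]
                copies += 1                 # one copy step
                right += 1
            right = left + 1
    return copies

for n in (2, 4, 8, 16):
    print(n, shuffle_left_copies(n))
# 2 2
# 4 12
# 8 56
# 16 240
```

On this input every one of the n zeros triggers n-1 copies, so the count is n(n-1): doubling n roughly quadruples the work.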