L20Searching Sorting.ppt


Searching & Sorting
Manipulating lists of data
for quick retrieval
CMSC 104
Common Problems
There are some very common problems that
computers are asked to solve:
Searching through a lot of records for a
specific record.
Who uses this?
o Airlines
o Companies that take phone orders
o Credit card companies
o ... almost any company

Searching
Does this search have to be fast?
How can we make the search faster?
o By keeping the records in some order
o By using an efficient search algorithm
Search algorithms
o Sequential search
o Binary search

Sequential Search
on an Unordered File
Get the search criterion from the user
Get the first record from the file
While the record doesn’t match the
criterion && there are still more records
in the file, get the next record
When do we know that there wasn’t a
record in the file that matched?

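The steps above can be sketched in C roughly as follows. This is only an illustration: it assumes the "file" is an array of int records already in memory, and the function name sequential_search is made up for this example.

```c
/* Sequential search of an unordered array (illustrative sketch).
 * Assumption: records are plain ints held in an array.
 * Returns the index of the first match, or -1 if nothing matches. */
int sequential_search(const int records[], int size, int key)
{
    int i = 0;

    /* While the record doesn't match && there are still more records,
     * get the next record. */
    while (i < size && records[i] != key)
    {
        i++;
    }

    /* On an unordered list we only know there was no match
     * once we have run past the last record. */
    return (i < size) ? i : -1;
}
```

With the records 42, 7, 19, 3, 25, a search for 19 stops at index 2, while a search for 8 has to look at all five records before it can report no match.
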
Sequential Search
of an Ordered File
Get the search criterion from the user
Get the first record from the file
While the record is less than the
criterion && there are still more records
in the file, get the next record
If the record matches the criterion then
success else there is no match in the
file.
When do we know that there wasn’t a
record in the file that matched?

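A possible C version of the ordered search is sketched below. It assumes int records sorted in ascending order, and the name ordered_sequential_search is hypothetical. The point to notice is that the loop can stop as soon as it reaches a record that is not less than the criterion.

```c
/* Sequential search of an array sorted in ascending order
 * (illustrative sketch; assumes int records).
 * Returns the index of the match, or -1 if there is no match. */
int ordered_sequential_search(const int records[], int size, int key)
{
    int i = 0;

    /* Skip every record that is still less than the criterion. */
    while (i < size && records[i] < key)
    {
        i++;
    }

    /* We stopped at the first record that is >= key, or ran off the end.
     * Either it is the match, or no later record can match. */
    if (i < size && records[i] == key)
    {
        return i;
    }
    return -1;
}
```

With the sorted records 3, 7, 19, 25, 42, a search for 8 stops at 19 (index 2) and reports no match without looking at 25 or 42.
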
Sequential Search of
Ordered vs. Unordered List
If the order was ascending alphabetical
on customers’ last names, how would
the search for John Adams on the
ordered list compare to the search on
the unordered list
o if John Adams was in the list?
o if John Adams was not in the list?

Ordered vs Unordered
(continued)
How about George Washington?
o unordered
• in the file
• not in the file
o ordered
• in the file
• not in the file
James Madison?

More Searching
Overall, we don’t really see much
improvement from ordering the file if we’re
using the sequential search.
Maybe we need a better search
algorithm.
How else could we search an ordered
file?

Binary Search
If we have an ordered list and we know
how many things are in the list (e.g., the number of
records in a file), we can use a different
strategy.
Binary Search gets its name because
we always divide things
into two parts.

How Binary Search Works
Always look at the
center value.
Each time, you get to
discard half of
the remaining list.
Is this fast?

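One way to write this idea in C is sketched below. It assumes an array of ints sorted in ascending order; the function name binary_search and the variables low, high, and mid are chosen for this example only.

```c
/* Binary search of an array sorted in ascending order
 * (illustrative sketch; assumes int records).
 * Returns the index of the match, or -1 if there is no match. */
int binary_search(const int records[], int size, int key)
{
    int low = 0;          /* first index still under consideration */
    int high = size - 1;  /* last index still under consideration */

    while (low <= high)
    {
        /* Always look at the center value of what remains. */
        int mid = low + (high - low) / 2;

        if (records[mid] == key)
        {
            return mid;
        }
        else if (records[mid] < key)
        {
            low = mid + 1;   /* discard the lower half */
        }
        else
        {
            high = mid - 1;  /* discard the upper half */
        }
    }

    /* Nothing is left to look at, so the key is not in the list. */
    return -1;
}
```

Computing mid as low + (high - low) / 2 instead of (low + high) / 2 avoids integer overflow on very large arrays.
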
How fast is Binary Search?
Worst case: 11 items in the list took 4
tries
How about a list with 32 items?
o 1st try - list has 16 items
o 2nd try - list has 8 items
o 3rd try - list has 4 items
o 4th try - list has 2 items
o 5th try - list has 1 item

More examples
List has 250 items
1st try - 125 items
2nd try - 63 items
3rd try - 32 items
4th try - 16 items
5th try - 8 items
6th try - 4 items
7th try - 2 items
8th try - 1 item
List has 512 items
1st try - 256 items
2nd try - 128 items
3rd try - 64 items
4th try - 32 items
5th try - 16 items
6th try - 8 items
7th try - 4 items
8th try - 2 items
9th try - 1 item

What’s the pattern?
List of 11 took 4 tries
List of 32 took 5 tries
List of 250 took 8 tries
List of 512 took 9 tries
32 = 2^5 and 512 = 2^9
8 < 11 < 16, so 2^3 < 11 < 2^4
128 < 250 < 256, so 2^7 < 250 < 2^8

The fastest!
How long (worst case) will it take to find
an item in a list 30,000 items long?
2^10 = 1024
2^11 = 2048
2^12 = 4096
2^13 = 8192
2^14 = 16384
2^15 = 32768
2^14 < 30,000 < 2^15, so it will take 15 tries.
It only takes 15 tries to find what we
want out of 30,000 items - that’s
awesome!!!

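The counting in the last few slides can be checked with a short C function. This is a sketch that counts tries the way the slides do: each try halves the list (rounding up), and counting stops when one item is left. The name tries_to_one_item is made up for this example. A real binary search may need one more comparison to check that last item.

```c
/* Count how many times a list of n items must be halved
 * (keeping the larger half, rounded up) until one item is left.
 * Illustrative sketch that mirrors the counting used in the slides. */
int tries_to_one_item(int n)
{
    int tries = 0;

    while (n > 1)
    {
        n = (n + 1) / 2;  /* e.g. 125 items -> 63 items */
        tries++;
    }
    return tries;
}
```

Calling it with 11, 32, 250, 512, and 30000 gives 4, 5, 8, 9, and 15, which matches the tables above.
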
Lg n
We say that the binary search algorithm
runs in lg n time.
Lg n means the log to the base 2 of
some value of n.
8 = 2^3, so lg 8 = 3; 16 = 2^4, so lg 16 = 4
No search that works by comparing items in
a sorted list runs faster than lg n time in
the worst case.

Searching and Sorting
(continued)
We have a very fast search algorithm: binary search.
But the list has to be sorted before we
can search it with binary search.
To be really efficient, we also need a
fast sort algorithm.

Some Sort Algorithms
Bubble Sort
Selection Sort
Insertion Sort
Heap Sort
Merge Sort
Quick Sort
Many sorting algorithms have been devised
in the effort to find a very fast one.
Bubble sort is one of the slowest,
running in n^2 time. Too slow!

Speed of Sorting Algorithms
The fastest comparison-based sorting algorithms,
such as Merge Sort and Heap Sort, run in n lg n
time for the worst case.
Quick Sort also runs in n lg n time for the
average case (its worst case is n^2) and is
usually a little faster in practice. So it is usually the sort
that’s used. The algorithm for Quick
Sort is quite complicated. It is shown in
your book. There is a pre-written
function called qsort in the C standard
library.

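As a rough illustration of how the library function is called, the sketch below sorts an array of ints with qsort from <stdlib.h>. The helper names compare_ints and sort_ints are made up for this example; qsort itself only requires a comparison function of this shape.

```c
#include <stdlib.h>

/* Comparison function in the form qsort expects: returns a negative
 * value, zero, or a positive value when the first item is less than,
 * equal to, or greater than the second. */
static int compare_ints(const void *left, const void *right)
{
    int a = *(const int *)left;
    int b = *(const int *)right;

    return (a > b) - (a < b);  /* avoids the overflow risk of a - b */
}

/* Sort count ints into ascending order using the library's qsort. */
void sort_ints(int a[], size_t count)
{
    qsort(a, count, sizeof a[0], compare_ints);
}
```

Calling sort_ints on an array holding 7, 5, 6, 13, 8 with a count of 5 leaves it as 5, 6, 7, 8, 13.
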
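Bubble Sort
void BubbleSort (int a[ ], int size)
{
    int i, j, temp;

    /* Each pass moves the largest remaining value toward the end;
       size passes are enough to sort the whole array. */
    for (i = 0; i < size; i++)
    {
        /* Compare each pair of neighbors. */
        for (j = 0; j < size - 1; j++)
        {
            if (a[j] > a[j + 1])
            {
                /* The pair is out of order, so swap it. */
                temp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = temp;
            }
        }
    }
}
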
Insertion Sort
Insertion sort is slower than quick sort,
but not as slow as bubble sort, and it is
easy to understand.
Insertion sort works the same way as
arranging your hand when playing
cards.
o Out of the pile of unsorted cards that were
dealt to you, you pick up a card and place
it in your hand in the correct position
relative to the cards you’re already holding.

Arranging Your Hand
[Figure: a hand of cards built up one card at a time, each new card placed in order: 7; then 5 7; then 5 6 7; then 5 6 7 K; then 5 6 7 8 K]

Insertion Sort
[Figure: the row of cards being sorted; unsorted cards are shaded]
Look at 2nd item - 5
1 Compare 5 to 7
5 is smaller, so move 5
to temp, leaving
an empty slot in
position 2
2 Move 7 into the empty
slot, leaving position 1
open
3 Move 5 into the open
position

Insertion Sort
[Figure: the row of cards being sorted; unsorted cards are shaded]
Look at next item - 6
Compare to 1st - 5
6 is larger, so leave 5
1 Compare to next - 7,
6 is smaller, so move
6 to temp, leaving an
empty slot
2 Move 7 into the empty
slot, leaving position 2
open
3 Move 6 to the open
2nd position

Insertion Sort
[Figure: cards 5 6 7 K]
Look at next item - King
Compare to 1st - 5
King is larger, so
leave 5 where it is
Compare to next - 6,
King is larger, so
leave 6 where it is
Compare to next - 7
King is larger, so
leave 7 where it is

Insertion Sort
[Figure: the last card, 8, is compared with 5, 6, 7, and K, then inserted in three numbered steps, leaving 5 6 7 8 K]

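The card walk-through can be turned into C along these lines. This is a sketch, not the book's version: it follows the BubbleSort signature used earlier, and it scans the sorted part from right to left instead of from the 1st card, which finds the same position for each new item. The move to temp, the shifting of larger items, and the final drop into the open slot correspond to steps 1, 2, and 3 on the slides.

```c
/* Insertion sort into ascending order (illustrative sketch). */
void InsertionSort(int a[], int size)
{
    int i, j, temp;

    /* Items before index i are already sorted, like the cards in hand. */
    for (i = 1; i < size; i++)
    {
        /* Step 1: move the next unsorted item to temp, leaving a slot. */
        temp = a[i];

        /* Step 2: shift each larger sorted item one place to the right,
         * which moves the open slot to the left. */
        j = i - 1;
        while (j >= 0 && a[j] > temp)
        {
            a[j + 1] = a[j];
            j--;
        }

        /* Step 3: drop the item into the open position. */
        a[j + 1] = temp;
    }
}
```

Treating the King as 13, the hand 7, 5, 6, 13, 8 ends up as 5, 6, 7, 8, 13.
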
Courses at UMBC
Algorithms - CMSC 441
o Studies algorithms and their speed
Cryptology - CMSC 443
o The study of making & breaking codes
o Write programs that can break a code like
Pdeo eo pda yknnayp wjosan
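As a small taste of code breaking, the sketch below assumes the message above is a simple shift (Caesar) cipher, which the slide does not state. The function name caesar_shift is made up for this example. A program could call it with each shift from 1 to 25 and look for the result that reads as English.

```c
#include <ctype.h>
#include <stddef.h>

/* Shift every letter of in forward by shift places (0 to 25), wrapping
 * from z back to a, and write the result to out. Other characters are
 * copied unchanged. Assumption: out has room for the whole string plus
 * the terminating '\0'. */
void caesar_shift(const char *in, char *out, int shift)
{
    size_t i;

    for (i = 0; in[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char)in[i];

        if (isupper(c))
        {
            out[i] = (char)('A' + (c - 'A' + shift) % 26);
        }
        else if (islower(c))
        {
            out[i] = (char)('a' + (c - 'a' + shift) % 26);
        }
        else
        {
            out[i] = (char)c;  /* spaces and punctuation stay as they are */
        }
    }
    out[i] = '\0';
}
```
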