Computer Algorithms Lecture 8 Quicksort Ch. 7

Some of these slides are courtesy of D. Plaisted et al, UNC and M. Nicolescu, UNR
Quicksort
• A triumph of analysis by C. A. R. Hoare
– The quicksort algorithm was developed in 1960 by C. A. R. Hoare while he was in the Soviet Union as a visiting student at Moscow State University. At that time, Hoare was working on a machine-translation project for the National Physical Laboratory. He developed the algorithm in order to sort the words to be translated, to make them more easily matched against an already-sorted Russian-to-English dictionary that was stored on magnetic tape.
• Worst-case execution time – Θ(n²).
• Average-case execution time – Θ(n lg n).
– How do the above compare with the complexities of other sorting algorithms?
• Empirical and analytical studies show that quicksort can be expected to be roughly twice as fast as its competitors.
Design
• Follows the divide-and-conquer paradigm.
• Divide: Partition (separate) the array A[p..r] into two (possibly empty) subarrays A[p..q – 1] and A[q+1..r].
– Each element in A[p..q – 1] ≤ A[q].
– A[q] ≤ each element in A[q+1..r].
– Index q is computed as part of the partitioning procedure.
• Conquer: Sort the two subarrays by recursive calls to quicksort.
• Combine: The subarrays are sorted in place – no work is needed to
combine them.
• How do the divide and combine steps of quicksort compare with
those of merge sort?
Pseudocode
Quicksort(A, p, r)
  if p < r then
    q := Partition(A, p, r);
    Quicksort(A, p, q – 1);
    Quicksort(A, q + 1, r)

[Figure: Partition splits A[p..r] around the pivot (here 5) into A[p..q – 1] and A[q+1..r].]
Partition(A, p, r)
  x, i := A[r], p – 1;
  for j := p to r – 1 do
    if A[j] ≤ x then
      i := i + 1;
      A[i] ↔ A[j]
  A[i + 1] ↔ A[r];
  return i + 1
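The pseudocode above translates almost line for line into runnable Python (a sketch; the function names mirror the slides, the ↔ swaps become tuple assignments, and indices here are 0-based rather than starting at p = 1):

```python
def partition(A, p, r):
    """Lomuto partition of A[p..r]: pivot x = A[r]; returns the pivot's final index."""
    x = A[r]                         # pivot
    i = p - 1                        # right edge of the "<= pivot" region
    for j in range(p, r):            # j = p to r - 1
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]  # A[i] <-> A[j]
    A[i + 1], A[r] = A[r], A[i + 1]  # A[i + 1] <-> A[r]
    return i + 1

def quicksort(A, p, r):
    """Sort A[p..r] in place."""
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)
```

To sort a whole list: `quicksort(A, 0, len(A) - 1)`.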
Example
note: pivot (x) = A[r] = 6; indices run p = 1 to r = 10
initially (i = 0, j = 1):
2 5 8 3 9 4 1 7 10 6
after j = 1 (2 ≤ 6: i = 1, A[1] ↔ A[1]):
2 5 8 3 9 4 1 7 10 6
after j = 2 (5 ≤ 6: i = 2, A[2] ↔ A[2]):
2 5 8 3 9 4 1 7 10 6
after j = 3 (8 > 6: no change):
2 5 8 3 9 4 1 7 10 6
after j = 4 (3 ≤ 6: i = 3, A[3] ↔ A[4]):
2 5 3 8 9 4 1 7 10 6
Example (Continued)
after j = 5 (9 > 6: no change):
2 5 3 8 9 4 1 7 10 6
after j = 6 (4 ≤ 6: i = 4, A[4] ↔ A[6]):
2 5 3 4 9 8 1 7 10 6
after j = 7 (1 ≤ 6: i = 5, A[5] ↔ A[7]):
2 5 3 4 1 8 9 7 10 6
after j = 8 (7 > 6: no change):
2 5 3 4 1 8 9 7 10 6
after j = 9 (10 > 6: no change):
2 5 3 4 1 8 9 7 10 6
after the final swap (A[i + 1] ↔ A[r], i.e., A[6] ↔ A[10]):
2 5 3 4 1 6 9 7 10 8
note: pivot (x) = 6
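The trace above can be reproduced by instrumenting the partition loop; a small sketch (`partition_trace` is my name, not from the slides; it works 0-indexed on a copy and prints the array after each iteration):

```python
def partition_trace(A):
    """Lomuto partition of a copy of A around A[-1], printing each step.
    Returns (partitioned array, pivot index)."""
    A = list(A)
    p, r = 0, len(A) - 1
    x = A[r]                          # pivot (6 in the slides' example)
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
        print(f"after j={j}:", A, f"(i={i})")
    A[i + 1], A[r] = A[r], A[i + 1]   # final swap puts the pivot in place
    print("after final swap:", A)
    return A, i + 1
```

Running it on the slides' array reproduces the states shown, ending with the pivot 6 between the two regions.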
Partitioning
• Select the last element A[r] in the subarray A[p..r] as the pivot – the element around which to partition.
• As the procedure executes, the array is partitioned into four (possibly empty) regions.
1. A[p..i] — All entries in this region are ≤ pivot.
2. A[i+1..j – 1] — All entries in this region are > pivot.
3. A[r] = pivot.
4. A[j..r – 1] — Not known how they compare to pivot.
• Conditions 1–3 hold before each iteration of the for loop, and constitute a loop invariant. (Condition 4 is not part of the LI.)
Correctness of Partition
• Use the loop invariant:
1. A[p..i]: ≤ pivot.
2. A[i+1..j – 1]: > pivot.
3. A[r] = pivot.
• Initialization:
– Before the first iteration, i = p – 1 and j = p.
• A[p..i] and A[i+1..j – 1] are empty – Conds. 1 and 2 are satisfied (trivially).
• r is the index of the pivot – Cond. 3 is satisfied.
• Maintenance: two cases, depending on how A[j] compares to the pivot x.
Correctness of Partition
• Case 1: A[j] > x
– Increment j only.
– A[j] simply joins the > pivot region A[i+1..j – 1] – Conds. 1 and 2 are maintained.
– A[r] is unaltered – Cond. 3 is maintained.
– LI is maintained.
[Figure: the subarray before and after incrementing j; the ≤ x region A[p..i], the > x region A[i+1..j – 1], and the pivot x at A[r].]
Correctness of Partition
• Case 2: A[j] ≤ x
– Increment i
– Swap A[i] and A[j]
• Condition 1 is maintained: the element ≤ x moves into A[p..i].
– Increment j
• Condition 2 is maintained: the element > x displaced by the swap stays inside A[i+1..j – 1].
– A[r] is unaltered.
• Condition 3 is maintained.
[Figure: the subarray before and after the swap; the ≤ x region grows by one, and the > x region shifts right by one.]
Correctness of Partition
• Termination:
– When the loop terminates, j = r, so every element of A[p..r] is partitioned into one of the three cases:
• A[p..i] ≤ pivot
• A[i+1..j – 1] > pivot
• A[r] = pivot
• The last two lines swap A[i+1] and A[r].
– Pivot moves from the end of the array to between the two
subarrays.
– Thus, procedure partition correctly performs the divide
step.
Complexity of Partition
• PartitionTime(n) is given by the number of iterations in the for loop.
• Θ(n), where n = r – p + 1.
Analysis of Quicksort: Worst case
• In the worst case, partitioning always divides the
size n array into these three parts:
– A length one part, containing the pivot itself
– A length zero part, and
– A length n-1 part, containing everything else
• We don’t recur on the zero-length part
• Recurring on the length n-1 part requires (in the
worst case) recurring to depth n-1
Worst case partitioning
Performance of Quicksort
• Worst-case partitioning
– One region has 1 element and one has n – 1 elements
– Maximally unbalanced
• Recurrence
T(n) = T(n – 1) + T(0) + PartitionTime(n)
     = T(n – 1) + Θ(n)
     = Σk=1..n Θ(k)
     = Θ(Σk=1..n k)
     = Θ(n²)
[Figure: recursion tree for worst-case partitions – level sizes n, n – 1, n – 2, …, 2, 1, each level splitting off a subarray of size 1; total cost n + (Σk=1..n k) – 1 = Θ(n²).]
Analysis of Quicksort: Best case
• Best-case partitioning
– Partitioning produces two regions of size n/2
• Recurrence
T(n) = 2T(n/2) + Θ(n)
T(n) = Θ(n lg n) (Master theorem)
Partitioning at various levels
Analysis of Quicksort
• Balanced partitioning
– The average case is closer to the best case than to the worst case
– (if partitioning always produces a constant-proportion split)
• E.g., a 9-to-1 proportional split:
T(n) = T(9n/10) + T(n/10) + Θ(n)
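A recursion-tree sketch of why even this lopsided split still gives O(n lg n) (c is an assumed constant bounding the per-level partitioning cost):

```latex
T(n) \le T(9n/10) + T(n/10) + cn
% every level of the recursion tree costs at most cn; the deepest path
% repeatedly takes the 9/10 branch and bottoms out after \log_{10/9} n levels,
% so
T(n) \le cn \log_{10/9} n + \Theta(n) = O(n \lg n)
```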
Typical case for quicksort
• If the array is sorted to begin with, quicksort is terrible: O(n²)
• It is possible to construct other bad cases
• However, quicksort is usually O(n log₂ n)
• The constants are so good that quicksort is generally the fastest sorting algorithm known in practice
• Most real-world sorting is done by quicksort
• Is the average case closer to the best case or the worst case?
Performance of Quicksort
• Average case
– All permutations of the input numbers are equally likely
– On a random input array, we will have a mix of well-balanced and unbalanced splits
– Good and bad splits are randomly distributed throughout the recursion tree
• Alternation of a bad and a good split: a bad split of n into 1 and n – 1, followed by a good split of n – 1 into (n – 1)/2 and (n – 1)/2, has combined partitioning cost 2n – 1 = Θ(n).
• A nearly well-balanced split of n into (n – 1)/2 + 1 and (n – 1)/2 has cost n = Θ(n).
• So the running time of quicksort when levels alternate between good and bad splits is still O(n lg n): the bad splits are absorbed at only a constant-factor cost.
Picking a better pivot
• Before, we picked the last element of the subarray to use as a pivot
– If the array is already sorted, this results in O(n²) behavior
– It’s no better if we pick the first element
– Note that an array of identical elements is already sorted!
• We could do an optimal quicksort (guaranteed O(n log n)) if we always picked a pivot value that exactly cuts the array in half
– Such a value is called a median: half of the values in the array are larger, half are smaller
– The easiest way to find the median is to sort the array and pick the value in the middle (!)
Median of three
• Obviously, it doesn’t make sense to sort the array in order to find
the median to use as a pivot
• Instead, compare just three elements of our (sub)array—the first,
the last, and the middle
– Take the median (middle value) of these three as pivot
– It’s possible (but not easy) to construct cases which will make this technique O(n²)
• Suppose we rearrange (sort) these three numbers so that the
smallest is in the first position, the largest in the last position, and
the other in the middle
– This lets us simplify and speed up the partition loop
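One simple way to wire median-of-three into the slides' Partition, which always pivots on A[r], is to order the three samples and then swap the median into position r (a sketch; `median_of_three_pivot` is my name, and this variant skips the sentinel speed-up the slides hint at):

```python
def median_of_three_pivot(A, p, r):
    """Order A[p] <= A[m] <= A[r] for m = (p + r) // 2, then move the
    median to A[r] so the usual pivot-at-the-end partition runs unchanged."""
    m = (p + r) // 2
    if A[m] < A[p]:
        A[m], A[p] = A[p], A[m]
    if A[r] < A[p]:
        A[r], A[p] = A[p], A[r]
    if A[r] < A[m]:
        A[r], A[m] = A[m], A[r]
    # invariant: A[p] <= A[m] <= A[r]; the median now sits at A[m]
    A[m], A[r] = A[r], A[m]           # make the median the pivot
```

Calling this just before each partition makes sorted and reverse-sorted inputs split near the middle instead of degenerating.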
Randomized Algorithms
• The behavior is determined in part by values produced by a random-number generator
– RANDOM(a, b) returns an integer r, where a ≤ r ≤ b and each of the b – a + 1 possible values of r is equally likely
• The algorithm imposes randomness itself, rather than assuming anything about the input distribution
• No input can consistently elicit worst-case behavior
– The worst case occurs only if we get “unlucky” numbers from the random-number generator
• Randomized PARTITION
Alg.: RANDOMIZED-PARTITION(A, p, r)
  i ← RANDOM(p, r)
  exchange A[r] ↔ A[i]
  return PARTITION(A, p, r)
Alg.: RANDOMIZED-QUICKSORT(A, p, r)
  if p < r
    then q ← RANDOMIZED-PARTITION(A, p, r)
      RANDOMIZED-QUICKSORT(A, p, q – 1)
      RANDOMIZED-QUICKSORT(A, q + 1, r)
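The two randomized routines, sketched in Python on top of the Lomuto partition from earlier (note the first recursive call uses q – 1, since A[q] is already in its final position):

```python
import random

def partition(A, p, r):
    """Lomuto partition of A[p..r] around pivot A[r]."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def randomized_partition(A, p, r):
    i = random.randint(p, r)      # RANDOM(p, r): p <= i <= r, uniform
    A[r], A[i] = A[i], A[r]       # exchange A[r] <-> A[i]
    return partition(A, p, r)

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)
```

With a uniformly random pivot, no fixed input (sorted, reverse-sorted, all-equal keys aside) can reliably trigger the Θ(n²) case.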
Final comments
• Until 2002, quicksort was the fastest known general
sorting algorithm, on average.
• Still the most common sorting algorithm in standard
libraries.
• For optimum speed, the pivot must be chosen
carefully.
– Median of three
– Randomization
• There will be some cases where quicksort runs in O(n²) time.