Transcript 03. Sorting

241-423 Advanced Data Structures and Algorithms

Semester 2, 2013-2014

3. Sorting Algorithms

• Objective – examine popular sorting algorithms, with an emphasis on divide and conquer

ADSA : Sorting/3

Contents

1. Insertion Sort
2. Divide and Conquer Algorithms
3. Merge Sort
4. Quicksort
5. Comparison of Sorting Algorithms
6. Finding the kth Largest Element

1. Insertion Sort

• Each pass inserts an element (x) into a sorted sublist (sub-array) on the left.

• Items larger than x move to the right to make room for its insertion.


Insertion Sort Diagram


Outline Algorithm

• Assume the first array element is already in the right position.

• In the ith pass (1 ≤ i ≤ n-1), the elements in the range 0 to i-1 are already sorted.

• Insert the target element arr[i] into its correct position j by shifting the elements in the range [j, i-1] one place to the right, which frees arr[j].

Simple Insertion Sort

public static void insertion_srt(int arr[]) {
    int n = arr.length;
    for (int i = 1; i < n; i++) {
        int j = i;
        int target = arr[i];            // sort ith elem
        while ((j > 0) && (arr[j-1] > target)) {
            arr[j] = arr[j-1];          // move right
            j--;
        }
        arr[j] = target;
    }
}
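To watch the sorted sublist grow pass by pass, here is a minimal sketch that reuses the insertion_srt() logic on an int array and prints the array after every pass. The class InsertionTrace and the method sortAndTrace() are hypothetical names used only for this illustration.

```java
import java.util.Arrays;

// Hypothetical demo class: the same insertion sort as insertion_srt(),
// printing the array after each pass so the sorted prefix is visible.
public class InsertionTrace {
    public static void sortAndTrace(int[] arr) {
        int n = arr.length;
        for (int i = 1; i < n; i++) {
            int j = i;
            int target = arr[i];          // element inserted in pass i
            while (j > 0 && arr[j - 1] > target) {
                arr[j] = arr[j - 1];      // shift larger element right
                j--;
            }
            arr[j] = target;
            // after pass i, arr[0..i] is sorted
            System.out.println("pass " + i + ": " + Arrays.toString(arr));
        }
    }

    public static void main(String[] args) {
        int[] arr = {50, 20, 40, 75, 35};
        sortAndTrace(arr);
    }
}
```

Running main() prints:

pass 1: [20, 50, 40, 75, 35]
pass 2: [20, 40, 50, 75, 35]
pass 3: [20, 40, 50, 75, 35]
pass 4: [20, 35, 40, 50, 75]

Pass 3 moves nothing because 75 is already larger than everything to its left.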

insertionSort()

public static <T extends Comparable<? super T>> void insertionSort(T[] arr) {
    int n = arr.length;
    for (int i = 1; i < n; i++) {
        int j = i;
        T target = arr[i];
        // shift larger elements right until target's slot is found
        while (j > 0 && target.compareTo(arr[j-1]) < 0) {
            arr[j] = arr[j-1];
            j--;
        }
        arr[j] = target;
    }
} // end of insertionSort()

Insertion Sort Efficiency

• The best-case running time is O(n), when the array is already sorted.

• The worst-case and average-case running times are O(n²).

• Insertion sort is very efficient when the array is "almost sorted".
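One way to check these bounds is to count comparisons. The sketch below (InsertionCount and sortCountingComparisons() are hypothetical names) instruments the same insertion sort and runs it on a sorted array, an array with one adjacent pair swapped, and a reversed array of 1000 elements.

```java
// Hypothetical helper: insertion sort that returns the number of
// element comparisons (arr[j-1] vs target) it performs.
public class InsertionCount {
    public static long sortCountingComparisons(int[] arr) {
        long comparisons = 0;
        for (int i = 1; i < arr.length; i++) {
            int j = i;
            int target = arr[i];
            while (j > 0) {
                comparisons++;                 // compare arr[j-1] with target
                if (arr[j - 1] <= target) break;
                arr[j] = arr[j - 1];           // move larger element right
                j--;
            }
            arr[j] = target;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] sorted = new int[n], reversed = new int[n], almost = new int[n];
        for (int i = 0; i < n; i++) {
            sorted[i] = i;
            reversed[i] = n - 1 - i;
            almost[i] = i;
        }
        // "almost sorted": swap one adjacent pair
        almost[500] = 501;
        almost[501] = 500;

        System.out.println("sorted:   " + sortCountingComparisons(sorted));   // 999
        System.out.println("almost:   " + sortCountingComparisons(almost));   // 1000
        System.out.println("reversed: " + sortCountingComparisons(reversed)); // 499500
    }
}
```

The sorted and almost-sorted inputs need about n comparisons, while the reversed input needs n(n-1)/2, matching the O(n) best case and the O(n²) worst case.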


2. Divide and Conquer Algorithms

• Divide a problem into smaller versions of the same problem, using recursion.

• Solve the smaller versions.

• Combine the solutions of the smaller versions to get an answer for the original, larger problem.

Examples

• Binary search
• Merge sort and quicksort (covered here)
• Binary tree traversal
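Binary search is the smallest of these examples. A minimal recursive sketch, assuming a sorted int array and the half-open index range [first, last) used by the sorting code in this transcript (BinarySearchDemo and binSearch() are illustrative names, not from the source):

```java
// Divide-and-conquer sketch: recursive binary search on a sorted int array.
public class BinarySearchDemo {
    // Returns an index of target in arr[first, last), or -1 if absent.
    public static int binSearch(int[] arr, int first, int last, int target) {
        if (first >= last)                 // empty range: not found
            return -1;
        int mid = (first + last) / 2;      // divide: look at the middle
        if (arr[mid] == target)
            return mid;
        else if (target < arr[mid])        // conquer: search one half only
            return binSearch(arr, first, mid, target);
        else
            return binSearch(arr, mid + 1, last, target);
    }

    public static void main(String[] args) {
        int[] arr = {3, 8, 15, 21, 34, 55, 89};
        System.out.println(binSearch(arr, 0, arr.length, 34)); // 4
        System.out.println(binSearch(arr, 0, arr.length, 10)); // -1
    }
}
```

Only the divide and conquer steps do work here; the combine step is trivial, because the answer from the half that is searched is the answer for the whole range.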

3. Merge Sort

• Sort an array with n elements by splitting it into two halves. Keep splitting in half recursively.

• Sort the small sublists (a sublist with one element is already sorted).

• Merge the sorted sublists back together, level by level, into a single sorted array.
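The merge step is where the ordering is actually built. As an illustration under simplified assumptions (two separate int arrays rather than two halves of one array; MergeDemo and merge() are hypothetical names), merging two sorted lists looks like this:

```java
import java.util.Arrays;

// Illustrative merge step: combines two already-sorted int arrays
// into one sorted array (the "combine" phase of merge sort).
public class MergeDemo {
    public static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        // take the smaller front element until one input runs out
        while (i < a.length && j < b.length) {
            if (a[i] <= b[j])
                out[k++] = a[i++];
            else
                out[k++] = b[j++];
        }
        // copy whatever remains of either input
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    public static void main(String[] args) {
        int[] left = {7, 10, 19, 25};
        int[] right = {12, 17, 21, 30, 48};
        System.out.println(Arrays.toString(merge(left, right)));
        // [7, 10, 12, 17, 19, 21, 25, 30, 48]
    }
}
```

Each comparison places one element in the output, so merging lists with a total of m elements takes at most m-1 comparisons.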


Merge Sort Diagram


General Sort Methods

• Ford and Topp's Arrays class provides two versions of the merge sort algorithm.

  – one version takes an Object array arr[] as input;
  – the other version is generic and specifies arr[] as an array of type T.
• Both methods call msort() to carry out the merge sort.

sort() - with Object Array

public static void sort(Object[] arr) {
    // create a temporary array
    Object[] tempArr = arr.clone();
    // sort the entire array (the index range [0, arr.length))
    msort(arr, tempArr, 0, arr.length);
}

sort() - Generic Version

public static <T extends Comparable<? super T>> void sort(T[] arr) {
    // create a temporary array (clone() of a T[] already returns T[])
    T[] tempArr = arr.clone();
    msort(arr, tempArr, 0, arr.length);
}

msort()

• Split the index range into two lists by computing its midpoint: int midpt = (last + first)/2;
• Call msort() recursively on the index range [first, midpt) and on the index range [midpt, last).
• The recursion stops when a sublist has one element; as the calls return, merge the sorted sublists back together into sorted order.

Tracing msort()


msort()

@SuppressWarnings("unchecked")
private static void msort(Object[] arr, Object[] tempArr, int first, int last) {
    // if sublist has more than 1 elem.
    if ((first + 1) < last) {
        int midpt = (last + first)/2;
        msort(arr, tempArr, first, midpt);
        msort(arr, tempArr, midpt, last);

        // if the two sorted halves are already in order, no merge is needed
        if (((Comparable)arr[midpt-1]).compareTo(arr[midpt]) <= 0)
            return;

        // indexA scans arr[] in range [first, mid)
        int indexA = first;
        // indexB scans arr[] in range [mid, last)
        int indexB = midpt;
        int indexC = first;    // for merged temp list

        /* while both sublists are not finished, compare arr[indexA]
           and arr[indexB]; copy the smaller into the temp list */
        while (indexA < midpt && indexB < last) {
            if (((Comparable)arr[indexA]).compareTo(arr[indexB]) < 0) {
                tempArr[indexC] = arr[indexA];
                indexA++;
            } else {
                tempArr[indexC] = arr[indexB];
                indexB++;
            }
            indexC++;
        }

        // copy over what's left of sublist A
        while (indexA < midpt) {
            tempArr[indexC] = arr[indexA];
            indexA++;
            indexC++;
        }

        // copy over what's left of sublist B
        while (indexB < last) {
            tempArr[indexC] = arr[indexB];
            indexB++;
            indexC++;
        }

        // copy temp array back to arr[]
        for (int i = first; i < last; i++)
            arr[i] = tempArr[i];
    }
} // end of msort()

msort() Notes

• Continue only as long as first+1 < last, i.e. while the sublist has more than one element.
• Do not merge if arr[midpt-1] ≤ arr[midpt]: the two sorted halves are already in order.

Recursion Tree for Merge Sort


Efficiency of Merge Sort

• Total number of comparisons = (no. of levels) × (no. of comparisons at a level)
• msort() starts with a list of size n
• msort() recurses until the sublist size is 1
• Each level roughly halves the sublist size:
  – n, n/2, n/4, ..., 1
  – no. of levels ≈ log₂ n

• No. of msort() calls at a level:
  – at level 0: 1 msort() call
  – at level 1: 2 calls
  – at level 2: 4 calls
  – ...
  – at level i: 2ⁱ calls

• No. of comparisons in one msort() call at a level:
  – at level 0: an msort() call merges n elements, using at most n comparisons
  – at level 1: at most n/2 comparisons
  – at level 2: at most n/4 comparisons
  – ...
  – at level i: at most n/2ⁱ comparisons
• Total no. of comparisons at a level:
  – (no. of calls at the level) × (comparisons in one msort() call)
  – 2ⁱ × n/2ⁱ = n

• Total number of comparisons = (no. of levels) × (no. of comparisons at a level) ≈ n log₂ n
• So the worst-case running time is O(n log₂ n)
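The comparison count can be checked empirically. This sketch follows the structure of msort() on an int[] (half-open ranges, and the early exit when arr[midpt-1] ≤ arr[midpt]) with a counter added; MergeCount and sortCounting() are hypothetical names. For n = 1024, n log₂ n = 10240.

```java
import java.util.Random;

// Sketch: msort()-style merge sort on int[], instrumented with a counter.
public class MergeCount {
    static long comparisons;

    static void msort(int[] arr, int[] temp, int first, int last) {
        if (first + 1 >= last) return;            // 0 or 1 element
        int mid = (first + last) / 2;
        msort(arr, temp, first, mid);
        msort(arr, temp, mid, last);
        comparisons++;                            // early-exit check
        if (arr[mid - 1] <= arr[mid]) return;     // halves already in order
        int a = first, b = mid, c = first;
        while (a < mid && b < last) {
            comparisons++;                        // one merge comparison
            if (arr[a] < arr[b])
                temp[c++] = arr[a++];
            else
                temp[c++] = arr[b++];
        }
        while (a < mid) temp[c++] = arr[a++];     // leftover of sublist A
        while (b < last) temp[c++] = arr[b++];    // leftover of sublist B
        for (int i = first; i < last; i++) arr[i] = temp[i];
    }

    public static long sortCounting(int[] arr) {
        comparisons = 0;
        msort(arr, new int[arr.length], 0, arr.length);
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 1024;                             // log2(n) = 10
        int[] sorted = new int[n], random = new int[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) {
            sorted[i] = i;
            random[i] = rnd.nextInt(1_000_000);
        }
        System.out.println("n log2 n = " + (n * 10));
        System.out.println("random:   " + sortCounting(random));
        System.out.println("sorted:   " + sortCounting(sorted)); // 1023
    }
}
```

On random data the count stays at or below n log₂ n; on an already sorted array the early-exit test ends every call after one comparison, giving n - 1 = 1023 comparisons in total.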

4. Quicksort

• Uses a divide-and-conquer strategy like merge sort.

• But, unlike merge sort, quicksort is an in-place sorting algorithm:
  – elements are exchanged within the list without the need for temporary lists/arrays
  – space efficient

Quicksort Steps

• Pick an element, called a pivot, from the list.

• Reorder the list so that all elements less than the pivot come before it and all elements greater than the pivot come after it.

• Recursively call quicksort on the sublist of lesser elements and the sublist of greater elements.

• The stopping cases for the recursion are lists of size zero or one, which are always sorted.

Quicksort Diagram


Partitioning a List

• The pivot is the element at index mid = (first + last)/2.
• Separate the elements of arr[] into two sublists, S_l (l = low) and S_h (h = high):
  – S_l contains the elements ≤ pivot
  – S_h contains the elements ≥ pivot

• Exchange arr[first] and arr[mid], so the pivot is held at arr[first].
• Scan the list with index range [first+1, last):
  – scanUp starts at first+1 and moves up the list, passing over elements that belong in S_l.
  – scanDown starts at position last-1 and moves down the list, passing over elements that belong in S_h.

• When arr[scanUp] ≥ pivot and arr[scanDown] ≤ pivot, the two elements are in the wrong sublists.

• Exchange the elements at the two positions and then resume scanning.

• scanUp and scanDown move toward each other until they meet or pass one another (scanDown ≤ scanUp).

• scanDown is then at the place where the pivot should appear: exchange arr[first] and arr[scanDown] to put the pivot in its correct position.

pivotIndex()

• The method public static <T extends Comparable<? super T>> int pivotIndex(T[] arr, int first, int last) takes array arr and index range [first, last) and returns the index of the pivot after partitioning arr[].

public static <T extends Comparable<? super T>> int pivotIndex(T[] arr, int first, int last) {
    int mid;    // index for the midpoint
    T pivot;
    if (first == last)              // empty sublist
        return last;
    else if (first == (last-1))     // 1-element sublist
        return first;
    else {
        mid = (last + first)/2;
        pivot = arr[mid];
        // exchange pivot and bottom end of range
        arr[mid] = arr[first];
        arr[first] = pivot;
        int scanUp = first + 1;     // scanning indices
        int scanDown = last - 1;
        while (true) {
            /* move up the lower sublist while scanUp is less than or equal
               to scanDown and the array value is less than pivot */
            while ((scanUp <= scanDown) && (arr[scanUp].compareTo(pivot) < 0))
                scanUp++;
            /* move down upper sublist while array value is greater than
               the pivot (stops at arr[first] == pivot at the latest) */
            while (pivot.compareTo(arr[scanDown]) < 0)
                scanDown--;
            /* if indices are not in their sublists, partition is complete */
            if (scanUp >= scanDown)
                break;
            // found two elements in wrong sublists; exchange
            T temp = arr[scanUp];
            arr[scanUp] = arr[scanDown];
            arr[scanDown] = temp;
            scanUp++;
            scanDown--;
        }
        // copy pivot to index posn (scanDown) that partitions the sublists
        arr[first] = arr[scanDown];
        arr[scanDown] = pivot;
        return scanDown;
    }
} // end of pivotIndex()
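To see pivotIndex() at work, this driver wraps the same partitioning logic (compacted slightly) in a class and partitions a nine-element array. PartitionDemo is a hypothetical name used only for this sketch.

```java
import java.util.Arrays;

// Driver for the pivotIndex() logic above, wrapped so it runs on its own.
public class PartitionDemo {
    public static <T extends Comparable<? super T>> int pivotIndex(T[] arr, int first, int last) {
        if (first == last) return last;           // empty sublist
        if (first == last - 1) return first;      // 1-element sublist
        int mid = (last + first) / 2;
        T pivot = arr[mid];
        arr[mid] = arr[first];                    // move pivot to arr[first]
        arr[first] = pivot;
        int scanUp = first + 1, scanDown = last - 1;
        while (true) {
            while (scanUp <= scanDown && arr[scanUp].compareTo(pivot) < 0)
                scanUp++;                         // skip elements < pivot
            while (pivot.compareTo(arr[scanDown]) < 0)
                scanDown--;                       // skip elements > pivot
            if (scanUp >= scanDown) break;        // scans have met or crossed
            T temp = arr[scanUp];                 // both in wrong sublist: swap
            arr[scanUp] = arr[scanDown];
            arr[scanDown] = temp;
            scanUp++;
            scanDown--;
        }
        arr[first] = arr[scanDown];               // drop pivot into its final slot
        arr[scanDown] = pivot;
        return scanDown;
    }

    public static void main(String[] args) {
        Integer[] arr = {30, 80, 10, 50, 70, 20, 60, 40, 90};
        int p = pivotIndex(arr, 0, arr.length);
        System.out.println("pivot " + arr[p] + " at index " + p);
        System.out.println(Arrays.toString(arr));
        // pivot 70 at index 6
        // [60, 40, 10, 50, 30, 20, 70, 80, 90]
    }
}
```

The pivot 70 (originally at index 4) ends at index 6; everything to its left is ≤ 70 and everything to its right is ≥ 70, but neither side is sorted yet.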

quicksort()

• quicksort() sorts a generic array arr[] by calling qsort() with the index range [0, arr.length).

public static <T extends Comparable<? super T>> void quicksort(T[] arr) {
    qsort(arr, 0, arr.length);
}

qsort()

• Recursively partition the elements in the index range into smaller and smaller sublists, terminating when the size of a list is 0 or 1.

• For efficiency, handle a list of size 2 by comparing the elements and exchanging them if necessary.

• For larger lists, call pivotIndex() to reorder the elements and determine the pivot's final index.
• Make two calls to qsort():
  – the first call specifies the index range for the lower sublist
  – the second call specifies the index range for the upper sublist

qsort() Diagram

private static <T extends Comparable<? super T>> void qsort(T[] arr, int first, int last) {
    // if range is less than two elements
    if ((last - first) <= 1)
        return;
    // if sublist has two elements
    else if ((last - first) == 2) {
        /* compare arr[first] and arr[last-1] and exchange if necessary */
        if (arr[last-1].compareTo(arr[first]) < 0) {
            T temp = arr[last-1];
            arr[last-1] = arr[first];
            arr[first] = temp;
        }
        return;
    }
    else {
        int pivotLoc = pivotIndex(arr, first, last);
        qsort(arr, first, pivotLoc);
        qsort(arr, pivotLoc + 1, last);
    }
} // end of qsort()

Running Time of Quicksort

• The average-case running time is O(n log₂ n).

• The best case occurs when the array is already sorted.


• Quicksort is efficient even when the array is in descending order.


• The worst case occurs when the chosen pivot is always the largest or smallest element in its sublist:
  – the running time is then O(n²)
  – with the middle element as pivot, this is highly unlikely
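The effect of pivot choice can be measured by counting comparisons. The sketch below reuses the partitioning scheme of pivotIndex() on an int[] and adds a counter. The flag useMiddle = false is a hypothetical variant, not from the slides, that always uses arr[first] as the pivot, so that on sorted input the pivot is always the smallest element of its sublist. QuickCount and its methods are illustrative names.

```java
// Sketch: pivotIndex()-style partitioning on int[] with a comparison counter.
public class QuickCount {
    static long comparisons;

    // every element comparison goes through here so it is counted
    static boolean less(int x, int y) {
        comparisons++;
        return x < y;
    }

    static int partition(int[] arr, int first, int last, boolean useMiddle) {
        int mid = useMiddle ? (first + last) / 2 : first;   // hypothetical switch
        int pivot = arr[mid];
        arr[mid] = arr[first];                 // park the pivot at arr[first]
        arr[first] = pivot;
        int scanUp = first + 1, scanDown = last - 1;
        while (true) {
            while (scanUp <= scanDown && less(arr[scanUp], pivot))
                scanUp++;
            while (less(pivot, arr[scanDown]))
                scanDown--;
            if (scanUp >= scanDown)
                break;
            int temp = arr[scanUp];            // both in the wrong sublist: swap
            arr[scanUp] = arr[scanDown];
            arr[scanDown] = temp;
            scanUp++;
            scanDown--;
        }
        arr[first] = arr[scanDown];            // move pivot to its final index
        arr[scanDown] = pivot;
        return scanDown;
    }

    static void qsort(int[] arr, int first, int last, boolean useMiddle) {
        if (last - first <= 1)                 // 0 or 1 element: already sorted
            return;
        int p = partition(arr, first, last, useMiddle);
        qsort(arr, first, p, useMiddle);
        qsort(arr, p + 1, last, useMiddle);
    }

    public static long sortCounting(int[] arr, boolean useMiddle) {
        comparisons = 0;
        qsort(arr, 0, arr.length, useMiddle);
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 2000;
        int[] a = new int[n], b = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = i;
        }
        System.out.println("sorted, middle pivot: " + sortCounting(a, true));
        System.out.println("sorted, first pivot:  " + sortCounting(b, false));
    }
}
```

With the middle-element pivot, sorted input splits evenly and costs roughly n log₂ n comparisons; with the first-element variant, the same input costs about n²/2 comparisons (around two million for n = 2000).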

5. Comparison of Sorting Algorithms

• An inversion in an array, arr[], is an ordered pair (arr[i], arr[j]), i < j, where arr[i] > arr[j].
• When sorting in ascending order, such a pair arr[i] and arr[j] is out of order.

• The O(n²) sorting algorithms compare adjacent elements and generally remove one inversion with each iteration
  – e.g. insertion sort, which shifts each element one position at a time
• The O(n log₂ n) sorting algorithms compare non-adjacent elements and generally remove more than one inversion with each iteration
  – e.g. quicksort and merge sort
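The inversion idea can be checked directly: count inversions by brute force, then count how many single-position shifts insertion sort performs. InversionDemo and its methods are hypothetical names for this sketch.

```java
// Sketch: brute-force inversion count vs. insertion-sort shift count.
public class InversionDemo {
    // O(n^2) count of pairs i < j with arr[i] > arr[j]
    public static long countInversions(int[] arr) {
        long count = 0;
        for (int i = 0; i < arr.length; i++)
            for (int j = i + 1; j < arr.length; j++)
                if (arr[i] > arr[j]) count++;   // pair is out of order
        return count;
    }

    // insertion sort that counts element shifts; sorts arr in place
    public static long insertionShifts(int[] arr) {
        long shifts = 0;
        for (int i = 1; i < arr.length; i++) {
            int j = i, target = arr[i];
            while (j > 0 && arr[j - 1] > target) {
                arr[j] = arr[j - 1];            // each shift removes one inversion
                j--;
                shifts++;
            }
            arr[j] = target;
        }
        return shifts;
    }

    public static void main(String[] args) {
        int[] arr = {40, 10, 30, 20, 50};
        System.out.println("inversions: " + countInversions(arr));         // 4
        System.out.println("shifts:     " + insertionShifts(arr.clone())); // 4
    }
}
```

Each shift moves the target past one larger element, removing exactly one inversion, so the two counts always agree. An array in reverse order has n(n-1)/2 inversions, which is why insertion sort is O(n²) in the worst case.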

Timing Sorts

import java.util.Random;

import ds.util.Arrays;
import ds.time.Timing;

public class TimingSorts {
    public static void main(String[] args) {
        final int SIZE = 75000;
        Integer[] arr1 = new Integer[SIZE],
                  arr2 = new Integer[SIZE],
                  arr3 = new Integer[SIZE];
        Random rnd = new Random();

        /* load each array with the same sequence of
           random numbers in the range 0 to 999999 */
        int rndNum;
        for (int i = 0; i < SIZE; i++) {
            rndNum = rnd.nextInt(1000000);
            arr1[i] = arr2[i] = arr3[i] = rndNum;
        }

        // call timeSort() for each sort
        timeSort(arr1, 0, "Merge sort");
        timeSort(arr2, 1, "Quick sort");
        timeSort(arr3, 2, "Insertion sort");
    } // end of main()

    public static <T extends Comparable<? super T>> void timeSort(T[] arr, int sortType, String sortName) {
        Timing t = new Timing();
        t.start();
        if (sortType == 0)
            Arrays.sort(arr);          // merge sort in F&T
        else if (sortType == 1)
            Arrays.quicksort(arr);
        else
            Arrays.insertionSort(arr);
        double timeRequired = t.stop();

        outputFirst_Last(arr);
        System.out.print(" " + sortName + " time is " + timeRequired + "\n\n");
    } // end of timeSort()

    // output first 3 elements and last 3 elements
    public static void outputFirst_Last(Object[] arr) {
        for (int i = 0; i < 3; i++)
            System.out.print(arr[i] + " ");
        System.out.print(". . . ");
        for (int i = arr.length - 3; i < arr.length; i++)
            System.out.print(arr[i] + " ");
        System.out.println();
    }
}

Output

26 38 47 . . . 999980 999984 999984
 Merge sort time is 0.109

26 38 47 . . . 999980 999984 999984
 Quick sort time is 0.078

26 38 47 . . . 999980 999984 999984
 Insertion sort time is 100.611

Merge sort and quicksort are O(n log₂ n); insertion sort is O(n²).

6. Finding the kth Largest Element

• Sort the array and then access the element at position k.
  – the running time is O(n log₂ n) if we use quicksort or merge sort
• For a more efficient solution, locate the position of the kth-largest value by partitioning the elements into two sublists.

Layout after partitioning:
  indices 0 ... k-1: values ≤ kth-largest
  index k: kth-largest
  indices k+1 ... n-1: values ≥ kth-largest

• The lower sublist contains k elements that are ≤ the kth-largest.
• The upper sublist contains elements that are ≥ the kth-largest.
• The elements in the sublists do not need to be ordered.

• Use the pivoting technique from the quicksort algorithm to create a partition.

• The algorithm is recursive:
  – index = pivotIndex()
  – if index == k, done: return arr[index]
  – otherwise, repeat on the range [first, index) if k < index, or on the range [index+1, last) if k > index
• Only one of the two sublists needs to be examined at each step.

public static <T extends Comparable<? super T>> T findKth(T[] arr, int first, int last, int k) {
    // k must lie in [first, last); otherwise there is nothing to find
    if (k < first || k >= last)
        return null;
    // partition range [first, last) in arr about the pivot arr[index]
    int index = pivotIndex(arr, first, last);
    // if index == k, we are done: kth largest is arr[k]
    if (index == k)
        return arr[index];                          // return array value
    else if (k < index)                             // search in lower sublist [first, index)
        return findKth(arr, first, index, k);
    else                                            // search in upper sublist [index+1, last)
        return findKth(arr, index+1, last, k);
}

Running Time of findKth()

• The average running time is O(n): if each partition roughly halves the range, the no. of comparisons is about n + n/2 + n/4 + n/8 + ... ≈ 2n
• This is faster than the O(n log₂ n) cost of sorting the array first. That is to be expected, since findKth() recurses on only one of its sublists at each call, whereas quicksort and merge sort process both.
• As with quicksort, the worst case (a pivot that is always the largest or smallest element) is O(n²).
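Because findKth() makes a single recursive call in tail position, it can also be written as a loop that narrows [first, last) until the pivot lands at index k. The sketch below is an iterative rewrite (not the slides' code), paired with the same partitioning logic as pivotIndex(); FindKthDemo is a hypothetical name. As on the slides, the "kth-largest" value is the one that would sit at index k after an ascending sort.

```java
import java.util.Arrays;

public class FindKthDemo {
    // Same partitioning logic as pivotIndex() on the slides.
    static <T extends Comparable<? super T>> int pivotIndex(T[] arr, int first, int last) {
        if (first == last) return last;          // empty sublist
        if (first == last - 1) return first;     // 1-element sublist
        int mid = (last + first) / 2;
        T pivot = arr[mid];
        arr[mid] = arr[first];
        arr[first] = pivot;
        int scanUp = first + 1, scanDown = last - 1;
        while (true) {
            while (scanUp <= scanDown && arr[scanUp].compareTo(pivot) < 0) scanUp++;
            while (pivot.compareTo(arr[scanDown]) < 0) scanDown--;
            if (scanUp >= scanDown) break;
            T temp = arr[scanUp];
            arr[scanUp] = arr[scanDown];
            arr[scanDown] = temp;
            scanUp++;
            scanDown--;
        }
        arr[first] = arr[scanDown];
        arr[scanDown] = pivot;
        return scanDown;
    }

    // Iterative findKth: assumes 0 <= k < arr.length; rearranges arr.
    public static <T extends Comparable<? super T>> T findKth(T[] arr, int k) {
        int first = 0, last = arr.length;
        while (true) {
            int index = pivotIndex(arr, first, last);
            if (index == k)
                return arr[index];
            else if (k < index)
                last = index;          // keep lower sublist [first, index)
            else
                first = index + 1;     // keep upper sublist [index+1, last)
        }
    }

    public static void main(String[] args) {
        Integer[] data = {30, 80, 10, 50, 70, 20, 60, 40, 90};
        System.out.println(findKth(data.clone(), 4));   // 50
        Integer[] sorted = data.clone();
        Arrays.sort(sorted);
        System.out.println(sorted[4]);                   // 50 as well
    }
}
```

findKth() rearranges the array it is given, so the example passes a copy and checks the answer against a fully sorted copy.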