www.scs.ryerson.ca

Lecture 9: Array Algorithms
Problems on Arrays
• Arrays give our loops something to do
• Useful and interesting problems about a large block of data,
instead of just one integer or decimal number
• These problems serve here as good examples of C loops and arrays
• Of course, these algorithms are also important general
knowledge of computer science, independent of the actual
programming language used
Searching for an Element
• Already saw linear search: loop through the indices of the
array, checking whether each element is the one we want
• Can be sped up with micro-optimizations, but in general, you
will have to look at every element before you can say that
the element can't be found
• That is the best we can do for unsorted arrays
• However, if the array is already sorted, we can find the
element much faster using binary search
• Idea the same as looking up a name in a phone book
• Use a finger to guesstimate roughly where the name would
be, compare the name there to the one you are looking for,
jump back or forth depending on the result of the comparison,
eventually zeroing in on the correct name
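For reference, the linear search recalled above might look like the
following minimal sketch. The name linear_search and the return
convention (1 if found, 0 otherwise, chosen to match the binary
search functions later in this lecture) are assumptions made for
illustration.

```c
/* Returns 1 if x occurs among the n elements of a, 0 otherwise.
   Works on unsorted arrays, but may have to look at every element. */
int linear_search(const int* a, int n, int x) {
  int i;
  for(i = 0; i < n; i++) {
    if(a[i] == x) { return 1; } /* Found: stop early */
  }
  return 0; /* Looked at every element without finding x */
}
```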
Binary Search Algorithm
• Binary search is a tricky algorithm to get correct
• Simplify by giving up the phone book intuition of one finger
jumping back and forth, and instead use two fingers, one
from the beginning of the array, the other from the end
• Idea: ensure that the correct element remains between the
fingers even as they jump towards each other
• When the fingers eventually meet somewhere, that's where
the element has to be, if it is anywhere in the array
• Still has tricky asymmetries that are hard to get correct
Binary Search (Incomplete)
int binary_search(int* a, int n, int x) {
  int left = 0, right = n-1, mid;
  while(left < right) {
    mid = (left + right) / 2;
    if(a[mid] < x) {
      /* Move left to the middle */
    }
    else {
      /* Move right to the middle */
    }
  }
  return x == a[left]; /* Or a[right] */
}
How to Fill in the Gaps?
• Note that the midpoint element index mid, calculated as the
integer average of left and right, is at least equal to left, but
strictly less than right
• This asymmetry will make the algorithm asymmetric
• How far can we jump with left and right in each case?
• First of all, we have to jump at least one step: otherwise, the
algorithm gets stuck in an infinite loop
• On the other hand, we can't jump too far and accidentally
jump over the element x that we are looking for
• Either left = mid; or left = mid + 1;
• Either right = mid; or right = mid - 1;
• In both branches, only one is correct
Binary Search (Complete)
int binary_search(int* a, int n, int x) {
  int left = 0, right = n - 1, mid;
  if(n < 1) { return 0; } /* Empty array: x cannot be in it */
  while(left < right) {
    mid = (left + right) / 2;
    if(a[mid] < x) {
      left = mid + 1; /* Jump at least one step */
    }
    else {
      right = mid; /* We could have a[mid] == x */
    }
  }
  return x == a[left]; /* Or a[right], since left == right */
}
Comparing the Search Algorithms
• For a sorted array, binary search can be tremendously faster
than linear search
• In linear search, each comparison eliminates one element
from consideration, so we need as many comparisons as
there are elements in the array
• In binary search, each comparison eliminates an entire half
of the array from consideration, so the number of
comparisons is logarithmic in the array size
• For example, assume a million-element array
• log2(10^6) is roughly 20, since it takes 20 repeated halvings to
get from one million elements to just one
• Hitch your wagons to an exponential horse if possible
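To make the difference concrete, the following sketch counts the
comparisons that each search performs. The function names
linear_comparisons and binary_comparisons are hypothetical helpers
introduced only for this illustration; the binary version mirrors
the loop of the complete binary search above.

```c
/* Counts the element comparisons made by a linear search for x. */
long linear_comparisons(const int* a, long n, int x) {
  long i, count = 0;
  for(i = 0; i < n; i++) {
    count++;
    if(a[i] == x) { break; }
  }
  return count;
}

/* Counts the a[mid] < x comparisons made by the binary search loop.
   Assumes that a is sorted in ascending order. */
long binary_comparisons(const int* a, long n, int x) {
  long left = 0, right = n - 1, mid, count = 0;
  while(left < right) {
    mid = left + (right - left) / 2;
    count++;
    if(a[mid] < x) { left = mid + 1; }
    else { right = mid; }
  }
  return count;
}
```

For a sorted million-element array and a value that is not in it,
the linear count equals the array size, whereas the binary count
stays at about 20.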
Binary Search (Recursive Version)
int binary_search_rec(int* a, int n, int x) {
  int left = 0, right = n - 1, mid;
  if(n < 1) { return 0; } /* Empty array: x cannot be in it */
  if(left == right) { return a[left] == x; }
  else {
    mid = (left + right) / 2;
    if(a[mid] < x) { /* Search in right half */
      return binary_search_rec(a + mid + 1, n - mid - 1, x);
    }
    else { /* Search in left half; keep a[mid], since it may equal x */
      return binary_search_rec(a, mid + 1, x);
    }
  }
}
A Simple Speedup?
• The condition in binary search handles the case a[mid] < x
in one branch, and implicitly a[mid] >= x in the else
• What if we handled equality test explicitly?
• if(a[mid] == x) return 1;
• When modifying any algorithm, ask two questions:
• Does the algorithm still return the correct answer? (Yes.)
• Does the algorithm run faster? (Yes for all the arrays in
which the element x is located in the middle, no for all others.
The others massively overwhelm.)
• On the other hand, with this change, we can also modify the
second move operation to be right = mid - 1;
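One common way to write the variant with the explicit equality test
is sketched below. The name binary_search_eq is an assumption, and
so is the loop condition left <= right: once both fingers skip past
mid, the range between them can become empty, and this sketch treats
an empty range as "not found".

```c
/* Binary search with an explicit equality test. Returns 1 if x is
   found in the sorted array a of n elements, 0 otherwise. */
int binary_search_eq(const int* a, int n, int x) {
  int left = 0, right = n - 1, mid;
  while(left <= right) { /* The range may now become empty */
    mid = left + (right - left) / 2;
    if(a[mid] == x) { return 1; }
    if(a[mid] < x) { left = mid + 1; }
    else { right = mid - 1; } /* a[mid] > x, so mid can be excluded */
  }
  return 0;
}
```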
Sorting
• Historically, sorting has been an important algorithm on
arrays both in theory and practice
• These days very much a solved problem, in that the best
algorithms are pretty much as fast as they can be
• Sorting algorithms are a standard teaching tool to discuss
issues in programming and algorithm design and analysis
• A comparison sort means that our basic operation is to
compare two elements in the array, and swap them if they
are in the wrong order
• Comparison sort works for any type of data that has <
• Measure running time in number of comparisons and swaps
with rough order of magnitude
• By convention, we sort arrays in ascending order
Bubble Sort
• Perhaps the best known sorting algorithm due to its
whimsical name and simple-to-understand behaviour
• Repeated passes through the array from left to right
• Along the way in each pass, compare each element to its
predecessor (or successor, depending on implementation)
and swap the two elements if they are in the wrong order
• Keep doing these passes until one pass does nothing
• Inefficient due to redundant comparisons: later passes
compare the same element pairs that the previous passes
already compared and found to be in correct order
• Each pass puts the largest remaining element in its place, so
later passes can always be one step shorter
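The description above could be turned into code along these lines.
This is a sketch rather than a reference implementation: the name
bubble_sort and the choice to compare each element to its successor
are assumptions.

```c
/* Sorts the n elements of a in ascending order with bubble sort. */
void bubble_sort(int* a, int n) {
  int i, tmp, swapped, end = n - 1;
  do {
    swapped = 0;
    for(i = 0; i < end; i++) {
      if(a[i] > a[i + 1]) { /* Neighbours in the wrong order */
        tmp = a[i]; a[i] = a[i + 1]; a[i + 1] = tmp;
        swapped = 1;
      }
    }
    end--; /* Largest remaining element is now in its place */
  } while(swapped); /* Stop after a pass that did nothing */
}
```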
Selection Sort
• To sort an array, loop to find its smallest element; assume
that it was in location k
• Move that element to a[0], and so as not to lose the element
that was there, move that one to a[k] that just got vacant
• Now the smallest element of the array is where it belongs
• The remaining problem is to sort the rest of the array
• Selection sort can be easily modified to shuffle the elements
of the array to a random order so that the resulting shuffle is
fair, each of the n! permutations equally likely
• Instead of finding the location k so that a[k] is the smallest
element in the remaining array, simply choose k at random
from the set of remaining array indices
Selection Sort Implementation
void selection_sort(int* a, int n) {
  int curr, next, min, tmp;
  for(curr = 0; curr < n - 1; curr++) {
    /* Find the index of the smallest remaining element */
    min = curr;
    for(next = curr + 1; next < n; next++) {
      if(a[next] < a[min]) { min = next; }
    }
    /* Swap a[curr] and a[min] */
    tmp = a[curr]; a[curr] = a[min]; a[min] = tmp;
  }
}
Insertion Sort
• The best one of the simple sorting algorithms in that it is both
the fastest and the simplest to implement
• Loop through the locations of the array left to right
• For each location, grab the element originally in it, and move
it left towards the beginning by swapping it over all the
preceding elements that are larger than it
• Stop moving when you reach the left edge of the array, or
when the previous element is not larger than the element
that you are moving
• When you have done this to each element in turn, the entire
array has become sorted
Insertion Sort Implementation
void insertion_sort(int* a, int n) {
  int curr, prev, tmp;
  for(curr = 1; curr < n; curr++) {
    prev = curr;
    while(prev > 0 && a[prev] < a[prev - 1]) {
      tmp = a[prev-1]; a[prev-1] = a[prev]; a[prev] = tmp;
      prev--;
    }
  }
}
Quicksort
• Simple idea: to sort an array, partition the elements into two
subarrays of "small" and "large" elements
• What constitutes "small" and "large" can vary depending on
the implementation, as long as every "small" element is less
than or equal to any "large" element
• Typical implementation selects one of the elements as pivot
that determines the dividing line
• Recursively sort these two partitions, resulting in the entire
array being sorted
Quicksort Example
• Assume original array [5,1,9,8,4,6,7,3]
• Choosing the first element as the pivot, the small elements
are [5,1,4,3] and the large elements are [9,8,6,7]
• Lucky choice of pivot here produced a nice 50-50 split
• After sorting both partitions, [1,3,4,5,6,7,8,9]
• Had we used 3 as pivot, the partition would have been less
balanced, producing [1,3] and [5,9,8,4,6,7]
• Any choice of pivot works, but choosing a pivot element that
is close to the array median makes this algorithm execute
faster than it does with an unbalanced pivot
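A minimal quicksort sketch that uses the first element as the pivot
is shown below. The names quicksort and swap_ints and this particular
partitioning scheme are assumptions; here the pivot is placed between
the two partitions instead of being counted among the "small"
elements as in the example above.

```c
/* Hypothetical helper: exchanges the two integers pointed to. */
static void swap_ints(int* p, int* q) {
  int tmp = *p; *p = *q; *q = tmp;
}

/* Sorts the n elements of a in ascending order with quicksort. */
void quicksort(int* a, int n) {
  int i, last = 0; /* a[1..last] holds the elements below the pivot */
  if(n < 2) { return; } /* Zero or one element: already sorted */
  for(i = 1; i < n; i++) {
    if(a[i] < a[0]) { swap_ints(&a[++last], &a[i]); }
  }
  swap_ints(&a[0], &a[last]); /* Pivot goes between the partitions */
  quicksort(a, last);                    /* Sort the "small" part */
  quicksort(a + last + 1, n - last - 1); /* Sort the "large" part */
}
```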
Mergesort
• An older sorting algorithm based on the idea of partitioning
elements in two and sorting partitions recursively
• However, in mergesort the initial partitioning is done in the
middle, without any regard to element values
• For example, [5,1,9,8,4,6,7,3] is partitioned into two halves
[5,1,9,8] and [4,6,7,3]
• After sorting these partitions recursively, a linear-time
merging step combines two sorted subarrays into one
sorted subarray
• Start from the beginning of both subarrays, always copy the
smaller element into the result subarray
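The split-sort-merge idea can be sketched as follows. All names here
(merge_sort, mergesort_rec, merge) and the use of one scratch buffer
allocated with malloc are assumptions made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Merges sorted a[0..mid-1] and a[mid..n-1] via scratch buffer tmp. */
static void merge(int* a, int mid, int n, int* tmp) {
  int i = 0, j = mid, k = 0;
  while(i < mid && j < n) {
    tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++]; /* Copy the smaller */
  }
  while(i < mid) { tmp[k++] = a[i++]; } /* Leftovers of left half */
  while(j < n) { tmp[k++] = a[j++]; }   /* Leftovers of right half */
  memcpy(a, tmp, (size_t)n * sizeof(int));
}

static void mergesort_rec(int* a, int n, int* tmp) {
  int mid = n / 2; /* Split in the middle, ignoring element values */
  if(n < 2) { return; }
  mergesort_rec(a, mid, tmp);
  mergesort_rec(a + mid, n - mid, tmp);
  merge(a, mid, n, tmp);
}

/* Returns 1 on success, 0 if the scratch buffer can't be allocated. */
int merge_sort(int* a, int n) {
  int* tmp;
  if(n < 2) { return 1; }
  tmp = malloc((size_t)n * sizeof(int));
  if(tmp == NULL) { return 0; }
  mergesort_rec(a, n, tmp);
  free(tmp);
  return 1;
}
```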
Shuffling an Array
• Problem: given an n-element array and a random number
generator, shuffle the elements in a random order
• Intuitive but incorrect solution: for some large number of
times, choose two random locations in the array and swap
the elements
• Inefficient: needs more random bits than necessary
• Incorrect: does not produce every permutation with the
same probability 1 / n!
• In fact, if you always choose two different indices, a fixed
number of swaps can only produce half of the possible
permutations, since each swap flips the parity of the permutation
Knuth Shuffle
• The correct Knuth shuffle algorithm (modification of
selection sort) requires us to drop the appealing idea of
shuffling so that any element can move at any time
• First, choose one of the n indices randomly
• Swap that element with the one in the first location
• After this, the element you move to the first place is nailed
there and never moves again
• Feels counterintuitive, but is perfectly OK
• Every element still has the same 1 / n chance of ending up
the first, as should happen in the correct shuffling
• Then, shuffle the remaining n - 1 elements
Knuth Shuffle Implementation
void knuth_shuffle(int* a, int n) {
  int i, j, tmp;
  for(i = 0; i < n - 1; i++) {
    j = rand() % (n - i) + i;
    tmp = a[i]; a[i] = a[j]; a[j] = tmp;
  }
}
/* For the correctness, it is necessary for j to possibly be
   equal to i, so that any element can remain in its original
   place, which should happen with prob. 1/n */
More on Knuth Shuffle
• The previous algorithm is trivial to modify to solve the
important problem "Choose a subset of k elements from the
given array of n elements, without repetitions"
• Each possible permutation of n elements is produced with
the same probability 1 / n! as it should be
• Modifying the algorithm so that j is chosen over the entire
array of n elements, instead of the n - i remaining elements,
breaks this correctness
• There are now n^n possible execution paths instead of n!
• Since n^n is not divisible by n! (for n > 2), different
permutations end up having a different probability
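The divisibility argument can be checked by brute force for n = 3.
The sketch below walks through all 3^3 = 27 execution paths of the
broken shuffle (j chosen from the whole array at each of the three
steps) and counts how many paths end in each of the 3! = 6
permutations. The function name and the way permutations are encoded
as indices 0..5 are assumptions made for this illustration.

```c
/* Fills counts[0..5] with the number of execution paths of the
   broken shuffle that end in each permutation of {0, 1, 2}. */
void count_broken_shuffle_paths(int counts[6]) {
  int j0, j1, j2, tmp, k;
  for(k = 0; k < 6; k++) { counts[k] = 0; }
  for(j0 = 0; j0 < 3; j0++) {
    for(j1 = 0; j1 < 3; j1++) {
      for(j2 = 0; j2 < 3; j2++) {
        int a[3] = {0, 1, 2};
        tmp = a[0]; a[0] = a[j0]; a[j0] = tmp; /* Step i = 0 */
        tmp = a[1]; a[1] = a[j1]; a[j1] = tmp; /* Step i = 1 */
        tmp = a[2]; a[2] = a[j2]; a[j2] = tmp; /* Step i = 2 */
        /* Encode the resulting permutation as an index 0..5 */
        counts[a[0] * 2 + (a[1] > a[2])]++;
      }
    }
  }
}
```

Since the 27 equally likely paths cannot be split evenly among 6
permutations, some permutations must come up more often than others.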
Two-Dimensional Arrays
• A scalar is just one single value, so you need no indices to
access that value
• An array is one-dimensional, so you need one index to
specify which location you want to access
• An array can also be a two-dimensional grid of elements, so
now you need two indices to specify the location
• Three-dimensional array needs three indices, etc.
• In principle, there is no upper limit to how many dimensions
a multidimensional C array could have, but every compiler
and hardware imposes some limit (and not just to this, but to
everything else)
• In practice, we rarely see more than 3 dimensions
Two-Dimensional Array Example
#include <stdio.h>
int main() {
  int row, col, a[10][10];
  for(row = 0; row < 10; row++) {
    for(col = 0; col < 10; col++) {
      a[row][col] = (row + 1) * (col + 1);
    }
  }
  printf("%d\n", a[6][4]); /* 35 */
  return 0;
}
2D Array as Function Parameter
void output_array(int a[][10], int rows) {
  int row, col;
  for(row = 0; row < rows; row++) {
    for(col = 0; col < 10; col++) {
      printf("%7d ", a[row][col]);
    }
    printf("\n");
  }
}
Row, Row, Row Your Boat...
• In the previous example, why did we have to hardcode the
second dimension to be 10, instead of making the function
more general and allowing it to take in arrays with an
arbitrary number of not just rows but also columns?
• To be able to index a 2D array, the compiler must know the
number of columns in each row
• Memory address of the element a[row][col] depends on the
length of the rows that precede the row
• In a 1D array, the memory address of the element a[i]
depends only on the start address a and the element
index i (implicitly times sizeof one element), but not on how
many elements the entire array contains
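The address computation described above can be imitated by hand on a
1D array that stores the rows one after another. In this sketch the
function name get_element and its parameters are assumptions; note
that the number of columns is needed, but the number of rows is not.

```c
/* Reads the element at (row, col) of a grid that is stored row by
   row in one contiguous block starting at first. */
int get_element(const int* first, int cols, int row, int col) {
  /* Skip over `row` complete rows, then `col` more elements */
  return first[row * cols + col];
}
```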
Arrays of Arrays
• In a 2D array, each row must have the same length, because
rows are stored in the memory consecutively and we want
indexing to be a constant-time operation
• Another way to simulate a 2D array is to declare a 1D array
of pointers, each of which points to an ordinary 1D array
(that is, to its first element) containing that row
• We can write functions for such arrays of any number of
rows and columns, not just any number of rows
• Fortunately, the array indexing syntax a[row][col] works also
for our arrays of arrays
• This is *(a[row] + col) which is *(*(a + row) + col), and
since these formulas use pointers to 1D arrays, this is
perfectly ordinary pointer arithmetic
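As a small illustration, a function that accepts an array of arrays
with any number of rows and columns could look like this sketch. The
name sum_array is hypothetical; the element access is spelled out
with pointer arithmetic to show that it is the same as a[row][col].

```c
/* Adds up all elements of an array of arrays with the given
   numbers of rows and columns. */
long sum_array(int** a, int rows, int cols) {
  long sum = 0;
  int row, col;
  for(row = 0; row < rows; row++) {
    for(col = 0; col < cols; col++) {
      sum += *(*(a + row) + col); /* Same element as a[row][col] */
    }
  }
  return sum;
}
```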
Example: Creating an Array of Arrays
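#include <stdlib.h>
int** create_array(int rows, int cols) {
  int row;
  int** a;
  a = malloc(sizeof(int*) * rows);
  if(a == NULL) { return NULL; } /* Out of memory */
  for(row = 0; row < rows; row++) {
    a[row] = calloc(cols, sizeof(int)); /* Zero-filled row */
    if(a[row] == NULL) { /* Undo the allocations made so far */
      while(row > 0) { free(a[--row]); }
      free(a);
      return NULL;
    }
  }
  return a;
}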
Releasing an Array of Arrays
/* We don't need to know the number of columns in each row
   to be able to release that row using free. */
void release_array(int** a, int rows) {
  int row;
  for(row = 0; row < rows; row++) {
    free(a[row]); /* Release the row */
  }
  free(a); /* Now we can release the row pointer array */
}
Example: Using an Array of Arrays
void array_demo() {
  int rows = 10, cols = 10, row, col;
  int** a = create_array(rows, cols);
  for(row = 0; row < rows; row++) {
    for(col = 0; col < cols; col++) {
      a[row][col] = (row + 1) * (col + 1);
    }
  }
  printf("%d\n", a[5][6]); /* 42 */
  release_array(a, rows);
}
Data That Knows Itself
• In the previous example, the user code of these arrays didn't
need to use malloc or free, but the allocation details were
encapsulated in create_array and release_array
• What the user code can't do, the user code also can't do
wrong, causing memory leaks and dangling references
• The data type still isn't perfectly opaque, though
• The user code still has to remember rows and cols
• With techniques that we encounter next week, we can create
a proper data type that would remember its own numbers of
rows and cols, so that the user code does not need to
remember them
Array of Strings
• Q: If a string is an array of characters terminated with a null
character, what is an array of strings then?
• A: It is a two-dimensional array of characters, where each
row contains one null-terminated string
• Could be implemented either as a real 2D array, or as an
array of char* elements pointing to individual strings
• Again, in the second technique, the individual rows don't
need to be of the same length, especially if you allocate
these rows individually with malloc
Array vs Pointer Revisited
• How is it different to say char* p = "Jimmy"; versus saying
char a[6] = "Jimmy"; ?
• First, p is a pointer and can be reassigned to point some
place else, whereas an array a can never be reassigned
• You can freely modify the contents of array a at runtime
(within its size), but modifying the string literal that p
points to is undefined behaviour, so what actually happens
depends on the compiler and its settings
• Compiler allocates string literals in your program at compile
time, and since they are not dynamically allocated with
malloc, you must not try to free them
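The differences listed above can be tried out with a sketch such as
the following. The function name is hypothetical, and declaring p as
const char* (instead of the plain char* used in the question) is an
assumption made here so that the compiler rejects accidental writes
to the string literal.

```c
#include <string.h>

/* Returns 1 if the array and the pointer behave as described. */
int array_vs_pointer_demo(void) {
  const char* p = "Jimmy"; /* Points to a literal: treat as read-only */
  char a[6] = "Jimmy";     /* Array with its own copy of the characters */
  a[0] = 'T';              /* Fine: a is ordinary modifiable storage */
  p = "Jane";              /* Fine: p can be pointed somewhere else */
  /* a = "Jane"; would not compile: an array cannot be reassigned */
  return strcmp(a, "Timmy") == 0 && strcmp(p, "Jane") == 0
         && sizeof(a) == 6;
}
```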
Creating an Array of Strings
void names_2D_array() {
  /* In C, the dimensions follow the variable name, not the type */
  char names[4][10] =
    { "Tommy", "Jimmy", "Jane", "Robert" };
  int i;
  for(i = 0; i < 4; i++) {
    printf("%s\n", names[i]);
  }
}
Allocating an Array of Strings
void names_array_of_arrays() {
  char** names; int i;
  names = malloc(4 * sizeof(char*));
  names[0] = "Tommy"; names[1] = "Jimmy";
  names[2] = "Jane"; names[3] = "Robert";
  names[0] = "Wilfred von Ballenberg III"; /* ok */
  for(i = 0; i < 4; i++) {
    printf("%s\n", names[i]);
  }
  free(names);
}