Tradeoffs, intuition
analysis,
understanding big-Oh
aka O-notation
Owen Astrachan
[email protected]
http://www.cs.duke.edu/~ola
SIGCSE 2004
Analysis
Vocabulary to discuss performance and to
reason about alternative algorithms and
implementations
It’s faster! It’s more elegant! It’s safer! It’s
cooler!
Use mathematics to analyze the algorithm,
Implementation is another matter
cache, compiler optimizations, OS,
memory,…
What do we need?
Need empirical tests and mathematical tools
Compare by running
•30 seconds vs. 3 seconds,
•5 hours vs. 2 minutes
•Two weeks to implement code
We need a vocabulary to discuss tradeoffs
Analyzing Algorithms
Three solutions to online problem sort1
Sort strings, scan looking for runs
Insert into Set, count each unique string
Use map of (String,Integer) to process
We want to discuss trade-offs of these
solutions
Ease to develop, debug, verify
Runtime efficiency
Vocabulary for discussion
What is big-Oh about?
Intuition: avoid details when they don’t
matter, and they don’t matter when input size
(N) is big enough
For polynomials, use only leading term,
ignore coefficients: linear, quadratic
Linear: y = 3x, y = 6x - 2, y = 15x + 44
Quadratic: y = x^2, y = x^2 - 6x + 9, y = 3x^2 + 4x
O-notation, family of functions
The first family is O(n), the second is O(n^2)
Intuition: family of curves, same shape
More formally: O(f(n)) is an upper bound: when n is large enough, the
expression cf(n) is larger than the function being bounded, for some constant c
Intuition: linear function: double input,
double time, quadratic function: double
input, quadruple the time
Reasoning about algorithms
We have an O(n) algorithm,
For 5,000 elements takes 3.2 seconds
For 10,000 elements takes 6.4 seconds
For 15,000 elements takes ….?
We have an O(n^2) algorithm
For 5,000 elements takes 2.4 seconds
For 10,000 elements takes 9.6 seconds
For 15,000 elements takes …?
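One way to answer the two questions above is to scale a measured time by the growth ratio. The following is a minimal sketch, not part of the original slides: the class and method names are made up for illustration, and it assumes the leading term dominates at these input sizes.

```java
// Hypothetical helper: predict a running time from one measurement,
// assuming time grows like n^power (power 1 = linear, 2 = quadratic).
class Extrapolate {
    static double predict(double measuredSeconds, int measuredN, int targetN, int power) {
        double ratio = (double) targetN / measuredN;   // how much the input grew
        return measuredSeconds * Math.pow(ratio, power);
    }

    public static void main(String[] args) {
        // O(n): 5,000 elements took 3.2 seconds
        System.out.println(predict(3.2, 5000, 15000, 1));
        // O(n^2): 5,000 elements took 2.4 seconds
        System.out.println(predict(2.4, 5000, 15000, 2));
    }
}
```

Tripling the input triples the linear time (about 9.6 seconds) and multiplies the quadratic time by nine (about 21.6 seconds).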
More on O-notation, big-Oh
Big-Oh hides/obscures some empirical
analysis, but is good for general
description of algorithm
Compare algorithms in the limit
20N hours vs. N^2 microseconds:
•which is better?
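To make the comparison concrete, both costs can be put in the same unit and the break-even point computed. This is a rough sketch with made-up names; it only does the unit conversion and the comparison.

```java
// Hypothetical comparison of a "20N hours" algorithm with an
// "N^2 microseconds" algorithm, both measured in microseconds.
class Crossover {
    static final double MICROS_PER_HOUR = 3600.0 * 1_000_000.0;

    // Cost of the linear algorithm, 20N hours, in microseconds.
    static double linearMicros(double n) {
        return 20.0 * n * MICROS_PER_HOUR;
    }

    // Cost of the quadratic algorithm, N^2 microseconds.
    static double quadraticMicros(double n) {
        return n * n;
    }

    public static void main(String[] args) {
        // Solving 20 * N * MICROS_PER_HOUR = N^2 for N > 0 gives the break-even N.
        double breakEven = 20.0 * MICROS_PER_HOUR;
        System.out.println("break-even N = " + breakEven);
        System.out.println("N = 1e6:  quadratic faster? "
                + (quadraticMicros(1e6) < linearMicros(1e6)));
        System.out.println("N = 1e12: quadratic faster? "
                + (quadraticMicros(1e12) < linearMicros(1e12)));
    }
}
```

The linear algorithm wins in the limit, but only once N exceeds 7.2 x 10^10; below that the quadratic one is faster, which is the sense in which big-Oh hides empirical detail.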
More formal definition
O-notation is an upper bound: this means
that N is O(N), but it is also O(N^2); we try to
provide tight bounds. Formally:
A function g(N) is O(f(N)) if there exist
constants c and n such that g(N) <
cf(N) for all N > n
[Figure: plot of cf(N) and g(N), with a vertical line at x = n]
Big-Oh calculations from code
Search for element in array:
What is complexity (using O-notation)?
If array doubles, what happens to time?
for (int k = 0; k < a.length; k++) {
    if (a[k].equals(target)) return true;
}
return false;
Best case? Average case? Worst case?
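The best and worst cases of the search above can be seen by counting how many elements are examined. This is an illustrative sketch only: the class name, the counting method, and the sample data are assumptions, not part of the slides.

```java
// Hypothetical instrumented version of the linear search:
// returns the number of elements examined instead of true/false.
class LinearSearchCount {
    static int comparisons(String[] a, String target) {
        int count = 0;
        for (int k = 0; k < a.length; k++) {
            count++;                               // one equals() call per element visited
            if (a[k].equals(target)) return count; // found: stop early
        }
        return count;                              // not found: every element was examined
    }

    public static void main(String[] args) {
        String[] a = {"ant", "bee", "cat", "dog"};
        System.out.println(comparisons(a, "ant")); // best case: target is first
        System.out.println(comparisons(a, "emu")); // worst case: target is absent
    }
}
```

The best case examines one element regardless of array size; the worst case examines all of them, so doubling the array doubles the worst-case work, which is O(n).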
Measures of complexity
Worst case
Good upper-bound on behavior
Never get worse than this
Average case
What does average mean?
Averaged over all inputs? Assuming
uniformly distributed random data?
Some helpful mathematics
1 + 2 + 3 + 4 + … + N
N(N+1)/2 = N^2/2 + N/2, which is O(N^2)
N + N + N + … + N (total of N times)
N*N = N^2, which is O(N^2)
1 + 2 + 4 + … + 2^N
2^(N+1) - 1 = 2 x 2^N - 1, which is O(2^N)
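The closed forms above can be spot-checked by adding the terms directly. A small sketch with made-up method names; it is a sanity check, not a proof.

```java
// Hypothetical brute-force checks of the two closed forms.
class Sums {
    // 1 + 2 + ... + n, added term by term.
    static long sumTo(int n) {
        long s = 0;
        for (int i = 1; i <= n; i++) s += i;
        return s;
    }

    // 1 + 2 + 4 + ... + 2^n, added term by term (n small enough to fit in a long).
    static long sumPowersOfTwo(int n) {
        long s = 0;
        long power = 1;
        for (int i = 0; i <= n; i++) {
            s += power;
            power *= 2;
        }
        return s;
    }

    public static void main(String[] args) {
        int n = 100;
        System.out.println(sumTo(n) == (long) n * (n + 1) / 2);     // N(N+1)/2
        System.out.println(sumPowersOfTwo(10) == (1L << 11) - 1);   // 2^(N+1) - 1
    }
}
```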
10^6 instructions/sec, runtimes (in seconds unless a unit is given)
N               O(log N)   O(N)       O(N log N)   O(N^2)
10              0.000003   0.00001    0.000033     0.0001
100             0.000007   0.00010    0.000664     0.0100
1,000           0.000010   0.00100    0.010000     1.0
10,000          0.000013   0.01000    0.132900     1.7 min
100,000         0.000017   0.10000    1.661000     2.78 hr
1,000,000       0.000020   1.0        19.9         11.6 day
1,000,000,000   0.000030   16.7 min   8.3 hr       318 centuries
Multiplying and adding big-Oh
Suppose we do a linear search then we do
another one
What is the complexity?
If we do 100 linear searches?
If we do n searches on an array of size n?
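Counting worst-case comparisons makes the three questions above concrete. This is a sketch under the assumption that every search is unsuccessful and therefore examines all n elements; the names are invented for illustration.

```java
// Hypothetical counter: worst-case comparisons for repeated linear searches.
class RepeatedSearch {
    static long worstCaseComparisons(int searches, int n) {
        long count = 0;
        for (int s = 0; s < searches; s++) {   // one pass per search
            for (int k = 0; k < n; k++) {      // each pass examines every element
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println(worstCaseComparisons(2, n));    // 2n
        System.out.println(worstCaseComparisons(100, n));  // 100n
        System.out.println(worstCaseComparisons(n, n));    // n * n
    }
}
```

A constant number of searches (2 or 100) only changes the coefficient, so the total stays O(n); n searches multiply the work by n, giving O(n^2).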
Multiplying and adding
Binary search followed by linear search?
What are big-Oh complexities? Sum?
50 binary searches? N searches?
What is the number of elements in the list
(1,2,2,3,3,3)?
What about (1,2,2, …, n,n,…,n)?
How can we reason about this?
Reasoning about big-O
Given an n-list: (1,2,2,3,3,3, …, n,n,…,n)
If we remove all n’s, how many are left?
If we remove all larger than n/2?
Reasoning about big-O
Given an n-list: (1,2,2,3,3,3, …, n,n,…,n)
If we remove every other element?
If we remove all larger than n/1000?
Remove all larger than square root n?
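The n-list questions can be explored numerically. The sketch below uses one made-up helper: keeping only the values up to m leaves 1 + 2 + … + m elements, so each question reduces to choosing m.

```java
// Hypothetical helper for the n-list (1, 2,2, 3,3,3, ..., n copies of n).
class NList {
    // Elements remaining when only the values 1..m are kept: 1 + 2 + ... + m.
    static long sizeUpTo(long m) {
        return m * (m + 1) / 2;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("whole list:          " + sizeUpTo(n));
        System.out.println("without the n's:     " + sizeUpTo(n - 1));
        System.out.println("values <= n/2:       " + sizeUpTo(n / 2));
        System.out.println("values <= n/1000:    " + sizeUpTo(n / 1000));
        System.out.println("values <= sqrt(n):   " + sizeUpTo((long) Math.sqrt(n)));
    }
}
```

Cutting at n - 1, n/2, or n/1000 only changes the constant, so the list is still O(n^2) in size, as it is after removing every other element. Cutting at the square root of n leaves about n/2 elements, which is O(n).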
Money matters
while (n != 0) {
    count++;
    n = n/2;
}
Penny on a statement each time executed
What’s the cost?
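The cost of the halving loop can be measured by running it. A minimal sketch, assuming n is non-negative; the wrapper class and method name are illustrative.

```java
// Hypothetical wrapper around the halving loop: returns how many
// times the loop body executes for a given starting n.
class Halving {
    static int iterations(int n) {
        int count = 0;
        while (n != 0) {
            count++;
            n = n / 2;   // integer division: n shrinks by half each time
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 1_000_000; n *= 10) {
            System.out.println(n + " -> " + iterations(n));
        }
    }
}
```

For n >= 1 the body runs floor(log2 n) + 1 times, so at a penny per statement the cost is O(log n).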
Money matters
void stuff(int n) {
    for (int k = 0; k < n; k++)
        System.out.println(k);
}
while (n != 0) {
    stuff(n);
    n = n/2;
}
Is a penny always the right cost/statement?
What’s the cost?
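Here a call to stuff(n) costs n pennies rather than one, so the total is n + n/2 + n/4 + … down to 1. The sketch below counts lines instead of printing them; the class and method names are assumptions made for the example.

```java
// Hypothetical cost counter for the loop that calls stuff(n) and then halves n.
class StuffCost {
    // Lines stuff(n) would print: one per loop iteration.
    static long stuffCost(int n) {
        long printed = 0;
        for (int k = 0; k < n; k++) printed++;
        return printed;
    }

    // Total lines printed by the while loop.
    static long totalCost(int n) {
        long total = 0;
        while (n != 0) {
            total += stuffCost(n);
            n = n / 2;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 1_000_000; n *= 10) {
            System.out.println(n + " -> " + totalCost(n));
        }
    }
}
```

The total stays below 2n, so the whole loop is O(n) even though it runs O(log n) times.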
Find k-th largest
Given an array of values, find k-th largest
0th largest is the smallest
How can I do this?
Do it the easy way…
Do it the fast way …
Easy way, complexity?
public int find(int[] list, int index)
{
    Arrays.sort(list);
    return list[index];
}
Fast way, complexity?
public int find(int[] list, int index)
{
    return findHelper(list, index, 0, list.length-1);
}
Fast way, complexity?
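private int findHelper(int[] list, int index,
                       int first, int last)
{
    int lastIndex = first;
    int pivot = list[first];
    // partition: move everything <= pivot to the front of the range
    for (int k = first+1; k <= last; k++) {
        if (list[k] <= pivot) {
            lastIndex++;
            swap(list, lastIndex, k);
        }
    }
    // put the pivot into its final sorted position
    swap(list, lastIndex, first);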
Continued…
    if (lastIndex == index)
        return list[lastIndex];
    else if (index < lastIndex)
        return findHelper(list, index, first, lastIndex-1);
    else
        return findHelper(list, index, lastIndex+1, last);
}
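The find and findHelper fragments above can be assembled into one runnable sketch. The swap helper is not shown in the slides, so the version here is an assumption; the methods are made static and the class name is invented to keep the demo self-contained.

```java
import java.util.Arrays;

// Sketch of the "fast way": partition around the first element,
// then recurse only into the side that contains the wanted index.
class KthSmallest {
    static int find(int[] list, int index) {
        return findHelper(list, index, 0, list.length - 1);
    }

    private static int findHelper(int[] list, int index, int first, int last) {
        int lastIndex = first;
        int pivot = list[first];
        for (int k = first + 1; k <= last; k++) {
            if (list[k] <= pivot) {
                lastIndex++;
                swap(list, lastIndex, k);
            }
        }
        swap(list, lastIndex, first);   // pivot lands in its sorted position
        if (lastIndex == index) return list[lastIndex];
        else if (index < lastIndex) return findHelper(list, index, first, lastIndex - 1);
        else return findHelper(list, index, lastIndex + 1, last);
    }

    // Assumed helper: exchange two array entries.
    private static void swap(int[] list, int i, int j) {
        int temp = list[i];
        list[i] = list[j];
        list[j] = temp;
    }

    public static void main(String[] args) {
        int[] data = {7, 2, 9, 4, 1};
        int[] sorted = data.clone();
        Arrays.sort(sorted);            // the "easy way", used as a reference
        for (int i = 0; i < data.length; i++) {
            System.out.println(i + ": " + find(data.clone(), i) + " (expected " + sorted[i] + ")");
        }
    }
}
```

With the first element as pivot, this approach does linear work on average for randomly ordered input, but already-sorted input makes every partition lopsided and the work quadratic. The easy way costs O(n log n) for the sort.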
Recurrences
int length(ListNode list)
{
    if (null == list) return 0;
    else return 1 + length(list.getNext());
}
What is complexity? justification?
T(n) = time of length for an n-node list
T(n) = T(n-1) + 1
T(0) = O(1)
Recognizing Recurrences
T must be explicitly identified
n is a measure of the size of the input/parameter
T(n) is the time to run on an array of size n
T(n) = T(n/2) + O(1)    binary search       O(log n)
T(n) = T(n-1) + O(1)    sequential search   O(n)
Recurrences
T(n) = 2T(n/2) + O(1)   tree traversal                     O(n)
T(n) = 2T(n/2) + O(n)   quicksort (balanced partitions)    O(n log n)
T(n) = T(n-1) + O(n)    selection sort                     O(n^2)
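The recurrences above can be checked by evaluating them directly. The sketch below makes assumptions the slides leave open: the O(1) and O(n) terms are taken to be exactly 1 and n, the base cases are T(1) = 1 or T(0) = 0, and n is a power of two where halving is involved. All names are invented.

```java
// Hypothetical direct evaluation of the recurrences, for comparison with closed forms.
class Recurrences {
    // T(n) = T(n/2) + 1, T(1) = 1   (binary search)
    static long halvePlusOne(long n) {
        return n <= 1 ? 1 : halvePlusOne(n / 2) + 1;
    }

    // T(n) = 2T(n/2) + 1, T(1) = 1  (tree traversal)
    static long twoHalvesPlusOne(long n) {
        return n <= 1 ? 1 : 2 * twoHalvesPlusOne(n / 2) + 1;
    }

    // T(n) = 2T(n/2) + n, T(1) = 1  (balanced quicksort)
    static long twoHalvesPlusN(long n) {
        return n <= 1 ? 1 : 2 * twoHalvesPlusN(n / 2) + n;
    }

    // T(n) = T(n-1) + n, T(0) = 0   (selection sort)
    static long minusOnePlusN(long n) {
        return n == 0 ? 0 : minusOnePlusN(n - 1) + n;
    }

    public static void main(String[] args) {
        long n = 1024;   // 2^10
        System.out.println(halvePlusOne(n));      // log2(n) + 1
        System.out.println(twoHalvesPlusOne(n));  // 2n - 1
        System.out.println(twoHalvesPlusN(n));    // n log2(n) + n
        System.out.println(minusOnePlusN(1000));  // n(n+1)/2 for n = 1000
    }
}
```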
Compute b^n, b is BigInteger
What is complexity using BigInteger.pow?
What method does it use?
How do we analyze it?
Can we do exponentiation ourselves?
Is there a reason to do so?
What techniques will we use?
Correctness and Complexity?
public BigInteger pow(BigInteger b, int expo)
{
    if (expo == 0) {
        return BigInteger.ONE;
    }
    BigInteger half = pow(b, expo/2);
    half = half.multiply(half);
    if (expo % 2 == 0) return half;
    else return half.multiply(b);
}
Correctness and Complexity?
public BigInteger pow(BigInteger b, int expo)
{
    BigInteger result = BigInteger.ONE;
    while (expo != 0) {
        if (expo % 2 != 0) {
            result = result.multiply(b);
        }
        expo = expo/2;
        b = b.multiply(b);
    }
    return result;
}
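One way to build confidence in the two pow versions above is to compare them with BigInteger.pow over a range of exponents. This is a test-harness sketch, assuming non-negative exponents; the class and method names are invented, and agreement on a range is evidence rather than a proof.

```java
import java.math.BigInteger;

// Hypothetical harness: both hand-written versions against BigInteger.pow.
class PowCheck {
    // Recursive version: square the half power, multiply once more if expo is odd.
    static BigInteger powRecursive(BigInteger b, int expo) {
        if (expo == 0) return BigInteger.ONE;
        BigInteger half = powRecursive(b, expo / 2);
        half = half.multiply(half);
        return (expo % 2 == 0) ? half : half.multiply(b);
    }

    // Iterative version: consume the exponent one binary digit at a time.
    static BigInteger powIterative(BigInteger b, int expo) {
        BigInteger result = BigInteger.ONE;
        while (expo != 0) {
            if (expo % 2 != 0) result = result.multiply(b);
            expo = expo / 2;
            b = b.multiply(b);
        }
        return result;
    }

    public static void main(String[] args) {
        BigInteger b = BigInteger.valueOf(3);
        for (int e = 0; e <= 200; e++) {
            BigInteger expected = b.pow(e);
            if (!powRecursive(b, e).equals(expected) || !powIterative(b, e).equals(expected)) {
                throw new AssertionError("mismatch at exponent " + e);
            }
        }
        System.out.println("both versions agree with BigInteger.pow for exponents 0..200");
    }
}
```

Both versions perform O(log expo) multiplications. A penny per multiplication undercounts, though, because multiplying BigInteger values gets more expensive as the numbers grow.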
Solve and Analyze
Given N words (e.g., from a file)
What are 20 most frequently occurring?
What are k most frequently occurring?
Proposals?
Tradeoffs in efficiency
Tradeoffs in implementation
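One possible proposal, sketched under assumptions: count words in a HashMap, then keep the k most frequent in a size-k min-heap. The class and method names are invented, ties are broken arbitrarily, and this is only one of several reasonable designs.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical solution: word counts in a map, top k selected with a min-heap.
class TopWords {
    static List<String> topK(List<String> words, int k) {
        // Pass 1: count occurrences, expected O(N) with hashing.
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        // Pass 2: the heap holds at most k entries, least frequent on top.
        Comparator<Map.Entry<String, Integer>> byCount =
                (x, y) -> Integer.compare(x.getValue(), y.getValue());
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(byCount);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.add(e);
            if (heap.size() > k) heap.poll();   // evict the current least frequent
        }
        // Drain the heap, building the answer from most to least frequent.
        LinkedList<String> result = new LinkedList<>();
        while (!heap.isEmpty()) {
            result.addFirst(heap.poll().getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "a", "a", "b", "b", "c");
        System.out.println(topK(words, 2));
    }
}
```

With U unique words, the heap pass costs O(U log k), compared with O(U log U) for sorting all the counts. Sorting is simpler to write and verify, which is the implementation side of the tradeoff.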