Instructor
Neelima Gupta
[email protected]
Expected Running Times and
Randomized Algorithms
Expected Running Time of Insertion
Sort
(at rth position)
x1,x2,........., xi-1,xi,.......…,xn
For i = 2 to n
Insert the ith element xi into the partially sorted list x1, x2, ..., xi-1.
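The loop above can be sketched in Python as follows. This is a minimal sketch, not the instructor's code: the function name `insertion_sort` and the choice to return the comparison count alongside the sorted list are assumptions made for illustration.

```python
def insertion_sort(items):
    """Return (sorted copy of items, number of key comparisons made)."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):        # 0-based: a[i] is the slide's x_(i+1)
        x = a[i]
        j = i - 1
        # Scan the sorted prefix from right to left until x fits.
        while j >= 0:
            comparisons += 1          # one comparison of x against a[j]
            if a[j] <= x:
                break
            a[j + 1] = a[j]           # shift the larger element right
            j -= 1
        a[j + 1] = x
    return a, comparisons


if __name__ == "__main__":
    print(insertion_sort([3, 1, 2]))
```

Counting comparisons rather than wall-clock time keeps the sketch aligned with the analysis that follows, which is stated in terms of comparisons.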
Expected Running Time of Insertion
Sort
Let Xi be the random variable that represents the number of comparisons required to insert the ith element of the input array into the sorted subarray of the first i-1 elements.
Xi can take the values 1, ..., i-1 (denoted by xi1, xi2, ..., xii, one value for each of the i possible positions)
$E(X_i) = \sum_{j=1}^{i} x_{ij}\, p(x_{ij})$
where E(Xi) is the expected value of Xi,
and p(xij) is the probability of inserting xi in the jth position, 1 ≤ j ≤ i
Expected Running Time of Insertion
Sort
(at jth position)
x1,x2,........., xi-1,xi,.......…,xn
How many comparisons does it take to insert the ith element at the jth position?
Position    # of Comparisons
i           1
i-1         2
i-2         3
...         ...
2           i-1
1           i-1
Note: positions 2 and 1 both need i-1 comparisons. Why?
Because to insert the element at position 2 we still have to compare it with the first element, and after that comparison we know which of the two comes first and which comes second, so position 1 needs no extra comparison.
Thus, $E(X_i) = \frac{1}{i}\left\{\sum_{k=1}^{i-1} k + (i-1)\right\}$
where 1/i is the probability of inserting at the jth position among the i possible positions.
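For n elements (the first element needs no comparisons, so X1 = 0),
$E(X_1 + X_2 + \cdots + X_n) = \sum_{i=2}^{n} E(X_i)$
$= \sum_{i=2}^{n} \frac{1}{i}\left\{\sum_{k=1}^{i-1} k + (i-1)\right\}$
$= \sum_{i=2}^{n} \left(\frac{i-1}{2} + 1 - \frac{1}{i}\right)$
$= \frac{n(n+3)}{4} - H_n$, where $H_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n}$ is the nth harmonic number.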
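For n elements, the expected time taken is
$T = \sum_{i=2}^{n} \frac{1}{i}\left\{\sum_{k=1}^{i-1} k + (i-1)\right\}$
where 1/i is the probability of inserting at the rth position among the i possible positions.
By linearity of expectation, $E(X_1 + X_2 + \cdots + X_n) = \sum_{i=1}^{n} E(X_i)$
where E(Xi) is the expected number of comparisons to insert the ith element, and E(X1) = 0.
$T = \frac{n(n+3)}{4} - H_n \le \frac{(n-1)(n+4)}{4}$, where $H_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n}$.
Therefore the average case of insertion sort takes Θ(n^2)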
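As a sanity check on the expectation, the sketch below compares three quantities for small n using exact fractions: the average number of comparisons over all n! input orders, the sum of (1/i){...} terms from the slides, and the value n(n+3)/4 - H_n obtained by evaluating that sum (H_n is the nth harmonic number). The function names are hypothetical, and the check assumes distinct keys with every input order equally likely.

```python
from fractions import Fraction
from itertools import permutations


def comparisons(perm):
    """Count key comparisons made by insertion sort on one input order."""
    a, count = list(perm), 0
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0:
            count += 1
            if a[j] <= x:
                break
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return count


def average_by_enumeration(n):
    """Exact average over all n! orders of n distinct keys."""
    perms = list(permutations(range(n)))
    return Fraction(sum(comparisons(p) for p in perms), len(perms))


def expected_by_sum(n):
    """Sum over i = 2..n of (1/i) * (1 + 2 + ... + (i-1) + (i-1))."""
    total = Fraction(0)
    for i in range(2, n + 1):
        total += Fraction(1, i) * (sum(range(1, i)) + (i - 1))
    return total


def closed_form(n):
    """n(n+3)/4 - H_n, with H_n the nth harmonic number."""
    harmonic = sum(Fraction(1, i) for i in range(1, n + 1))
    return Fraction(n * (n + 3), 4) - harmonic


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, average_by_enumeration(n), expected_by_sum(n), closed_form(n))
```

Enumeration grows as n!, so this is only practical for very small n; it is meant to confirm the formula, not to measure running time.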
Quick-Sort
Pick the first item from the array--call it the pivot
Partition the items in the array around the pivot so all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot
Use recursion to sort the two partitions
partition 1: items ≤ pivot
pivot
partition 2: items > pivot
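The three steps can be sketched as below. This is an illustrative version that builds new lists for the two partitions instead of partitioning in place, which is an assumption made to keep the example short; the name `quicksort` is likewise hypothetical.

```python
def quicksort(items):
    """Sort a list using the first item as the pivot (not in place)."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    left = [x for x in rest if x <= pivot]    # partition 1: items <= pivot
    right = [x for x in rest if x > pivot]    # partition 2: items > pivot
    return quicksort(left) + [pivot] + quicksort(right)


if __name__ == "__main__":
    print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))
```

On an already sorted input every split is 0:n-1, so the recursion depth grows linearly with n.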
Quicksort: Expected number
of comparisons
Partition may generate splits
(0:n-1, 1:n-2, 2:n-3, … , n-2:1, n-1:0)
each with probability 1/n
If T(n) is the expected running time,
$T(n) = \frac{1}{n}\sum_{k=0}^{n-1}\left[T(k) + T(n-1-k)\right] + \Theta(n)$
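The recurrence can be evaluated numerically. The sketch below assumes the Θ(n) term is exactly the n-1 comparisons of one partition step and that T(0) = T(1) = 0; neither assumption is stated on the slide. Under those assumptions the recurrence is known to solve to 2(n+1)H_n - 4n, where H_n is the nth harmonic number, which is O(n log n). The function names are hypothetical.

```python
from fractions import Fraction


def expected_comparisons(n_max):
    """Tabulate C(n) = (n-1) + (1/n) * sum_{k=0}^{n-1} [C(k) + C(n-1-k)]."""
    c = [Fraction(0)] * (n_max + 1)           # C(0) = C(1) = 0 (assumed)
    for n in range(2, n_max + 1):
        split_sum = sum(c[k] + c[n - 1 - k] for k in range(n))
        c[n] = (n - 1) + split_sum / n
    return c


def closed_form(n):
    """2(n+1)H_n - 4n, with H_n the nth harmonic number."""
    harmonic = sum(Fraction(1, i) for i in range(1, n + 1))
    return 2 * (n + 1) * harmonic - 4 * n


if __name__ == "__main__":
    table = expected_comparisons(10)
    for n in (2, 3, 10):
        print(n, table[n], float(closed_form(n)))
```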
Randomized Quick-Sort
Pick an element from the array at random--call it the pivot
Partition the items in the array around the pivot so all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot
Use recursion to sort the two partitions
partition 1: items ≤ pivot
pivot
partition 2: items > pivot
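A minimal sketch of the randomized version follows. Only the pivot choice differs from picking the first item. The function name and the optional `rng` parameter (any object with a `randrange` method, such as `random.Random(seed)`) are assumptions added so that runs can be made reproducible.

```python
import random


def randomized_quicksort(items, rng=random):
    """Sort a list, choosing each pivot uniformly at random (not in place)."""
    if len(items) <= 1:
        return list(items)
    items = list(items)
    pivot = items.pop(rng.randrange(len(items)))   # the random pivot choice
    left = [x for x in items if x <= pivot]        # partition 1: items <= pivot
    right = [x for x in items if x > pivot]        # partition 2: items > pivot
    return (randomized_quicksort(left, rng)
            + [pivot]
            + randomized_quicksort(right, rng))


if __name__ == "__main__":
    print(randomized_quicksort([5, 2, 8, 1, 9], random.Random(0)))
```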
Remarks
Not much different from Quick-Sort, except that earlier the algorithm was deterministic and the bounds were probabilistic (taken over random inputs).
Here the algorithm itself is also randomized: we pick the pivot at random. Notice that from there onwards the algorithm behaves exactly as before.
In the earlier case, we can identify the worst-case input. Here no particular input is the worst case.
Randomized Select
$T(n) \le \frac{1}{n}\sum_{k=0}^{n-1}\max\{T(k),\, T(n-1-k)\} + \Theta(n)$
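The max in the recurrence reflects that selection recurses into only one side of the partition, and the analysis charges for the larger side. A minimal sketch, assuming a 0-based rank k and a three-way split that groups keys equal to the pivot (both are choices made here, not taken from the slide):

```python
import random


def randomized_select(items, k, rng=random):
    """Return the k-th smallest element of items (k = 0 gives the minimum)."""
    items = list(items)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        pivot = items[rng.randrange(len(items))]
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        if k < len(smaller):
            items = smaller                      # answer lies left of the pivot
        elif k < len(smaller) + len(equal):
            return pivot                         # the pivot has rank k
        else:
            k -= len(smaller) + len(equal)       # answer lies right of the pivot
            items = [x for x in items if x > pivot]


if __name__ == "__main__":
    print(randomized_select([7, 2, 9, 4, 4, 1], 2, random.Random(0)))
```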
Randomized Algorithms
A randomized algorithm performs coin tosses (i.e.,
uses random bits) to control its execution
i ← random()
if i = 0
do A …
else { i.e. i = 1}
do B …
Its running time depends on the outcomes of the coin
tosses
Assumptions
coins are unbiased, and
coin tosses are independent
The worst-case running time of a randomized
algorithm may be large but occurs with very low
probability (e.g., it occurs when all the coin
tosses give “heads”)
Monte Carlo Algorithms
Running times are guaranteed but the output may
not be completely correct.
Probability of error is low.
Las Vegas Algorithms
Output is guaranteed to be correct.
Bounds on running times hold with high probability.
What type of algorithm is Randomized Qsort?
Why expected running
times?
Markov’s inequality
P(X ≥ k E(X)) ≤ 1/k, for a nonnegative random variable X and any k > 0,
i.e. the probability that the algorithm will take at least 2 E(X) time is at most 1/2,
and the probability that it will take at least 10 E(X) time is at most 1/10.
This is the reason why Qsort does well in practice.
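Markov's inequality can be checked exactly on a small distribution. The sketch below uses a made-up nonnegative random variable (X = 0 with probability 3/4 and X = 4 with probability 1/4, so E(X) = 1); the function name `markov_check` and the example distribution are assumptions for illustration only.

```python
from fractions import Fraction


def markov_check(dist, k):
    """dist maps nonnegative values to probabilities.

    Returns (P(X >= k * E(X)), 1/k) so the two sides can be compared.
    """
    mean = sum(value * prob for value, prob in dist.items())
    tail = sum(prob for value, prob in dist.items() if value >= k * mean)
    return tail, Fraction(1, k)


if __name__ == "__main__":
    dist = {0: Fraction(3, 4), 4: Fraction(1, 4)}   # E(X) = 1
    for k in (2, 4, 10):
        print(k, markov_check(dist, k))
```

For k = 4 the tail probability equals 1/4 exactly, which shows that the bound cannot be improved to a strict inequality in general.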
Markov's Bound
P(X ≥ kμ) ≤ 1/k,
where μ = E(X) and k is a constant.
Chernoff's Bound
P(X > 2μ) < 1/2
A Stronger Result
P(X > kμ) < 1/n^k,
where k is a constant.
Binary Search Tree
What is a binary search tree?
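A BST is a possibly empty rooted tree whose root holds a key value and has a possibly empty left subtree and a possibly empty right subtree.
Keys in the left subtree are at most the root's key, and keys in the right subtree are greater than it.
Each of the left subtree and the right subtree is itself a BST.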
Binary Search Tree
Pick the first item from the array--call it the pivot…it becomes
the root of the BST.
Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot
Recursively Build a BST on each partition. They become the
left and the right sub-tree of the root.
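The construction above can be sketched as below. Representing a tree as a nested tuple `(key, left, right)` with `None` for the empty tree is an assumption made for brevity, as are the helper names; height is counted in nodes, so the empty tree has height 0.

```python
def build_bst(keys):
    """Build a BST as nested tuples (key, left, right); None is the empty tree."""
    if not keys:
        return None
    pivot, rest = keys[0], keys[1:]           # the first item becomes the root
    left = [x for x in rest if x <= pivot]
    right = [x for x in rest if x > pivot]
    return (pivot, build_bst(left), build_bst(right))


def height(tree):
    """Number of nodes on the longest root-to-leaf path (0 for the empty tree)."""
    if tree is None:
        return 0
    _, left, right = tree
    return 1 + max(height(left), height(right))


def inorder(tree):
    """Keys of the tree in sorted order."""
    if tree is None:
        return []
    key, left, right = tree
    return inorder(left) + [key] + inorder(right)


if __name__ == "__main__":
    print(height(build_bst([5, 3, 8, 1, 4])))
    print(height(build_bst(list(range(1, 11)))))
```

Because the partitions keep the original relative order, this produces the same tree as inserting the items one at a time in array order.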
Binary Search Tree
Consider the following input:
1,2,3 …………………10,000.
What is the time for construction?
Search Time?
Randomly Built Binary
Search Tree
Pick an item from the array randomly --call it the pivot…it
becomes the root of the BST.
Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot
Recursively Build a BST on each partition. They become the
left and the right sub-tree of the root.
Example
Consider the input
10, 20, 30, 40, 50, 60, 70, 80, 90, 100.
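For this input, a randomly built BST can be sketched as follows, again using nested tuples `(key, left, right)` with `None` for the empty tree. The helper names, the `rng` parameter and the trial count are assumptions; the average height printed depends on the seed, so no particular value is claimed here.

```python
import random


def build_random_bst(keys, rng=random):
    """Build a BST whose root at every level is a uniformly random key."""
    if not keys:
        return None
    keys = list(keys)
    pivot = keys.pop(rng.randrange(len(keys)))     # random pivot becomes the root
    left = [x for x in keys if x <= pivot]
    right = [x for x in keys if x > pivot]
    return (pivot, build_random_bst(left, rng), build_random_bst(right, rng))


def height(tree):
    """Number of nodes on the longest root-to-leaf path (0 for the empty tree)."""
    if tree is None:
        return 0
    _, left, right = tree
    return 1 + max(height(left), height(right))


def average_height(keys, trials, seed=0):
    """Average height over several independent random builds."""
    rng = random.Random(seed)
    return sum(height(build_random_bst(keys, rng)) for _ in range(trials)) / trials


if __name__ == "__main__":
    keys = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    print(average_height(keys, trials=1000))
```

With 10 keys the height is always between 4 (a balanced tree) and 10 (a path), whatever the random choices are.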
Height of the RBST
WLOG, assume that the keys are distinct. (What if
they are not?)
Rank(x) = number of elements < x
Let Xi : height of the tree rooted at a node with
rank=i.
Let Yi : exponential height of the tree=2^Xi
Let H : height of the entire BST, then
H=max{H1,H2} + 1
where H1 : ht. of left subtree
H2 : ht.of right subtree
Y=2^H
=2.max{2^H1,2^H2}
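E(EH(T(n))): Expected value of the exponential ht. of the tree with n nodes.
The root has rank k with probability 1/n for each k = 0, ..., n-1, so
$E(EH(T(n))) = \frac{2}{n}\sum_{k=0}^{n-1} E\left(\max\{EH(T(k)),\, EH(T(n-1-k))\}\right)$
$\le \frac{2}{n}\sum_{k=0}^{n-1}\left[E(EH(T(k))) + E(EH(T(n-1-k)))\right] = \frac{4}{n}\sum_{k=0}^{n-1} E(EH(T(k)))$
which solves to O(n^3).
Since log is concave, Jensen's inequality gives
E(H(T(n))) = E(log(EH(T(n)))) ≤ log E(EH(T(n))) = O(log n)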
Construction Time?
Search Time?
What is the worst case input?
Acknowledgements
Kunal Verma
Nidhi Aggarwal
And other students of MSc(CS) batch 2009.
Hashing
Motivation: symbol tables
A compiler uses a symbol table to relate symbols to
associated data
Symbols: variable names, procedure names, etc.
Associated data: memory location, call graph, etc.
For a symbol table (also called a dictionary), we care
about search, insertion, and deletion
We typically don’t care about sorted order
Hash Tables
More formally:
Given a table T and a record x, with key (= symbol)
and satellite data, we need to support:
Insert (T, x)
Delete (T, x)
Search(T, x)
We want these to be fast, but don’t care about sorting
the records
The structure we will use is a hash table
Supports all the above in O(1) expected time!
Hash Functions
Next problem: collision
[Figure: a hash function h maps the actual keys K = {k1, ..., k5}, a subset of the universe of keys U, to slots 0 to m-1 of the table T. Keys k2 and k5 collide: h(k2) = h(k5).]
Resolving Collisions
How can we solve the problem of collisions?
One solution is chaining
Another solution is open addressing
Chaining
Chaining puts elements that hash to the same slot in
a linked list:
[Figure: the actual keys k1 to k8 from K, a subset of the universe of keys U, stored in table T. Each slot of T points to a linked list of the keys that hash to it.]
Chaining
How do we insert an element?
Chaining
How do we delete an element?
Chaining
How do we search for an element with a given key?
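One possible answer to the three questions is sketched below. It is an assumption-laden illustration rather than the lecture's own code: Python lists stand in for the linked lists, Python's built-in `hash` reduced modulo the table size stands in for the hash function h, and the class and method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of (key, value) pairs."""

    def __init__(self, m=8):
        self.slots = [[] for _ in range(m)]       # m empty chains

    def _chain(self, key):
        # h(key) = hash(key) mod m picks the slot (assumed hash function).
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                          # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self._chain(key):             # walk only this key's chain
            if k == key:
                return v
        return None                               # unsuccessful search

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


if __name__ == "__main__":
    table = ChainedHashTable(m=2)
    for name in ("x", "y", "z"):
        table.insert(name, len(name))
    print(table.search("y"), table.search("missing"))
```

Every operation touches a single chain, so its cost is proportional to the length of that chain plus the time to compute the hash.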
Analysis of Chaining
Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot
Given n keys and m slots in the table, the load factor α = n/m = average # keys per slot
What will be the average cost of an unsuccessful search for a key? A: O(1 + α)
What will be the average cost of a successful search? A: O((1 + α)/2) = O(1 + α)
Analysis of Chaining
Continued
So the cost of searching = O(1 + α)
If the number of keys n is proportional to the number of slots in the table, what is α?
A: α = O(1)
In other words, we can make the expected cost of searching constant if we make α constant
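A small simulation can illustrate these costs. It models simple uniform hashing by dropping each key into a uniformly random slot and appends new keys at the tail of the chain; the function name, the tail-append choice and the parameter values are assumptions made for this sketch. Under this model the expected number of elements examined by a successful search is 1 + (n-1)/(2m).

```python
import random


def average_search_costs(n, m, seed=0):
    """Return (avg elements scanned by an unsuccessful search,
               avg elements examined by a successful search)."""
    rng = random.Random(seed)
    chains = [[] for _ in range(m)]
    for key in range(n):
        chains[rng.randrange(m)].append(key)      # random slot = uniform hashing
    # Unsuccessful search: scans a whole chain in a uniformly random slot.
    unsuccessful = sum(len(chain) for chain in chains) / m
    # Successful search: scans up to and including the key's position.
    successful = sum(pos + 1 for chain in chains
                     for pos in range(len(chain))) / n
    return unsuccessful, successful


if __name__ == "__main__":
    n, m = 10000, 1000                            # load factor alpha = 10
    print(average_search_costs(n, m))
```

The first value is exactly n/m because the chain lengths always add up to n; the second value varies with the seed and stays close to 1 + α/2 when n is large.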
If we could prove this:
P(failure) < 1/k (we are sort of happy)
P(failure) < 1/n^k (most of the time this is true and we're happy)
P(failure) < 1/2^n (this is difficult, but still we want this)
END