Instructor
Neelima Gupta
[email protected]
Expected Running Times and Randomized Algorithms
Expected Running Time of Insertion Sort
x1, x2, …, xi-1, xi, …, xn
For i = 2 to n
    Insert the ith element xi into the partially sorted list x1, x2, …, xi-1.
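A minimal Python sketch of the loop above. The function name `insertion_sort` is hypothetical. It scans the sorted prefix from right to left, which is the same order in which the following slides count comparisons.

```python
def insertion_sort(a):
    """Sort the list a in place and return it."""
    for i in range(1, len(a)):      # 0-based index i is the slide's (i+1)th element
        x = a[i]
        j = i - 1
        # Scan the sorted prefix a[0..i-1] right to left, shifting larger items right.
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x                # insert x into the gap
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]
```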
Expected Running Time of Insertion Sort
- Let Xi be the random variable representing the number of comparisons required to insert the ith element of the input array into the sorted subarray of the first i-1 elements.
- Xi takes values in {1, …, i-1}. Denote by xi1, xi2, …, xii the values of Xi when xi is inserted at positions 1, 2, …, i respectively.
$$E(X_i) = \sum_{j=1}^{i} x_{ij}\, p(x_{ij})$$
where E(Xi) is the expected value of Xi, and p(xij) is the probability of inserting xi at the jth position, 1 ≤ j ≤ i.
Expected Running Time of Insertion Sort (at jth position)
x1, x2, …, xi-1, xi, …, xn
How many comparisons does it take to insert the ith element at the jth position, scanning the sorted list from right to left?

Position of xi    # of comparisons
i                 1
i-1               2
i-2               3
…                 …
2                 i-1
1                 i-1

Note: positions 2 and 1 both require i-1 comparisons. Why? To insert the element at position 2, we must compare it with the first element x1, and that single comparison already tells us which of the two comes first and which comes second. So placing xi at position 1 needs no extra comparison.
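The table can be checked mechanically. The sketch below uses a hypothetical helper `comparisons_to_insert` that inserts x into an already sorted prefix of length i-1 by scanning right to left, and reports the 1-based position of x together with the number of comparisons made. It assumes distinct keys.

```python
def comparisons_to_insert(prefix, x):
    """Return (position, comparisons) for inserting x into the sorted list prefix.
    position is 1-based, so it ranges over 1..len(prefix)+1 (that is, 1..i)."""
    comparisons = 0
    j = len(prefix)                  # x currently sits just right of prefix[j-1]
    while j > 0:
        comparisons += 1             # compare x with prefix[j-1]
        if prefix[j - 1] > x:
            j -= 1                   # x belongs further left
        else:
            break                    # prefix[j-1] <= x: position found
    return j + 1, comparisons

prefix = [10, 20, 30, 40]            # i - 1 = 4 sorted elements, so i = 5
for x in (45, 35, 25, 15, 5):
    print(x, comparisons_to_insert(prefix, x))
```

With i = 5, the last two rows both report 4 = i-1 comparisons, matching the note about positions 2 and 1.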
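Thus,
$$E(X_i) = \frac{1}{i}\left\{\sum_{k=1}^{i-1} k + (i-1)\right\}$$
where 1/i is the probability of inserting xi at the jth position, since all i possible positions are equally likely (assuming every ordering of the input is equally likely).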
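For n elements (X1 = 0, since the first element needs no comparisons),
$$E(X_1 + X_2 + \cdots + X_n) = \sum_{i=2}^{n} E(X_i) = \sum_{i=2}^{n} \frac{1}{i}\left\{\sum_{k=1}^{i-1} k + (i-1)\right\} = \sum_{i=2}^{n}\left(\frac{i-1}{2} + 1 - \frac{1}{i}\right) = \frac{n(n-1)}{4} + n - H_n$$
where H_n = 1 + 1/2 + … + 1/n is the nth harmonic number.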
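For n elements, by linearity of expectation, the expected number of comparisons is
$$T = E(X_1 + X_2 + \cdots + X_n) = \sum_{i=1}^{n} E(X_i) = \sum_{i=2}^{n} \frac{1}{i}\left\{\sum_{k=1}^{i-1} k + (i-1)\right\}$$
where Xi is the number of comparisons needed to insert the ith element and 1/i is the probability of inserting it at any given one of its i possible positions. Hence
$$T = \frac{n(n-1)}{4} + n - H_n = \frac{n^2 + 3n}{4} - H_n$$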
Therefore, the average case of insertion sort takes Θ(n²) time.
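As a sanity check of the derivation, this sketch counts the comparisons insertion sort makes (right-to-left scan) on every ordering of n distinct keys, and compares the exact average with the sum over i and with the closed form n(n-1)/4 + n - H_n. It assumes all n! orderings are equally likely; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def comparisons(a):
    """Key comparisons made by insertion sort on a copy of a (right-to-left scan)."""
    a, count = list(a), 0
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0:
            count += 1                  # compare x with a[j]
            if a[j] <= x:
                break
            a[j + 1] = a[j]             # shift the larger item right
            j -= 1
        a[j + 1] = x
    return count

def average_by_enumeration(n):
    """Exact average over all n! orderings of n distinct keys."""
    total = sum(comparisons(p) for p in permutations(range(n)))
    return Fraction(total, factorial(n))

def average_by_formula(n):
    """sum_{i=2}^{n} (1/i) * (sum_{k=1}^{i-1} k + (i-1))."""
    return sum(Fraction(i * (i - 1) // 2 + (i - 1), i) for i in range(2, n + 1))

def closed_form(n):
    """n(n-1)/4 + n - H_n."""
    h_n = sum(Fraction(1, k) for k in range(1, n + 1))
    return Fraction(n * (n - 1), 4) + n - h_n

for n in range(2, 8):
    print(n, average_by_enumeration(n), average_by_formula(n), closed_form(n))
```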
Quick-Sort
- Pick the first item from the array; call it the pivot.
- Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
- Use recursion to sort the two partitions.
[ partition 1: items ≤ pivot ]  pivot  [ partition 2: items > pivot ]
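A minimal out-of-place sketch of the procedure above, assuming a Python list as input; a production quicksort would partition in place. The function name is hypothetical.

```python
def quicksort(a):
    """Return a sorted copy of a, using the first item as the pivot."""
    if len(a) <= 1:
        return list(a)
    pivot, rest = a[0], a[1:]
    left = [x for x in rest if x <= pivot]    # partition 1: items <= pivot
    right = [x for x in rest if x > pivot]    # partition 2: items > pivot
    return quicksort(left) + [pivot] + quicksort(right)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))   # [1, 1, 2, 3, 4, 5, 6, 9]
```

On an already sorted input every split is 0 : n-1, so the recursion depth reaches n; with Python's default recursion limit (about 1000) this sketch is only suitable for small inputs.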
Quicksort: Expected Number of Comparisons
- If every input permutation is equally likely, partitioning may generate the splits (0 : n-1, 1 : n-2, 2 : n-3, …, n-2 : 1, n-1 : 0), each with probability 1/n.
- If T(n) is the expected running time, then
$$T(n) = \frac{1}{n}\sum_{k=0}^{n-1}\left[T(k) + T(n-1-k)\right] + \Theta(n)$$
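To see the growth of this recurrence concretely, the sketch below solves it exactly with rational arithmetic, assuming (as a hypothetical concrete choice) that the Θ(n) partitioning term is exactly n-1 comparisons. Under that assumption the solution matches the known closed form 2(n+1)H_n - 4n, which is O(n log n).

```python
from fractions import Fraction

def expected_comparisons(n_max):
    """C[n] for C(n) = (n - 1) + (1/n) * sum_{k=0}^{n-1} [C(k) + C(n-1-k)], C(0) = 0.
    The Theta(n) partition cost is taken to be exactly n - 1 comparisons (an assumption)."""
    C = [Fraction(0)]
    for n in range(1, n_max + 1):
        split_sum = sum(C[k] + C[n - 1 - k] for k in range(n))   # each split has prob. 1/n
        C.append(n - 1 + split_sum / n)
    return C

def harmonic(n):
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

C = expected_comparisons(20)
for n in (5, 10, 20):
    closed = 2 * (n + 1) * harmonic(n) - 4 * n    # closed form 2(n+1)H_n - 4n
    print(n, float(C[n]), float(closed))
```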
Randomized Quick-Sort
- Pick an element from the array uniformly at random; call it the pivot.
- Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
- Use recursion to sort the two partitions.
[ partition 1: items ≤ pivot ]  pivot  [ partition 2: items > pivot ]
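A minimal out-of-place sketch of randomized quicksort; the only change from the deterministic version is that the pivot index is drawn uniformly at random. The optional `rng` parameter is an assumption added so that runs can be made reproducible.

```python
import random

def randomized_quicksort(a, rng=random):
    """Return a sorted copy of a, choosing each pivot uniformly at random."""
    if len(a) <= 1:
        return list(a)
    p = rng.randrange(len(a))                 # random pivot index
    pivot, rest = a[p], a[:p] + a[p + 1:]
    left = [x for x in rest if x <= pivot]    # partition 1: items <= pivot
    right = [x for x in rest if x > pivot]    # partition 2: items > pivot
    return randomized_quicksort(left, rng) + [pivot] + randomized_quicksort(right, rng)

print(randomized_quicksort([3, 1, 4, 1, 5, 9, 2, 6], random.Random(0)))   # [1, 1, 2, 3, 4, 5, 6, 9]
```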
Remarks
- Not much different from Q-sort, except that earlier the algorithm was deterministic and only the bounds were probabilistic (averaged over random inputs).
- Here the algorithm itself is randomized: we pick the pivot element at random. Notice that from that point onwards the algorithm behaves exactly as before.
- In the earlier case, we can identify a worst-case input (for example, an already sorted array). Here no input is a worst case (assuming distinct keys): the bad splits come from unlucky pivot choices, not from the input.
Randomized Select
$$T(n) \le \frac{1}{n}\sum_{k=0}^{n-1}\max\{T(k),\, T(n-1-k)\} + \Theta(n)$$
Randomized Algorithms
- A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution:
    i ← random()
    if i = 0
        do A …
    else  { i.e. i = 1 }
        do B …
- Its running time depends on the outcomes of the coin tosses.
- Assumptions:
    - coins are unbiased, and
    - coin tosses are independent.
- The worst-case running time of a randomized algorithm may be large, but it occurs with very low probability (e.g., when all the coin tosses come up "heads").
Monte Carlo Algorithms
- Running times are guaranteed, but the output may not always be correct.
- The probability of error is low.
Las Vegas Algorithms
- The output is guaranteed to be correct.
- Bounds on running times hold with high probability.
- What type of algorithm is Randomized Qsort?
Why expected running times?
- Markov's inequality: for a nonnegative random variable X and any k > 0,
$$P(X \ge k\,E(X)) \le \frac{1}{k}$$
i.e., the probability that the algorithm takes more than 2E(X) time is at most 1/2, and the probability that it takes more than 10E(X) time is at most 1/10.
This is the reason why Qsort does well in practice.
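An empirical look at Markov's inequality for randomized quicksort, under stated assumptions: distinct keys, X = number of pivot comparisons, and the sample mean standing in for E(X). The parameters `n`, `trials`, and `seed` are arbitrary illustrative choices, and the function names are hypothetical. The observed tail is typically far below 1/k, which shows that Markov's bound is valid but loose.

```python
import random

def qsort_comparisons(a, rng):
    """Pivot comparisons made by randomized quicksort on a (distinct keys assumed)."""
    if len(a) <= 1:
        return 0
    pivot = a[rng.randrange(len(a))]
    left = [x for x in a if x < pivot]
    right = [x for x in a if x > pivot]
    # Every non-pivot item is compared with the pivot once: len(a) - 1 comparisons.
    return (len(a) - 1) + qsort_comparisons(left, rng) + qsort_comparisons(right, rng)

def markov_check(n=200, trials=500, seed=42):
    rng = random.Random(seed)
    data = list(range(n))                 # sorted input: the bad case for first-item pivots
    samples = [qsort_comparisons(data, rng) for _ in range(trials)]
    mean = sum(samples) / trials
    for k in (2, 10):
        frac = sum(x >= k * mean for x in samples) / trials
        print(f"k={k}: empirical P(X >= k*mean) = {frac:.4f}, Markov bound 1/k = {1 / k}")
    return mean, samples

mean, samples = markov_check()
```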
Markov's Bound
$$P(X \ge k\mu) \le \frac{1}{k}$$
where μ = E(X) and k is a constant.
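Chernoff's Bound
$$P(X \ge 2\mu) \le \left(\frac{e}{4}\right)^{\mu}$$
for X a sum of independent 0/1 random variables with mean μ, so the bound decays exponentially in μ.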
An Even Stronger Result
$$P(X > k\mu) < \frac{1}{n^{k}}$$
where k is a constant.
Binary Search Tree
- What is a binary search tree?
- A BST is either empty or a rooted tree with a key value at the root, a possibly empty left subtree, and a possibly empty right subtree. Each of the left and right subtrees is a BST, every key in the left subtree is ≤ the root's key, and every key in the right subtree is greater.
Binary Search Tree
- Pick the first item from the array; call it the pivot. It becomes the root of the BST.
- Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
- Recursively build a BST on each partition. They become the left and right subtrees of the root.
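A sketch of the construction above, assuming a Python list of keys. `Node`, `build_bst`, and `height` are hypothetical names, and height is measured in edges with the empty tree at -1 (a convention assumed here).

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def build_bst(a):
    """First item becomes the root; items <= root go left, items > root go right."""
    if not a:
        return None
    pivot, rest = a[0], a[1:]
    return Node(pivot,
                build_bst([x for x in rest if x <= pivot]),
                build_bst([x for x in rest if x > pivot]))

def height(t):
    """Height in edges; the empty tree has height -1."""
    if t is None:
        return -1
    return 1 + max(height(t.left), height(t.right))

t = build_bst([50, 30, 70, 20, 40, 60, 80])
print(height(t))   # 2
```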
Binary Search Tree
Consider the following input:
1, 2, 3, …, 10,000.
What is the time for construction?
Search time?
Randomly Built Binary Search Tree
- Pick an item from the array uniformly at random; call it the pivot. It becomes the root of the BST.
- Partition the items in the array around the pivot so that all elements to the left are ≤ the pivot and all elements to the right are greater than the pivot.
- Recursively build a BST on each partition. They become the left and right subtrees of the root.
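The randomized construction differs only in choosing the root at random. This sketch represents a tree as nested `(key, left, right)` tuples (a hypothetical representation) and reports the height in edges; the seed is arbitrary. Even on the sorted input 1 … 1000, the height comes out small.

```python
import random

def build_random_bst(a, rng):
    """Root chosen uniformly at random among the items of a; items <= root go left,
    items > root go right. Returns (key, left, right), or None for an empty list."""
    if not a:
        return None
    p = rng.randrange(len(a))                  # random pivot position
    pivot, rest = a[p], a[:p] + a[p + 1:]
    return (pivot,
            build_random_bst([x for x in rest if x <= pivot], rng),
            build_random_bst([x for x in rest if x > pivot], rng))

def height(t):
    """Height in edges; the empty tree has height -1."""
    return -1 if t is None else 1 + max(height(t[1]), height(t[2]))

def inorder(t):
    return [] if t is None else inorder(t[1]) + [t[0]] + inorder(t[2])

keys = list(range(1, 1001))                    # sorted input
tree = build_random_bst(keys, random.Random(7))
print(height(tree))
```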
Example
- Consider the input
10, 20, 30, 40, 50, 60, 70, 80, 90, 100.
Height of the RBST
WLOG, assume that the keys are distinct. (What if they are not?)
- Rank(x) = number of elements < x.
- Let Xi be the height of the tree rooted at a node with rank i.
- Let Yi = 2^Xi be the exponential height of that tree.
- Let H be the height of the entire BST; then
H = max{H1, H2} + 1
where H1 is the height of the left subtree and H2 is the height of the right subtree.
- Y = 2^H = 2 · max{2^H1, 2^H2}
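- E(EH(T(n))): expected value of the exponential height of a randomly built tree with n nodes. Each of the n keys is equally likely to be the root, and if the root has rank k, its subtrees are randomly built trees on k and n-1-k keys, so
$$E(EH(T(n))) = \frac{2}{n}\sum_{k=0}^{n-1} E\left(\max\{EH(T(k)),\, EH(T(n-1-k))\}\right) = O(n^3)$$
where the O(n³) bound follows from max{a, b} ≤ a + b, which gives E(EH(T(n))) ≤ (4/n) Σ_{k=0}^{n-1} E(EH(T(k))).
- Since log is concave, Jensen's inequality gives
$$E(H(T(n))) = E(\log_2 EH(T(n))) \le \log_2 E(EH(T(n))) = O(\log n)$$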
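The O(n³) step can be checked numerically. This sketch iterates the relaxed recurrence y(n) = (4/n) Σ_{k<n} y(k), which bounds E(EH(T(n))), assuming the conventions y(0) = 0 for the empty tree and y(1) = 1 for a single node (height 0, so 2^0 = 1). It compares the result with C(n+3, 3)/4, which is O(n³). The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def relaxed_exp_height(n_max):
    """y(n) = (4/n) * sum_{k=0}^{n-1} y(k) for n >= 2, with y(0) = 0 and y(1) = 1 (assumed)."""
    y = [Fraction(0), Fraction(1)]
    prefix = Fraction(1)                       # running sum y(0) + ... + y(n-1)
    for n in range(2, n_max + 1):
        y.append(Fraction(4, n) * prefix)
        prefix += y[n]
    return y

y = relaxed_exp_height(40)
for n in (10, 20, 40):
    print(n, float(y[n]), comb(n + 3, 3) / 4)
```

By induction, y(n) ≤ C(n+3, 3)/4 for n ≥ 1, using the identity Σ_{k=0}^{n-1} C(k+3, 3) = C(n+3, 4).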
- Construction time?
- Search time?
- What is the worst-case input?
Acknowledgements
- Kunal Verma
- Nidhi Aggarwal
- And other students of MSc(CS) batch 2009.
Hashing
- Motivation: symbol tables
- A compiler uses a symbol table to relate symbols to associated data:
    - Symbols: variable names, procedure names, etc.
    - Associated data: memory location, call graph, etc.
- For a symbol table (also called a dictionary), we care about search, insertion, and deletion.
- We typically don't care about sorted order.
Hash Tables
- More formally:
- Given a table T and a record x, with a key (= symbol) and satellite data, we need to support:
    - Insert(T, x)
    - Delete(T, x)
    - Search(T, x)
- We want these to be fast, but we don't care about sorting the records.
- The structure we will use is a hash table.
- It supports all the above in O(1) expected time!
Hash Functions
- Next problem: collision
[Figure: the hash function h maps keys from the universe U, with actual keys K = {k1, …, k5}, to slots 0 … m-1 of table T; h(k2) = h(k5), so k2 and k5 collide.]
Resolving Collisions
- How can we solve the problem of collisions?
- One solution is chaining.
- Another solution is open addressing.
Chaining
- Chaining puts elements that hash to the same slot in a linked list:
[Figure: actual keys K = {k1, …, k8} from the universe U are hashed into table T; each slot of T points to a linked list of the keys that hash to it, and empty slots hold null.]
Chaining
- How do we insert an element?
Chaining
- How do we delete an element?
Chaining
- How do we search for an element with a given key?
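A minimal sketch of chaining in Python that answers the three questions above. Assumptions: each chain is a `collections.deque`, the hash function is Python's built-in `hash` reduced mod m (a hypothetical choice), records are `(key, value)` pairs, `insert` does not check for duplicate keys, and the class and method names are hypothetical. Insertion puts the record at the head of its chain in O(1); search and deletion scan only the chain for h(key).

```python
from collections import deque

class ChainedHashTable:
    """Hash table with chaining; each slot holds a deque used as the chain."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [deque() for _ in range(m)]

    def _h(self, key):
        # Hypothetical hash function: Python's built-in hash reduced mod m.
        return hash(key) % self.m

    def insert(self, key, value):
        # O(1): put the record at the head of its chain (no duplicate-key check).
        self.slots[self._h(key)].appendleft((key, value))

    def search(self, key):
        # Scan only the chain for slot h(key); returns None if the key is absent.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        # Find the record in its chain, then unlink it; returns False if absent.
        chain = self.slots[self._h(key)]
        for idx, (k, _) in enumerate(chain):
            if k == key:
                break
        else:
            return False
        del chain[idx]
        return True

t = ChainedHashTable(m=4)
for k in (1, 5, 9, 2):
    t.insert(k, f"v{k}")
print([list(chain) for chain in t.slots])
```

Keys 1, 5, and 9 all land in slot 1 here, so they share one chain. Because `delete` takes a key, it scans the chain; with a doubly linked list and a pointer to the record itself, deletion can be done in O(1).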
Analysis of Chaining
- Assume simple uniform hashing: each key is equally likely to be hashed to any slot of the table.
- Given n keys and m slots in the table, the load factor α = n/m = average # of keys per slot.
- What will be the average cost of an unsuccessful search for a key? A: O(1 + α) (computing the hash takes O(1), and the list that is scanned has expected length α).
- What will be the average cost of a successful search? A: on average 1 + α/2 - α/(2n) elements are examined, i.e. O(1 + α).
Analysis of Chaining (Continued)
- So the cost of searching = O(1 + α).
- If the number of keys n is proportional to the number of slots m in the table, what is α?
- A: α = n/m = O(1).
- In other words, we can make the expected cost of searching constant if we keep α constant.
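A quick simulation of the analysis, assuming simple uniform hashing is modeled by sending each key to a uniformly random slot. It reports the average number of elements examined by an unsuccessful search (expected α, on top of O(1) for hashing) and by a successful search with head insertion (expected 1 + α/2 - α/(2n)). The sizes, seed, and function name are arbitrary, hypothetical choices.

```python
import random

def simulate_chaining(n, m, queries=10_000, seed=1):
    """Simulate simple uniform hashing of n keys into m chained slots."""
    rng = random.Random(seed)
    lengths = [0] * m
    for _ in range(n):
        lengths[rng.randrange(m)] += 1        # each key lands in a uniformly random slot
    # Unsuccessful search: hash to a uniformly random slot and scan its whole chain.
    unsuccessful = sum(lengths[rng.randrange(m)] for _ in range(queries)) / queries
    # Successful search with head insertion: a key is preceded in its chain by every key
    # inserted later into the same slot, so the average number of elements examined
    # over all n keys is 1 + (number of colliding pairs) / n.
    colliding_pairs = sum(c * (c - 1) // 2 for c in lengths)
    successful = 1 + colliding_pairs / n
    return unsuccessful, successful

for n, m in ((10_000, 10_000), (20_000, 10_000)):
    alpha = n / m
    u, s = simulate_chaining(n, m)
    print(f"alpha={alpha}: unsuccessful {u:.3f}, successful {s:.3f}, "
          f"predicted successful {1 + alpha / 2 - alpha / (2 * n):.3f}")
```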
If we could prove this,
P(failure) < 1/k (we are sort of happy)
P(failure) < 1/n^k (most of the time this is true, and we're happy)
P(failure) < 1/2^n (this is difficult, but we still want it)
END