Iterative Aggregation/Disaggregation (IAD)


By: Jesse Ehlert
Dustin Wells
Li Zhang
Introduction
 What are we trying to do?
 We are trying to find a more efficient way than the power method to compute the PageRank vector.
 How are we going to do this?
 We are going to use an iterative aggregation/disaggregation (IAD) method from the theory of Markov chains to compute the PageRank vector.
 We are also going to apply the power method within the IAD to improve the approximation at each step.
Markov Chains
 We will represent the web by a Markov chain.
 A Markov chain is a stochastic process describing a chain of events.
 It consists of a set of states S = {s1, …, sn}.
 The web pages will be the states.
 The probability of moving from state si to state sj in one step is pij.
 We can represent this by a stochastic matrix G with entries pij.
 A probability vector v is a stationary distribution if v^T = v^T G.
 This means that the PageRank vector is also a stationary distribution vector of the Markov chain represented by the matrix G (a short power-method sketch follows below).
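As a concrete illustration (not from the slides), here is a minimal power-method sketch in Python/NumPy for approximating a stationary distribution v^T = v^T G; the 3x3 matrix is a made-up example.

import numpy as np

def power_method(G, tol=1e-10, max_iter=1000):
    """Approximate the stationary distribution v with v^T = v^T G."""
    n = G.shape[0]
    v = np.ones(n) / n                    # start from the uniform distribution
    for _ in range(max_iter):
        v_new = v @ G                     # one power-method step: v^T <- v^T G
        if np.abs(v_new - v).sum() < tol:
            return v_new
        v = v_new
    return v

# Tiny made-up row-stochastic matrix G
G = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.25, 0.25]])
v = power_method(G)
print(v, v @ G)                           # v and v @ G agree at convergence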
Aggregation/Disaggregation Approach
 The main idea for computing the PageRank vector v is to block the matrix G so that the size of the problem is reduced to roughly the size of one of the diagonal blocks:
G = [ G11  G12 ]
    [ G21  G22 ],
with v^T = (v1^T, v2^T) partitioned accordingly.
 In fact, (I - G11) is nonsingular. Then we define
L = [ I                    0 ]
    [ -G21 (I - G11)^{-1}  I ],
D = [ I - G11    0    ]
    [   0      I - S  ],
U = [ I   -(I - G11)^{-1} G12 ]
    [ 0              I        ],
and S to be the stochastic complement
S = G22 + G21 (I - G11)^{-1} G12.
Aggregation/Disaggregation Approach Cont.
 From the previous slide we can show that I - G = LDU.
 Thus, because U is nonsingular, v^T (I - G) = 0 gives us v^T L D = 0.
 From the last block of this equation we get v2^T (I - S) = 0, that is, v2^T = v2^T S, which implies that v2 is a stationary distribution of S.
 If u2 is the unique stationary distribution of S, normalized so that u2^T e = 1 (e the vector of all ones), then we have v2^T = ||v2||_1 u2^T, i.e. v2 is a scalar multiple of u2 (see the small numerical check below).
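The following small NumPy sketch (not from the slides; the 4x4 matrix and the 2/2 block split are made up) forms the stochastic complement S and checks that the second piece of the stationary vector of G is a stationary distribution of S.

import numpy as np

n1 = 2                                    # size of the leading block G11
G = np.array([[0.10, 0.40, 0.30, 0.20],
              [0.30, 0.10, 0.20, 0.40],
              [0.25, 0.25, 0.25, 0.25],
              [0.20, 0.30, 0.10, 0.40]])  # made-up row-stochastic matrix
G11, G12 = G[:n1, :n1], G[:n1, n1:]
G21, G22 = G[n1:, :n1], G[n1:, n1:]

# Stochastic complement S = G22 + G21 (I - G11)^{-1} G12
S = G22 + G21 @ np.linalg.solve(np.eye(n1) - G11, G12)

# Stationary distribution of G via a left eigenvector (for checking only)
w, V = np.linalg.eig(G.T)
v = np.real(V[:, np.argmax(np.real(w))])
v = v / v.sum()
v2 = v[n1:]

print(S.sum(axis=1))                      # each row of S sums to 1 (S is stochastic)
print(v2 @ S, v2)                         # v2^T S equals v2^T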
Aggregation/Disaggregation Approach Cont.
 We need to find an expression for v1.
 Let A be the aggregated matrix associated with G, defined as the (n1 + 1) x (n1 + 1) stochastic matrix
A = [ G11        G12 e      ]
    [ u2^T G21   u2^T G22 e ].
 What we want to do now is find the stationary distribution of A.
 From v^T L D = 0, we can get v1^T (I - G11) - v2^T G21 = 0.
 If we rearrange things, we can get v1^T = v2^T G21 (I - G11)^{-1}.
Aggregation/Disaggregation Approach Cont.
 From v2^T = v2^T S, we also have v2^T = ||v2||_1 u2^T, since u2 is the unique stationary distribution of S.
 From the previous three statements we can get an expression for v1: v1^T = v2^T G21 (I - G11)^{-1} = ||v2||_1 u2^T G21 (I - G11)^{-1}.
Theorem 3.20 (Exact aggregation/disaggregation)
 Theorem 3.20: Let v^T = (v1^T, v2^T) be the stationary distribution of G, let S be the stochastic complement with unique stationary distribution u2, and let A be the aggregated matrix above. Then (v1^T, ||v2||_1) is the stationary distribution of A; conversely, if (w1^T, η) is the stationary distribution of A, then the stationary distribution of G is v^T = (w1^T, η u2^T).
Theorem 3.20 Cont.
 Instead of finding the stationary distribution of G, we have broken the problem down into finding the stationary distributions of two smaller matrices, S and A (a small numerical sketch of exact aggregation/disaggregation follows this slide).
 Problem: forming the matrix S and computing its stationary distribution u2 is very expensive and not very efficient.
 Solution:
 Use an approximation.
 This leads us to the approximate aggregation matrix.
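Continuing the toy 4x4 example from the earlier sketch (again my own illustration, not from the slides), exact aggregation/disaggregation looks like this: build A from u2, find its stationary distribution, and disaggregate to recover v. It assumes G, G11, G12, G21, G22, S, n1 and v from that sketch are still in scope.

# Stationary distribution u2 of S (computed by a left eigenvector call)
w, V = np.linalg.eig(S.T)
u2 = np.real(V[:, np.argmax(np.real(w))])
u2 = u2 / u2.sum()

e2 = np.ones(G.shape[0] - n1)

# Aggregated matrix A: block 1 stays disaggregated, block 2 is lumped into one state
A = np.zeros((n1 + 1, n1 + 1))
A[:n1, :n1] = G11
A[:n1, n1] = G12 @ e2
A[n1, :n1] = u2 @ G21
A[n1, n1] = u2 @ (G22 @ e2)

# Stationary distribution (w1, eta) of the small matrix A
w, V = np.linalg.eig(A.T)
wA = np.real(V[:, np.argmax(np.real(w))])
wA = wA / wA.sum()
w1, eta = wA[:n1], wA[n1]

# Disaggregation: (w1, eta * u2) reproduces the stationary vector v of G (Theorem 3.20)
v_rec = np.concatenate([w1, eta * u2])
print(v_rec, v)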
Approximate Aggregation Matrix
 We now define the approximate aggregation matrix as
A~ = [ G11         G12 e       ]
     [ u2~^T G21   u2~^T G22 e ],
where u2~ is an arbitrary probability vector.
 The only difference between this matrix and the previous aggregation matrix is the last row, where the arbitrary probability vector u2~ plays the role of the exact stationary distribution u2.
 In general this approach alone does not give a very good approximation to the stationary distribution of the original matrix G.
 To improve the accuracy, we add a power method step after each aggregation/disaggregation pass (one such pass is sketched below).
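A sketch (my own illustration, not the authors' code) of one approximate aggregation/disaggregation pass followed by a power-method step; the helper names stationary and iad_step are made up, and dense NumPy blocks are used for clarity.

import numpy as np

def stationary(P):
    """Left Perron vector of a small stochastic matrix P, normalized to sum to 1."""
    w, V = np.linalg.eig(P.T)
    p = np.real(V[:, np.argmax(np.real(w))])
    return p / p.sum()

def iad_step(G, n1, u2_tilde):
    """One approximate aggregation/disaggregation pass plus a power-method step."""
    G11, G12 = G[:n1, :n1], G[:n1, n1:]
    G21, G22 = G[n1:, :n1], G[n1:, n1:]
    e2 = np.ones(G.shape[0] - n1)

    # Approximate aggregation matrix: exact first block, u2_tilde in the last row
    A_tilde = np.zeros((n1 + 1, n1 + 1))
    A_tilde[:n1, :n1] = G11
    A_tilde[:n1, n1] = G12 @ e2
    A_tilde[n1, :n1] = u2_tilde @ G21
    A_tilde[n1, n1] = u2_tilde @ (G22 @ e2)

    wA = stationary(A_tilde)
    w1, eta = wA[:n1], wA[n1]

    # Disaggregate, then take one power-method step with the full matrix G
    x = np.concatenate([w1, eta * u2_tilde])
    return x @ G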
Approximate Aggregation Matrix
 Typically, the vector produced by one such pass is still only an approximation to v, so that the actual algorithm to be implemented consists of repeated applications of the algorithm above.
 This gives us an iterative aggregation/disaggregation (IAD) algorithm.
Iterative Aggregation/Disaggregation Algorithm (IAD) using Power Method
• As you can see from above, at every iteration we still need to compute the stationary distribution of the small (n1 + 1) x (n1 + 1) matrix A~ (the full loop is sketched below).
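Here is a minimal sketch of the resulting IAD loop under the same toy setup (my own illustration; the tolerance, iteration cap and uniform starting vector are assumptions), built on the iad_step helper above.

def iad(G, n1, tol=1e-10, max_iter=100):
    """Iterative aggregation/disaggregation with a power-method smoothing step."""
    n = G.shape[0]
    v = np.ones(n) / n                       # arbitrary starting probability vector
    for _ in range(max_iter):
        u2_tilde = v[n1:] / v[n1:].sum()     # current guess playing the role of u2
        v_new = iad_step(G, n1, u2_tilde)
        if np.abs(v_new - v).sum() < tol:    # stop once the 1-norm change is small
            return v_new
        v = v_new
    return v

print(iad(G, n1))                            # matches the stationary vector of G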
IAD Cont.
 First, we write u2~^T G22 e = 1 - u2~^T G21 e (using that the rows of G sum to 1), so that we get rid of G22 in the last row of A~.
 We then let (w1^T, η) denote the stationary distribution of A~.
 From (w1^T, η) A~ = (w1^T, η) we have:
w1^T = w1^T G11 + η u2~^T G21, with η = 1 - w1^T e.
IAD Cont.
 Now we will try to get some sparsity out of G. We will write G like we did before:
G = αH + α a u^T + (1 − α) e u^T.
 From the blocking of G, we block the matrices H, a u^T and e u^T in the same way, for some matrices A, B, C, D, E, F, J, K.
 From here you can see that each block G11, G12, G21 is α times the corresponding block of H plus the corresponding blocks of the rank-one terms a u^T and e u^T, so products with these blocks only need the sparse blocks of H plus rank-one corrections (see the sketch below).
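A sketch (my own, with made-up names) of how row-vector products with the blocks of G can be formed from the blocks of H plus rank-one terms, without building any dense blocks; H may be a dense NumPy array or a SciPy sparse matrix, a is the dangling-node indicator vector, u the personalization vector, and n1 the block split.

import numpy as np

def g_block_products(x1, x2, H, a, u, alpha, n1):
    """Compute y1 = x1^T G11 + x2^T G21 and y2 = x1^T G12 + x2^T G22
    using only the blocks of H plus rank-one corrections."""
    H11, H12 = H[:n1, :n1], H[:n1, n1:]
    H21, H22 = H[n1:, :n1], H[n1:, n1:]
    a1, a2 = a[:n1], a[n1:]
    u1, u2 = u[:n1], u[n1:]

    # Scalar weight of the rank-one terms: alpha * x^T a + (1 - alpha) * x^T e
    c = alpha * (x1 @ a1 + x2 @ a2) + (1.0 - alpha) * (x1.sum() + x2.sum())

    y1 = alpha * (H11.T @ x1 + H21.T @ x2) + c * u1   # block-1 part of x^T G
    y2 = alpha * (H12.T @ x1 + H22.T @ x2) + c * u2   # block-2 part of x^T G
    return y1, y2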
IAD Cont.
 We now take G11, G12 and G21, written in terms of the blocks of H, a u^T and e u^T, and plug them into the formulas above.
 We get update formulas that only touch the sparse blocks of H plus rank-one terms.
 For the iterative process of the power method within IAD, we give an arbitrary initial guess and iterate according to the formulas above for the next approximation until our tolerance is reached (one possible reading is sketched below).
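As one possible reading of this step (a sketch, not the authors' exact formulas), the stationary distribution (w1, eta) of A~ can itself be found by a power iteration that only uses products with G11 and G21; dense blocks are assumed for clarity, and in practice the block products would come from a helper like g_block_products above.

import numpy as np

def aggregated_stationary(G11, G21, u2_tilde, tol=1e-12, max_iter=1000):
    """Power iteration for the stationary distribution (w1, eta) of the
    approximate aggregation matrix, using only products with G11 and G21."""
    n1 = G11.shape[0]
    w1 = np.ones(n1) / (n1 + 1)              # arbitrary initial guess; w1 and eta
    eta = 1.0 / (n1 + 1)                     # together sum to 1
    for _ in range(max_iter):
        w1_new = w1 @ G11 + eta * (u2_tilde @ G21)
        eta_new = 1.0 - w1_new.sum()         # last entry follows from normalization
        if np.abs(w1_new - w1).sum() + abs(eta_new - eta) < tol:
            return w1_new, eta_new
        w1, eta = w1_new, eta_new
    return w1, eta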
Combine Linear Systems and IAD
 Before, we had w1^T = w1^T G11 + η u2~^T G21 from the aggregation step.
 This can be written as the linear system w1^T (I - G11) = η u2~^T G21.
Combine Linear Systems and IAD Cont.

 The problem with this is that the matrices G11, G12 and G21 are full (dense) matrices, which means the computations at each step are generally very expensive (a small linear-system sketch follows below).
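A sketch of the linear-system form of the aggregation step under the reconstruction above (my own illustration; a dense solve stands in for whatever solver one would really use).

import numpy as np

def aggregated_stationary_solve(G11, G21, u2_tilde):
    """Solve w1^T (I - G11) = eta * u2_tilde^T G21 directly instead of iterating.
    Solving with eta = 1 and rescaling afterwards gives the same direction."""
    n1 = G11.shape[0]
    rhs = u2_tilde @ G21                             # row vector u2~^T G21
    w1 = np.linalg.solve((np.eye(n1) - G11).T, rhs)  # unnormalized w1 (eta = 1)
    eta = 1.0 / (1.0 + w1.sum())                     # rescale so w1 and eta sum to 1
    return eta * w1, eta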
Combine Linear Systems and IAD Cont.
 We will return to the original matrix H to find some sparsity.
 From this equation, we can look at G11 in more depth: G11 is αH11 plus the corresponding rank-one pieces of a u^T and e u^T, and using this structure to simplify the equation leaves the sparse block H11 as the only large matrix involved.
Using Dangling Nodes
 We can reorder H by dangling nodes so that H21 is a matrix of zeros (a small reordering sketch follows after this slide).
 Then our equation from before reduces to:
 We approximate as:
 We can show that:
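A small sketch (my own) of the reordering step: find the dangling rows of the hyperlink matrix H (rows with no outlinks) and permute them to the end, so that the lower blocks of the reordered H are zero; H may be a dense NumPy array or a SciPy sparse matrix.

import numpy as np

def reorder_by_dangling(H):
    """Permute rows and columns of H so that dangling nodes come last.
    Returns the reordered matrix, the permutation, and n1 = number of
    non-dangling nodes."""
    rowsums = np.asarray(H.sum(axis=1)).ravel()
    nondangling = np.where(rowsums > 0)[0]
    dangling = np.where(rowsums == 0)[0]
    perm = np.concatenate([nondangling, dangling])
    H_perm = H[perm, :][:, perm]
    return H_perm, perm, len(nondangling)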
Linear Systems and IAD Process Combined
 Now, we combine ideas from IAD and linear systems, with H
arranged by dangling nodes, to get the process below:
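One plausible instantiation of such a combined process (a sketch of my own under the stated assumptions, not necessarily the authors' exact formulation): with the dangling nodes ordered last, the PageRank equation v^T (I - αH) = c u^T splits so that only a system with I - αH11 has to be solved, and the dangling part follows from a single product.

import numpy as np

def pagerank_dangling_split(H11, H12, u1, u2, alpha=0.85):
    """PageRank via the linear-system formulation with dangling nodes ordered
    last (lower blocks of H are zero): solve a system only as large as the
    non-dangling block, then recover the dangling part with one product.
    Dense blocks and a dense solve are used purely for clarity."""
    n1 = H11.shape[0]
    # Solve x1^T (I - alpha * H11) = u1^T  (transposed to a standard column solve)
    x1 = np.linalg.solve((np.eye(n1) - alpha * H11).T, u1)
    # Dangling part follows directly: x2^T = u2^T + alpha * x1^T H12
    x2 = u2 + alpha * (x1 @ H12)
    x = np.concatenate([x1, x2])
    return x / x.sum()                        # normalize to get the PageRank vector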
Conclusion
 Instead of finding the stationary distribution of G directly, we broke the problem down into finding the stationary distributions of the smaller matrices S and A, which together give us the stationary distribution of G.
 The problem with this exact approach is that it is very inefficient.
 So we used an approximation of the stationary distribution and added power method steps to improve its accuracy.
 Then we combined linear systems with our iterative aggregation/disaggregation algorithm to obtain another way of computing the PageRank vector.