PageRank & Random Walk

“The importance of a Web page depends on
the reader's interests, knowledge and attitudes…”
– Larry Page, Co-Founder of Google
Presentation Outline
1. Introduction on PageRank
2. Calculation of PageRank on Webpage
3. Original algorithm on PageRank
4. Modifications of the original algorithm
5. Result on the modifications
6. Applications of PageRank
Introduction on PageRank
• PageRank is a link analysis algorithm … with the
purpose of "measuring" its (Webpage) relative
importance within the set.
– From Wikipedia, the free encyclopedia
• Developed by Larry Page as his PhD research topic
• 3 years later, he left Stanford and founded
Google with Brin
• He did not complete his PhD. In return, his net worth
now is …
Introduction on PageRank
• PageRank = Importance of the Webpage
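• Concept is simple:
[Figure: a page (labelled Bloomberg) with incoming links worth 20, 20 and 20 has PageRank = 60; a page with incoming links worth 20 and 10 has PageRank = 20 + 10 = 30]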
Introduction on PageRank
An example of Webpage system
Calculation of PageRank on Webpage
[Figure: four Webpages A, B, C and D connected by hyperlinks]
Calculation of PageRank on Webpage
• R(.) = PageRank of a Webpage
1. R(A) = 100%R(B) + 50%R(C)
2. R(B) = 50%R(C) + 100%R(D)
3. R(C) = 100%R(A)
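\[
\begin{pmatrix}
1 & -1 & -0.5 & 0 \\
0 & 1 & -0.5 & -1 \\
-1 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix} A \\ B \\ C \\ D \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\]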
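The three equations leave one value free, so a scale has to be chosen before the others can be computed. The following is a minimal Python sketch, assuming R(A) is fixed at 50 as an arbitrary scale; the function names `solve_example` and `normalize` are illustrative and not part of the slides.

```python
from fractions import Fraction

def solve_example(r_a):
    """Solve equations 1-3 by substitution for a chosen value of R(A)."""
    r_a = Fraction(r_a)
    r_c = r_a                # equation 3: R(C) = 100% R(A)
    r_b = r_a - r_c / 2      # equation 1 rearranged: R(B) = R(A) - 50% R(C)
    r_d = r_b - r_c / 2      # equation 2 rearranged: R(D) = R(B) - 50% R(C)
    return {"A": r_a, "B": r_b, "C": r_c, "D": r_d}

def normalize(ranks):
    """Divide every value by the total so that the ranks sum to 1."""
    total = sum(ranks.values())
    return {page: value / total for page, value in ranks.items()}

ranks = solve_example(50)     # assumed scale: R(A) = 50
shares = normalize(ranks)
print({page: float(value) for page, value in ranks.items()})
print({page: float(value) for page, value in shares.items()})
```

Exact fractions are used so that the normalized values can be compared without rounding noise; any other positive choice for R(A) gives the same normalized result.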
Calculation of PageRank on Webpage
• Let A = 50; then B = 25, C = 50 and D = 0
• Normalize the PageRank by dividing each value by the
total, 125 (A+B+C+D = 50+25+50+0)
• Therefore,
– A = 0.4
– B = 0.2
– C = 0.4
– D = 0
• In general:
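A general form consistent with equations 1 to 3 can be written as follows. This is a sketch under assumed notation: $B_u$ denotes the set of pages that link to page $u$ (a symbol introduced here for illustration), and $N_v$ is the number of links on page $v$.

\[
R(u) = \sum_{v \in B_u} \frac{R(v)}{N_v}
\]

For instance, C links to both A and B, so $N_C = 2$ and C passes 50% of R(C) to each of them, which matches the 50%R(C) terms in equations 1 and 2.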
Calculation of PageRank on Webpage
• There are 2 PROBLEMS !!!
– Problem 1:
What if there are over 280,000 Webpages and over 3 million
hyperlinks? Solving the equations by hand no longer scales
– Problem 2:
The PageRank of D = 0, because no page links to D
This biases the ranking
A rank sink may appear
Original algorithm on PageRank
• In order to tackle the 2 problems, a calculation
algorithm was introduced:
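A form consistent with the symbols c, N and E defined on this slide is given here as a hedged reconstruction. $B_u$, the set of pages linking to page $u$, is assumed notation that does not appear in the slides.

\[
R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v} + c\,E(u)
\]

With $E(u) > 0$, a page with no incoming links (such as D in the example) still receives a non-zero rank.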
Where:
c - Normalization factor
N - No. of links on the page v
E - A factor to tackle rank sink
Original algorithm on PageRank
• Multiply R_k by the matrix A to form R_{k+1} (i.e. A R_k = R_{k+1})
• A is a square matrix.
– A_{u,v} = 1/N_u if there is an edge from u to v.
– A_{u,v} = 0 if there is no edge from u to v.
• R = cAR, so R is an eigenvector of A with
eigenvalue 1/c
• We can treat c as the normalization factor and R as the
PageRank vector
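The eigenvector relation can be checked on the four-page example. The sketch below assumes the link structure implied by equations 1 to 3 (A links to C, B to A, C to A and B, D to B) and uses the ratio 50 : 25 : 50 : 0 from the worked example, scaled to sum to 1. With the indexing A_{u,v} = 1/N_u, the update that sends rank along each edge sums A[u][v] * R[u] over the source pages u; the helper name `step` is illustrative.

```python
# Outgoing links inferred from equations 1-3 (an assumption about the figure).
links = {"A": ["C"], "B": ["A"], "C": ["A", "B"], "D": ["B"]}
pages = sorted(links)

# A[u][v] = 1/N_u if u links to v, else 0 (the indexing used on the slide).
A = {u: {v: (1 / len(links[u]) if v in links[u] else 0.0) for v in pages}
     for u in pages}

def step(rank):
    """One update: page v collects A[u][v] * rank[u] from every source page u."""
    return {v: sum(A[u][v] * rank[u] for u in pages) for v in pages}

rank = {"A": 0.4, "B": 0.2, "C": 0.4, "D": 0.0}  # 50 : 25 : 50 : 0, scaled to sum to 1
new_rank = step(rank)
print(new_rank)  # unchanged up to rounding, so this vector is a fixed point (c = 1)
```

Because every page in this example has at least one outgoing link, no rank is lost in a step and the normalization factor c stays at 1.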
Original algorithm on PageRank
• The algorithm is:
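One simple way to iterate the PageRank update with the extra term E is sketched below in Python. It is an illustration under stated assumptions, not necessarily the exact procedure of the original paper: E gives every page the same small value `e` (a hypothetical choice), every page has at least one outgoing link, and the graph is the four-page example with links inferred from equations 1 to 3.

```python
def pagerank(links, e=0.05, epsilon=1e-12, max_iter=10_000):
    """Repeat R <- c * (A R + E) until the total change drops below epsilon."""
    pages = sorted(links)
    rank = {p: 1.0 / len(pages) for p in pages}        # uniform starting vector
    for _ in range(max_iter):
        new_rank = {p: e for p in pages}               # the E(u) term, same for all pages
        for u in pages:
            share = rank[u] / len(links[u])            # R(u) / N_u
            for v in links[u]:
                new_rank[v] += share                   # rank flows along the edge u -> v
        c = 1.0 / sum(new_rank.values())               # normalization factor
        new_rank = {p: c * value for p, value in new_rank.items()}
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < epsilon:
            break
    return rank

# Assumed link structure of the four-page example.
links = {"A": ["C"], "B": ["A"], "C": ["A", "B"], "D": ["B"]}
ranks = pagerank(links)
for page, value in ranks.items():
    print(page, round(value, 4))
```

In this sketch D ends up with a small positive rank instead of 0, which is the effect the E term is meant to have.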
Modifications of the original algorithm
• The original algorithm does not run efficiently
• This is because Webpages with low PageRank converge
faster, while those with high PageRank take longer
to converge
Modifications of the original algorithm
• Modification 1
– Main concept: Webpages whose PageRank has already
converged can be ignored
– Therefore we separate the matrix and vector into 2 parts
– N = not yet converged; C = converged
Modifications of the original algorithm
• Modification 1 –
Modifications of the original algorithm
• Modification 2
– Disadvantage of Modification 1: reordering the
matrix A is expensive
– Instead, set A_C to 0
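One way to write this step is sketched below under assumed notation: $x^{(k)}$ is the rank vector at iteration $k$, $A''$ is the matrix with the rows of converged pages zeroed, and $x'_C$ holds the values of the converged pages. None of these symbols appear in the slides.

\[
A''_{ij} =
\begin{cases}
0 & \text{if } i \in C \\
A_{ij} & \text{otherwise}
\end{cases}
\qquad
(x'_C)_i =
\begin{cases}
x^{(k)}_i & \text{if } i \in C \\
0 & \text{otherwise}
\end{cases}
\qquad
x^{(k+1)} = A'' x^{(k)} + x'_C
\]

The zeroed rows make the multiplication cheaper without moving any rows or columns, so the reordering cost is avoided.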
Modifications of the original algorithm
• Modification 2 –
Result on the modifications
Applications of PageRank
• Search engines
• Type 1
– Title search
– Finds all the Webpages whose titles contain all of the query
words, then sorts the results by PageRank
• Type 2
– Google
– Full-text search engine using PageRank
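The title search described under Type 1 can be sketched in a few lines of Python. The function name `title_search`, the sample titles and the PageRank values are all made up for illustration.

```python
def title_search(query, pages):
    """Return titles containing every query word, highest PageRank first.

    pages is a list of (title, pagerank) pairs.
    """
    words = query.lower().split()
    matches = [(title, rank) for title, rank in pages
               if all(word in title.lower().split() for word in words)]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return [title for title, _ in matches]

# Hypothetical index of page titles with precomputed PageRank values.
pages = [
    ("Random walk on graphs", 0.2),
    ("PageRank and random walk", 0.4),
    ("PageRank explained", 0.3),
    ("Cooking with herbs", 0.1),
]
print(title_search("random walk", pages))
```

Matching is done on whole lowercase words; a real search engine would also handle punctuation, stemming and much larger indexes.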