Models of Web-Like Graphs: Integrated Approach
Igor Kanovsky, Shaul Mazor
Emek Yezreel College, Israel
[email protected]
University of Haifa, Israel
[email protected]
© Igor Kanovsky & Shaul Mazor @ SCI2003, Orlando, FL, July 2003
The Web as a graph
A huge digraph whose statistical characteristics are similar
to those of the Web graph is called a Web-like graph.
The known significant properties of the Web
as a graph are:
1. Power-law distributions.
2. Small world topology.
3. Bipartite cliques.
4. "Bow-tie" shape.
Power-Law distributions (PLD)
PLD of in- and out-degrees of vertices. The number
of web pages having k_in links to the page or k_out links from
the page is proportional to $k^{-\gamma}$, for some constants $\gamma_{in}, \gamma_{out} > 2$.
Andrei Broder, Ravi Kumar and others. Graph structure in the web. 2001.
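A rough way to read such an exponent off a measured degree distribution is to fit a straight line to the histogram on log-log axes. The sketch below is only an illustration under that assumption: the function name `loglog_slope` and the input format (a mapping from degree k to the number of pages with that degree) are hypothetical, and a plain least-squares fit is a crude estimator compared with more careful statistical methods.

```python
import math


def loglog_slope(degree_counts):
    """Least-squares slope of log(count) against log(degree).

    degree_counts maps a degree k > 0 to the number of vertices with
    that degree. For counts proportional to k**(-gamma) the returned
    slope is approximately -gamma.
    """
    points = [(math.log(k), math.log(c))
              for k, c in degree_counts.items() if k > 0 and c > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Synthetic, noise-free histogram with exponent 2.5 (illustrative data only).
counts = {k: 1000.0 * k ** -2.5 for k in range(1, 51)}
slope = loglog_slope(counts)
```

On real data the tail of the histogram is noisy, so the fitted slope depends on the range of k included in the fit.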
The Small World
Small diameter of the graph. The average distance
between any two connected web graph vertices is bounded by
log N, where N is the number of vertices in the graph.
Large clustering coefficient.
The clustering coefficient C(v) of a vertex v is the fraction of
pairs of neighbours of v that are connected to each other.
For the whole graph, C = <C(v)>, the average over all vertices.
The clustering coefficient of the Web graph is significantly
larger than that of a random graph.
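The definition of C(v) and C above can be sketched in a few lines. This is a minimal illustration, assuming the graph is treated as undirected and stored as a dictionary of neighbour sets; the name `clustering_coefficient` and the convention that vertices with fewer than two neighbours contribute 0 are assumptions, not part of the slides.

```python
from itertools import combinations


def clustering_coefficient(adj):
    """Average clustering coefficient C = <C(v)>.

    adj maps each vertex to the set of its neighbours and is assumed
    to be symmetric (undirected graph, no self-loops).
    """
    def local(v):
        neighbours = adj[v]
        k = len(neighbours)
        if k < 2:
            return 0.0  # assumed convention for degree 0 or 1
        linked = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        return 2.0 * linked / (k * (k - 1))

    return sum(local(v) for v in adj) / len(adj)


# A triangle (0, 1, 2) with one pendant vertex 3 attached to vertex 2.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
c = clustering_coefficient(example)
```

In this example C(0) = C(1) = 1, C(2) = 1/3 and C(3) = 0, so C = 7/12.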
The Small World (2)
Lada A. Adamic. The Small World Web. 2000.
Bipartite Small Cores
A bipartite core Ci,j is a graph on i+j
nodes that contains at least one
bipartite clique Ki,j as a subgraph.
There are many small bipartite
cores Ci,j (with i,j ≥ 3) in the Web
graph (a random graph does not have
small cliques).
[Figure: the bipartite clique K3,3]
These small cliques are the cores of web communities:
sets of connected sites with a common content topic.
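Counting such cliques in a digraph can be sketched as follows. The sketch assumes a K3,3 means three "fan" pages that each link to the same three "centre" pages, and that the graph is given as a dictionary of out-link sets; the function name `count_k33` is hypothetical. It records, for every triple of targets, how many sources link to all three, then counts the ways to pick three of those sources.

```python
from collections import Counter
from itertools import combinations
from math import comb


def count_k33(out_links):
    """Count directed bipartite cliques K3,3 (3 fans -> 3 centres).

    out_links maps each vertex to the set of vertices it links to.
    Self-loops are ignored so that fans and centres stay disjoint.
    """
    fans_per_triple = Counter()
    for src, targets in out_links.items():
        others = sorted(t for t in targets if t != src)
        for triple in combinations(others, 3):
            fans_per_triple[triple] += 1
    # Any 3 of the fans sharing a centre triple form one K3,3.
    return sum(comb(c, 3) for c in fans_per_triple.values())


# Three fans (0, 1, 2) all linking to the same three centres (3, 4, 5).
web = {0: {3, 4, 5}, 1: {3, 4, 5}, 2: {3, 4, 5}, 3: set(), 4: set(), 5: set()}
n_cliques = count_k33(web)
```

The cost grows with the cube of the out-degree, so this is only practical for graphs with modest out-degrees.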
Bipartite Small Cores (2)
Number of Ci,j as a function of i and j
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan and Andrew Tomkins.
Extracting large-scale knowledge bases from the web. 2000.
"bow-tie" shape
Most web pages can be
divided into four sets: a core formed by
the strongly connected component
(SCC), i.e. pages that can all reach
one another; two sets
(upstream and downstream) formed by
the pages that can only reach (or only be
reached from) the pages in the core;
and a set (tendrils) containing pages
that can neither reach nor be reached
from the core.
The Web graph has a "bow-tie" shape.
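The four sets described above can be computed with plain breadth-first searches. The sketch below is an illustration only: it takes the largest SCC as the core (an assumption), finds it with a simple but slow forward/backward reachability loop rather than a linear-time SCC algorithm, and uses hypothetical names (`bow_tie`, `reachable`).

```python
from collections import deque


def reachable(adj, start):
    """Set of vertices reachable from start by following adj."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def bow_tie(out_links):
    """Split a digraph into core, upstream, downstream and tendrils."""
    nodes = set(out_links) | {v for ts in out_links.values() for v in ts}
    if not nodes:
        return {"core": set(), "upstream": set(),
                "downstream": set(), "tendrils": set()}
    in_links = {v: set() for v in nodes}
    for u, targets in out_links.items():
        for v in targets:
            in_links[v].add(u)
    # SCC of u = (reachable from u) intersected with (can reach u).
    remaining, core = set(nodes), set()
    while remaining:
        u = next(iter(remaining))
        scc = reachable(out_links, u) & reachable(in_links, u)
        remaining -= scc
        if len(scc) > len(core):
            core = scc
    seed = next(iter(core))
    forward = reachable(out_links, seed)   # core plus downstream
    backward = reachable(in_links, seed)   # core plus upstream
    return {"core": core,
            "upstream": backward - core,
            "downstream": forward - core,
            "tendrils": nodes - forward - backward}


# Core 1 -> 2 -> 3 -> 1; page 0 links in, page 4 is linked to; 5 -> 6 is apart.
parts = bow_tie({0: {1}, 1: {2}, 2: {3}, 3: {1, 4}, 4: set(), 5: {6}, 6: set()})
```

Here the last set collects every page that neither reaches nor is reached from the core, following the definition on the slide.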
Web-like Graph Modeling
The aim is to find stochastic processes
that yield web-like graphs.
Our integrated approach is based on
well-known Web graph models, extended
to satisfy all the statistical properties
mentioned above.
We try to keep a web-like graph model as
simple as possible, so it should have a
minimal set of parameters.
Web-like graph models
• Erdős-Rényi Model. The classical purely random graph model.
• A “Small world” Model. A regular lattice with a small number
of random long-range links. Has no power-law distributions
(PLD).
• The Preferential-Attachment (PA) Model. At every time
step, a new node is added and linked to existing nodes chosen randomly,
with probability proportional to each node's in-degree. Has in-degree
PLD (the slope is -3).
• The Copying Model. At every time step, a new node is added
and linked to other nodes either by copying an existing link from
a randomly chosen node (with probability 1-p) or randomly (with
probability p). Has in-degree PLD (the slope is -(2-p)/(1-p)).
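As an illustration of the copying step in the last item, the following sketch grows a digraph in which every new node gets a fixed number d of out-links. Several details are assumptions made only for this example: the fixed out-degree d, the small complete seed graph, copying the i-th link of a single randomly chosen prototype node, and the function name `copying_model`.

```python
import random


def copying_model(n, d, p, seed=None):
    """Grow a digraph with n nodes by the copying rule.

    Returns out_links, where out_links[v] lists the d targets of v.
    Each link of a new node is chosen uniformly at random with
    probability p, or copied from a prototype with probability 1 - p.
    """
    if n < d + 1:
        raise ValueError("need at least d + 1 nodes")
    rng = random.Random(seed)
    # Assumed seed graph: d + 1 nodes, each linking to the d others.
    out_links = [[u for u in range(d + 1) if u != v] for v in range(d + 1)]
    for new in range(d + 1, n):
        prototype = rng.randrange(new)  # existing node to copy from
        links = []
        for i in range(d):
            if rng.random() < p:
                links.append(rng.randrange(new))         # random target
            else:
                links.append(out_links[prototype][i])    # copied target
        out_links.append(links)
    return out_links


graph = copying_model(500, 3, 0.2, seed=7)
```

Because copied links point to pages that are already linked to, popular pages tend to gain further in-links, which is the mechanism behind the in-degree PLD mentioned above.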
Extended scale-free model (1)
1. At each time step, a new vertex is added and is
connected to existing vertices through a random
number m (≤ z) of new edges, where the average
number of edges per node (z) is constant for a
growing graph. The probability that an existing
vertex i gains an edge is proportional to its in-degree:
$$\Pi(k_{in,i}) = \frac{k_{in,i} + A_{in}}{\sum_j (k_{in,j} + A_{in})}$$
Extended scale-free model (2)
2. Simultaneously, z-m directed edges are distributed
among all the vertices in the graph by the following
rules: (i) the source is chosen with a probability
proportional to its out-degree, (ii) the target is
chosen with a probability proportional to its in-degree.
The model has 3 parameters: the average
degree z, and the initial attractiveness of a vertex
for gaining in- and out-edges, Ain and Aout.
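The two steps of the extended scale-free model can be sketched as a small simulation. This is not the authors' implementation; it fills in unspecified details with labeled assumptions: m is drawn uniformly from 0..z, selection weights are degree plus initial attractiveness (k_in + Ain for targets, k_out + Aout for sources), new-vertex edges point from the new vertex to existing ones, growth starts from a single isolated vertex, and multiple edges and self-loops are allowed. The name `extended_scale_free` is hypothetical.

```python
import random


def extended_scale_free(n, z, a_in, a_out, seed=None):
    """Grow a digraph with n vertices and z new edges per time step.

    Returns (edges, k_in, k_out). Requires a_in > 0 and a_out > 0 so
    that vertices with zero degree can still be selected.
    """
    if a_in <= 0 or a_out <= 0:
        raise ValueError("initial attractiveness must be positive")
    rng = random.Random(seed)
    k_in, k_out, edges = [0], [0], []

    def pick(degrees, attractiveness, count):
        # Choose among the first `count` vertices, weight = degree + A.
        weights = [degrees[i] + attractiveness for i in range(count)]
        return rng.choices(range(count), weights=weights)[0]

    def add_edge(u, v):
        edges.append((u, v))
        k_out[u] += 1
        k_in[v] += 1

    for new in range(1, n):
        k_in.append(0)
        k_out.append(0)
        m = rng.randint(0, z)  # assumed distribution of m (m <= z)
        # Step 1: m edges from the new vertex to existing vertices.
        for _ in range(m):
            add_edge(new, pick(k_in, a_in, new))
        # Step 2: z - m edges among all vertices, new one included.
        for _ in range(z - m):
            add_edge(pick(k_out, a_out, new + 1), pick(k_in, a_in, new + 1))
    return edges, k_in, k_out


edges, k_in, k_out = extended_scale_free(200, 4, 2.0, 6.0, seed=1)
```

Each time step adds exactly z edges, so the average degree stays constant as the graph grows. Recomputing the weights for every draw keeps the sketch short but makes it quadratic in n; a faster sampler would be needed for graphs of the size used in the simulations.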
Simulation results. In-degree distribution.
[Figure: in-degree distribution. Our model: N = 30 K, <k> = 8, Ain = 2, Aout = 6. Web: N = 500 M.]
Simulation results. Out-degree distribution.
[Figure: out-degree distribution. Our model: N = 30 K, <k> = 8, Ain = 2, Aout = 6. Web: N = 500 M.]
Simulation results
[Figure: Degree distribution for various average degree. Axes: exponent (in, out) vs. average degree.]
[Figure: Degree distribution for various initial “in” attractiveness Ain. Axes: exponent (in, out) vs. Ain.]
[Figure: Degree distribution for various initial “out” attractiveness Aout. Axes: exponent (in, out) vs. Aout.]
[Figure: Diameter for various average degree. Axes: diameter (Dd, Du) vs. average degree.]
[Figure: Diameter for various initial “in” attractiveness Ain. Axes: diameter (Dd, Du) vs. Ain.]
[Figure: Clustering coefficient for various average degree. Axes: clustering coefficient vs. average degree.]
[Figure: Clustering coefficient for various initial “in” attractiveness Ain. Axes: clustering coefficient vs. Ain.]
[Figure: Number of bipartite cliques K3,3 for various average degree. Axes: number of K3,3 vs. average degree.]
[Figure: Number of bipartite cliques K3,3 for various initial “in” attractiveness Ain. Axes: number of K3,3 vs. Ain.]
[Figure: Largest SCC for various average degree. Axes: largest SCC vs. average degree.]
[Figure: Largest SCC for various initial “in” attractiveness Ain. Axes: largest SCC vs. Ain.]
Characteristics of several web-like models
Model         PLD slope (in)   PLD slope (out)   K3,3   C
Our           -2.18            -2.85             201    0.0031
PA            -2.94            NA                0      0.0029
Copying       -2.14            NA                986    0.0022
Small World   NA               NA                0      0.6191
NA: not applicable
Advantages of our approach
Only our extended scale-free model
captures all known statistical properties of
the Web graph.
The model is very simple. It has only
three parameters.
The model may be used for developing and
testing different algorithms for the Web (such as
search, ranking, site promotion).
Thank you.
For contacts:
Igor Kanovsky, [email protected],
http://www.yvc.ac.il/ik/