School of Information University of Michigan SI 614 Basic network concepts and intro to Pajek Lecture 2 Instructor: Lada Adamic.
Outline
Basic network metrics
Bipartite graphs
Graph theory in math
Pajek
Network elements: edges
Directed (also called arcs)
A -> B
A likes B, A gave a gift to B, A is B’s child
Undirected
A <-> B or A – B
A and B like each other
A and B are siblings
A and B are co-authors
Edge attributes
weight (e.g. frequency of communication)
ranking (best friend, second best friend…)
type (friend, relative, co-worker)
properties depending on the structure of the rest of the graph:
e.g. betweenness
Directed networks
girls’ school dormitory dining-table partners (Moreno, The sociometry reader, 1960)
first and second choices shown
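[Figure: directed network of dining-table partner choices among 26 girls: Louise, Ada, Lena, Adele, Marion, Jane, Frances, Cora, Eva, Maxine, Mary, Anna, Ruth, Edna, Robin, Betty, Martha, Jean, Laura, Alice, Hazel, Helen, Ellen, Ella, Irene, Hilda]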
Edge weights can have positive or negative values
One gene activates/inhibits another
One person trusts/distrusts another
Research challenge: How does one ‘propagate’ negative feelings in a social network? Is my enemy’s enemy my friend?
[Figure: transcription regulatory network in baker’s yeast]
Adjacency matrices
Representing edges (who is adjacent to whom) as a
matrix
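$$A_{ij} = \begin{cases} 1 & \text{if node } i \text{ has an edge to node } j \\ 0 & \text{if node } i \text{ does not have an edge to node } j \end{cases}$$
$A_{ii} = 0$ unless the network has self-loops
$A_{ij} = A_{ji}$ if the network is undirected, or if $i$ and $j$ share a reciprocated edge
Example (directed network on nodes 1 to 5):
$$A = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix}$$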
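As a rough illustration, the definition above can be turned into a few lines of Python that fill in $A$ from a list of directed edges. The edge list is the one given for this same five-node example on the next slide; nodes are numbered from 1 as on the slide, so the sketch shifts to 0-based list indices. The function name `adjacency_matrix` is our own, not part of any library.

```python
def adjacency_matrix(n, edges, directed=True):
    """Build an n x n adjacency matrix (list of lists) for nodes numbered 1..n."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i - 1][j - 1] = 1          # edge from node i to node j
        if not directed:
            A[j - 1][i - 1] = 1      # undirected: A_ij = A_ji
    return A

# Directed edges of the five-node example (from its edge list)
edges = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]
A = adjacency_matrix(5, edges)
for row in A:
    print(row)
# [0, 0, 0, 0, 0]
# [0, 0, 1, 1, 0]
# [0, 1, 0, 1, 0]
# [0, 0, 0, 0, 1]
# [1, 1, 0, 0, 0]
```

Passing `directed=False` would instead produce a symmetric matrix, matching the rule $A_{ij} = A_{ji}$ for undirected networks.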
Adjacency lists
Edge list
2 3
2 4
3 2
3 4
4 5
5 2
5 1
Adjacency list
is easier to work with if the network is large and sparse
lets you quickly retrieve all neighbors of a node
1:
2: 3 4
3: 2 4
4: 5
5: 1 2
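A minimal sketch of converting the edge list into the adjacency list shown above, using only the Python standard library; `adjacency_list` is a hypothetical helper name. Looking up a node's neighbors is then a single dictionary access, which is the advantage the slide points to for large, sparse networks.

```python
from collections import defaultdict

def adjacency_list(edges, nodes):
    """Map each node to the sorted list of nodes it has an edge to."""
    adj = defaultdict(list)
    for i, j in edges:
        adj[i].append(j)
    # Include nodes with no outgoing edges (node 1 in the example)
    return {v: sorted(adj[v]) for v in nodes}

edges = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]
adj = adjacency_list(edges, nodes=range(1, 6))
for v, nbrs in adj.items():
    print(f"{v}:", *nbrs)
# 1:
# 2: 3 4
# 3: 2 4
# 4: 5
# 5: 1 2
```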
Nodes
Node network properties
from immediate connections
indegree
how many directed edges (arcs) are incident on a node
outdegree
how many directed edges (arcs) originate at a node
degree
number of edges incident on a node (in a directed network, indegree + outdegree)
from the entire graph
centrality (betweenness, closeness)
Example node: indegree=3, outdegree=2, degree=5
Node degree from matrix values
Outdegree = $\sum_{j=1}^{n} A_{ij}$
example: the outdegree for node 3 is 2, which we obtain by counting the non-zero entries in the 3rd row: $\sum_{j=1}^{n} A_{3j} = 2$
Indegree = $\sum_{i=1}^{n} A_{ij}$
example: the indegree for node 3 is 1, which we obtain by counting the non-zero entries in the 3rd column: $\sum_{i=1}^{n} A_{i3} = 1$
$$A = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix}$$
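The row and column sums translate directly into code. This sketch hard-codes the example matrix $A$ and uses 1-based node numbers to match the slide; `outdegree` and `indegree` are illustrative names, not library functions.

```python
A = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]

def outdegree(A, i):
    """Sum of row i: sum_j A_ij (i is 1-based, as on the slide)."""
    return sum(A[i - 1])

def indegree(A, j):
    """Sum of column j: sum_i A_ij (j is 1-based)."""
    return sum(row[j - 1] for row in A)

print(outdegree(A, 3), indegree(A, 3))  # 2 1
```

Summing all outdegrees or all indegrees gives the same total, the number of arcs (7 here), since every arc leaves one node and enters another.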
Other node attributes
take your pick…
geographical location
function
musical tastes…
Homophily: tendency of like individuals to associate with one
another
Network metrics: degree sequence and degree distribution
Degree sequence: An ordered list of the (in,out) degree of each node
In-degree sequence:
[2, 2, 2, 1, 1, 1, 1, 0]
Out-degree sequence:
[2, 2, 2, 2, 1, 1, 1, 0]
(undirected) degree sequence:
[3, 3, 3, 2, 2, 1, 1, 1]
Degree distribution: A frequency count of the occurrence of each degree, listed as (degree, frequency) pairs
In-degree distribution:
[(2,3), (1,4), (0,1)]
Out-degree distribution:
[(2,4), (1,3), (0,1)]
(undirected) distribution:
[(3,3), (2,2), (1,3)]
[Figure: in-degree distribution, plotting frequency against indegree]
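A degree distribution is just a frequency count over a degree sequence, so Python's `collections.Counter` is enough for a sketch. The sequences below are the ones listed above; sorting by degree in descending order is our choice, made to match the slide's layout.

```python
from collections import Counter

def degree_distribution(degree_sequence):
    """Return (degree, frequency) pairs, highest degree first."""
    return sorted(Counter(degree_sequence).items(), reverse=True)

in_seq = [2, 2, 2, 1, 1, 1, 1, 0]
out_seq = [2, 2, 2, 2, 1, 1, 1, 0]
undirected_seq = [3, 3, 3, 2, 2, 1, 1, 1]

print(degree_distribution(in_seq))          # [(2, 3), (1, 4), (0, 1)]
print(degree_distribution(out_seq))         # [(2, 4), (1, 3), (0, 1)]
print(degree_distribution(undirected_seq))  # [(3, 3), (2, 2), (1, 3)]
```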
Network metrics: connected components
Strongly connected components
Each node within the component can be reached from every other node in the component by following directed links
[Figure: directed network on nodes A to H]
Strongly connected components: B C D E; A; G H; F
Weakly connected components: every node can be reached from every other node by following links in either direction
Weakly connected components: A B C D E; G H F
In undirected networks one talks simply about ‘connected components’
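One way to compute both kinds of components is sketched below. Since the figure's arcs are not recoverable, the arc list is a hypothetical reconstruction chosen only so that the components come out as listed above. Weak components use a plain depth-first search that ignores arc direction; strong components use Kosaraju's algorithm (a standard choice, not one named in the lecture): record DFS finishing order, then search the reversed graph in decreasing finishing order.

```python
from collections import defaultdict

def weakly_connected_components(nodes, arcs):
    """Components when arc direction is ignored (iterative DFS)."""
    nbrs = defaultdict(set)
    for u, v in arcs:
        nbrs[u].add(v)
        nbrs[v].add(u)
    seen, comps = set(), []
    for s in nodes:
        if s in seen:
            continue
        stack, comp = [s], set()
        seen.add(s)
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in nbrs[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps

def strongly_connected_components(nodes, arcs):
    """Kosaraju's algorithm: DFS finishing order, then DFS on reversed arcs."""
    out, rev = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        out[u].append(v)
        rev[v].append(u)
    seen, order = set(), []
    for s in nodes:                       # pass 1: record finishing order
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(out[s]))]
        while stack:
            u, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(out[w])))
                    break
            else:                         # all successors explored: u is finished
                stack.pop()
                order.append(u)
    assigned, comps = set(), []
    for s in reversed(order):             # pass 2: reversed arcs, latest finish first
        if s in assigned:
            continue
        stack, comp = [s], set()
        assigned.add(s)
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in rev[u]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps

nodes = "ABCDEFGH"
# Hypothetical arcs: cycle B->C->D->E->B, A feeding into it, F feeding the pair G<->H
arcs = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "B"),
        ("F", "G"), ("G", "H"), ("H", "G")]
scc = strongly_connected_components(nodes, arcs)
wcc = weakly_connected_components(nodes, arcs)
print(sorted(sorted(c) for c in scc))  # [['A'], ['B', 'C', 'D', 'E'], ['F'], ['G', 'H']]
print(sorted(sorted(c) for c in wcc))  # [['A', 'B', 'C', 'D', 'E'], ['F', 'G', 'H']]
```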
Network metrics: shortest paths
Shortest path (also called a geodesic path)
The shortest sequence of links connecting two nodes
Not always unique
A and C are connected by 2 shortest paths:
A-E-B-C
A-E-D-C
Diameter: the largest geodesic distance in the graph
The distance between A and C is the maximum for the graph: 3
Caution: some people use the term ‘diameter’ to mean the average shortest path distance; in this class we will use it only to refer to the maximal distance
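Breadth-first search (BFS) gives geodesic distances in an unweighted network and, with one extra counter, the number of distinct shortest paths. The edge list below is a hypothetical reconstruction of the figure containing only the links along the two A-to-C paths named above; the `diameter` helper assumes a connected graph.

```python
from collections import deque

def bfs_paths(adj, source):
    """Geodesic distance and number of shortest paths from source to every reachable node."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:                 # first time reached: one step farther than u
                dist[w] = dist[u] + 1
                count[w] = count[u]
                queue.append(w)
            elif dist[w] == dist[u] + 1:      # another shortest route into w
                count[w] += count[u]
    return dist, count

def diameter(adj):
    """Largest geodesic distance over all pairs (assumes the graph is connected)."""
    return max(max(bfs_paths(adj, s)[0].values()) for s in adj)

# Hypothetical undirected graph with the two A-to-C paths from the slide
edges = [("A", "E"), ("E", "B"), ("E", "D"), ("B", "C"), ("D", "C")]
adj = {v: [] for e in edges for v in e}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

dist, count = bfs_paths(adj, "A")
print(dist["C"], count["C"])  # 3 2
print(diameter(adj))          # 3
```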
Giant components and the web graph
if the largest component encompasses a significant fraction of the graph,
it is called the giant component
The bowtie model of the web
The Web is a directed graph:
webpages link to other
webpages
The connected components
tell us what set of pages can
be reached from any other just
by surfing (no ‘jumping’ around
by typing in a URL or using a
search engine)
Broder et al. 1999 – crawl of
over 200 million pages and 1.5
billion links.
SCC – 27.5%
IN and OUT – 21.5% each
Tendrils and tubes – 21.5%
Disconnected – 8%
image: Mark Levene
bipartite (two-mode) networks
edges occur only between two groups of nodes, not
within those groups
for example, we may have individuals and events
directors and boards of directors
customers and the items they purchase
metabolites and the reactions they participate in
going from a bipartite to a one-mode graph
[Figure: two-mode network (group 1, group 2) and its one-mode projection]
One-mode projection
two nodes from the first group are connected if they link to the same node in the second group
some loss of information
naturally high occurrence of cliques
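The projection rule can be sketched with sets: group the first-group nodes by the second-group node they link to, then connect every pair inside each group. The people and events are made-up examples, and counting shared neighbors as an edge weight is an addition for illustration. Every event produces a clique among its attendees, and the output no longer says whether Ann, Bob and Cy met at one event or at three separate ones, which is the loss of information noted above.

```python
from collections import defaultdict
from itertools import combinations

def project(bipartite_edges):
    """One-mode projection onto group 1: link two group-1 nodes that share a
    group-2 neighbor; the weight counts how many group-2 nodes they share."""
    members = defaultdict(set)              # group-2 node -> its group-1 neighbors
    for person, event in bipartite_edges:
        members[event].add(person)
    weights = defaultdict(int)
    for people in members.values():
        for a, b in combinations(sorted(people), 2):
            weights[(a, b)] += 1            # each group-2 node yields a clique among its members
    return dict(weights)

# Hypothetical two-mode data: (person, event) pairs
edges = [("Ann", "party"), ("Bob", "party"), ("Cy", "party"),
         ("Cy", "meeting"), ("Dee", "meeting")]
print(project(edges))
# {('Ann', 'Bob'): 1, ('Ann', 'Cy'): 1, ('Bob', 'Cy'): 1, ('Cy', 'Dee'): 1}
```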
Now in matrix notation
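$$B_{ij} = \begin{cases} 1 & \text{if node } i \text{ from the first group links to node } j \text{ from the second group} \\ 0 & \text{otherwise} \end{cases}$$
B is usually not a square matrix!
for example: we have n customers and m products
$$B = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$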
Collapsing to a one-mode network
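$i$ and $k$ are linked if they both link to $j$
$$A_{ik} = \sum_j B_{ij} B_{kj}, \qquad A = B B^T$$
the transpose of a matrix swaps $B_{xy}$ and $B_{yx}$
if $B$ is an $n \times m$ matrix, $B^T$ is an $m \times n$ matrix
$$B = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad B^T = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}$$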
Matrix multiplication
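general formula for matrix multiplication: $Z_{ij} = \sum_k X_{ik} Y_{kj}$
let $Z = A$, $X = B$, $Y = B^T$
$$A = B B^T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}$$
For example, entry $A_{43}$ combines row 4 of $B$, $(1, 1, 1, 1)$, with column 3 of $B^T$, $(1, 1, 0, 0)^T$:
$$A_{43} = 1 \cdot 1 + 1 \cdot 1 + 1 \cdot 0 + 1 \cdot 0 = 2$$
$$A = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 2 & 2 & 0 \\ 1 & 1 & 2 & 4 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}$$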
Collapsing a two-mode network to a one-mode network
Assume the nodes in group 1 are people and the nodes
in group 2 are movies
The diagonal entries of A give the number of movies
each person has seen
The off-diagonal elements of A give the number of
movies that both people have seen
A is symmetric
$$A = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 2 & 2 & 0 \\ 1 & 1 & 2 & 4 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}$$
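The same computation in plain Python (no matrix library assumed), using the 5 × 4 people-by-movies matrix $B$ from the slides. `transpose` and `matmul` are small helper functions written for this sketch; `matmul` implements $Z_{ij} = \sum_k X_{ik} Y_{kj}$ directly.

```python
def transpose(M):
    """Swap rows and columns: (M^T)[x][y] = M[y][x]."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Matrix product: Z[i][j] = sum over k of X[i][k] * Y[k][j]."""
    cols = list(zip(*Y))  # columns of Y
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

# B from the slides: rows are 5 people, columns are 4 movies
B = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]
A = matmul(B, transpose(B))
for row in A:
    print(row)
# [1, 1, 1, 1, 0]
# [1, 1, 1, 1, 0]
# [1, 1, 2, 2, 0]
# [1, 1, 2, 4, 1]
# [0, 0, 0, 1, 1]

# Diagonal = number of movies each person has seen
print([A[i][i] for i in range(len(A))])  # [1, 1, 2, 4, 1]
```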
Networks of actors
History: Graph theory
Euler’s Seven Bridges of Königsberg – one of the first problems in
graph theory
Is there a route that crosses each bridge only once and returns to
the starting point?
Eulerian paths
If starting point and end point are the same:
only possible if no nodes have an odd degree
each path must visit and leave each shore
If you don’t need to return to the starting point
can have 0 or 2 nodes with an odd degree
Eulerian path: traverse each
edge exactly once
Hamiltonian path: visit
each vertex exactly once
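Euler's parity argument can be checked mechanically. This sketch counts odd-degree nodes and also checks that all nodes touching an edge are connected (a condition the parity rule takes for granted). The labels for the four land masses of Königsberg are our own (banks N and S, island I, eastern land E), with the seven bridges as edges of a multigraph; `euler_status` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

def euler_status(edges):
    """Classify an undirected (multi)graph: "circuit" (closed walk using every
    edge once), "path" (open trail using every edge once) or "none".
    Assumes a non-empty edge list."""
    degree = Counter()
    nbrs = defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        nbrs[u].add(v)
        nbrs[v].add(u)
    # All nodes that touch an edge must lie in one connected piece
    start = next(iter(nbrs))
    seen, stack = {start}, [start]
    while stack:
        for w in nbrs[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if len(seen) != len(nbrs):
        return "none"
    odd = sum(1 for d in degree.values() if d % 2 == 1)
    if odd == 0:
        return "circuit"
    if odd == 2:
        return "path"
    return "none"

# Königsberg: every land mass has odd degree (I: 5, N: 3, S: 3, E: 3)
bridges = [("N", "I"), ("N", "I"), ("S", "I"), ("S", "I"),
           ("N", "E"), ("S", "E"), ("I", "E")]
print(euler_status(bridges))                               # none
print(euler_status([("a", "b"), ("b", "c")]))              # path
print(euler_status([("a", "b"), ("b", "c"), ("c", "a")]))  # circuit
```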
Bi-cliques (cliques in bipartite graphs)
Km,n is the complete bipartite graph with m and n vertices of the
two different types
K3,3 maps to the utility graph
Is there a way to connect three utilities, e.g. gas, water, electricity to
three houses without having any of the pipes cross?
Utility graph
K3,3
Planar graphs
A graph is planar if it can be drawn on a plane without
any edges crossing
When graphs are not planar
Two graphs are homeomorphic if you can make one into the other by inserting vertices of degree 2 into edges (subdividing them) or removing such vertices
Cliques and complete graphs
Kn is the complete graph (clique) with n vertices
each vertex is connected to every other vertex
there are n*(n-1)/2 undirected edges
[Figure: complete graphs K3, K5 and K8]
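A short sketch that generates the edge sets of $K_n$ and $K_{m,n}$ with `itertools` and confirms the edge counts: $n(n-1)/2$ for $K_n$ and $m \cdot n$ for $K_{m,n}$. The vertex labels are arbitrary choices for illustration.

```python
from itertools import combinations, product

def complete_graph(n):
    """Edges of K_n: every unordered pair of the n vertices."""
    return list(combinations(range(n), 2))

def complete_bipartite_graph(m, n):
    """Edges of K_{m,n}: every vertex of one type joined to every vertex of the other."""
    left = [("u", i) for i in range(m)]
    right = [("v", j) for j in range(n)]
    return list(product(left, right))

for n in (3, 5, 8):
    print(n, len(complete_graph(n)))          # 3 3, then 5 10, then 8 28
print(len(complete_bipartite_graph(3, 3)))    # 9
```

$K_5$ (10 edges) and $K_{3,3}$ (9 edges) are the two non-planar graphs that appear in Kuratowski's theorem.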
Petersen graph
Example of using edge contractions to show a graph is
not planar
Edge contractions defined
A finite graph G is planar if and only if it has no subgraph that is homeomorphic or edge-contractible to the complete graph on five vertices (K5) or the complete bipartite graph K3,3. (Kuratowski's Theorem; the edge-contraction version is also known as Wagner's theorem)
graph density
Of the connections that may exist between n nodes
directed graph
emax = n*(n-1)
each of the n nodes can connect to (n-1) other nodes
undirected graph
emax = n*(n-1)/2
since edges are undirected, count each one only once
What fraction are present?
density = e/ emax
For example, out of 12
possible connections, this graph
has 7, giving it a density of
7/12 = 0.583
But it is more difficult for a larger network
to achieve the same density
measure not useful for comparing networks of different sizes
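Density as a one-line function, assuming (as the slide's count of 12 possible connections suggests) that the example is a directed graph on 4 nodes, since $4 \times 3 = 12$. The last two calls are a made-up comparison: if every node has exactly 3 neighbors, a 4-node undirected network is complete (density 1), while a 100-node network with the same local structure has only 150 of 4,950 possible edges.

```python
def density(e, n, directed=False):
    """Fraction of the possible edges among n nodes that are present (e / e_max)."""
    e_max = n * (n - 1) if directed else n * (n - 1) // 2
    return e / e_max

print(round(density(7, 4, directed=True), 3))  # 0.583
print(density(6, 4))                            # 1.0
print(round(density(150, 100), 4))              # 0.0303
```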
Number of (non-isomorphic) planar graphs with n nodes:
n = 1: 1
n = 2: 2
n = 3: 4
n = 4: 11
Every planar graph
has a straight line
embedding
(homework exercise)
Trees
Trees are connected undirected graphs that contain no cycles
examples of trees
In nature
trees
river networks
arteries (or veins, but not both)
Man made
sewer system
Computer science
binary search trees
decision trees (AI)
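Network analysis
minimum spanning trees
connect all nodes using the smallest total edge weight
depend on the weights of the edges
may not be unique when several edges have equal weights
shortest-path trees
from one node, how to reach all other nodes most quickly
may not be unique, because shortest paths are not always unique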
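Kruskal's algorithm is one standard way to compute a minimum spanning tree, i.e. a tree connecting all nodes with the least total edge weight (the lecture does not name a method). This sketch sorts edges by weight and uses a small union-find structure to skip any edge that would close a cycle. The weighted graph is hypothetical; it has two weight-2 edges that could each complete the tree, so a different tie-breaking rule would return a different tree with the same total weight, which is why the tree need not be unique.

```python
def minimum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm on (weight, u, v) triples; returns the tree as (u, v, weight)."""
    parent = {v: v for v in nodes}          # union-find forest, one set per node

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving keeps the trees shallow
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(weighted_edges):  # lightest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                        # different components: u-v closes no cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

# Hypothetical weighted graph; A-C and B-C tie at weight 2
edges = [(1, "A", "B"), (2, "B", "C"), (2, "A", "C"), (3, "C", "D"), (4, "B", "D")]
tree = minimum_spanning_tree("ABCD", edges)
print(tree)                        # [('A', 'B', 1), ('A', 'C', 2), ('C', 'D', 3)]
print(sum(w for _, _, w in tree))  # 6
```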
Using Pajek for exploratory social network analysis
Pajek – (pronounced in Slovenian as Pah-yek) means ‘spider’
website: vlado.fmf.uni-lj.si/pub/networks/pajek/
download application (free)
tutorials
lectures
data sets
Windows only (works on Linux via Wine)
can be installed via NAL in the student lab (DIAD)
helpful book: ‘Exploratory Social Network Analysis with Pajek’ by
Wouter de Nooy, Andrej Mrvar and Vladimir Batagelj
first 2 chapters are required reading and on cTools
Pajek interface
things we’ll use right away
Drop-down list of networks opened or created with Pajek; the active one is displayed
Drop-down list of network partitions by discrete variables, e.g. degree, mode, label
Drop-down list of continuous node attributes, e.g. centrality, clustering coefficients
things we’ll use later for clustering
opening a network file
click on folder icon
to open a file
Save changes to your network, network partitions, etc., if you’d like to keep them
Working with network files in Pajek
The active network, partition, etc. is shown at the top of the drop-down list
Draw the network
Pajek data format
*Vertices 26
1 "Ada"
2 "Cora"
3 "Louise"
..
*Vertices gives the number of vertices, followed by one numbered, quoted label per vertex
*Arcs
1 3 2 c Black
..
directed edges: from Ada (1) to Louise (3) as choice “2” and color Black
*Edges
1 2 1 c Black
..
undirected edges: between Ada (1) and Cora (2) as choice “1” and color Black
vertex x, y, z coordinates (optional) follow the label, e.g.
1 "Ada" 0.1646 0.2144 0.5000
2 "Cora" 0.0481 0.3869 0.5000
3 "Louise" 0.3472 0.1913 0.5000
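A rough parser for the subset of the Pajek format shown above, assuming one vertex or link per line, quoted labels, optional coordinates after the label, the third field of a link (the choice number in the slide's example) as its value, and an optional `c <color>` pair. It is not a full implementation of Pajek's format, which supports many more options; `parse_pajek` is a hypothetical helper, the default value of 1.0 for links without a third field is our assumption, and the sample is trimmed to the three vertices listed on the slide.

```python
import shlex

def parse_pajek(text):
    """Parse a small subset of the Pajek .net format into Python structures."""
    vertices, arcs, edges = {}, [], []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "..":          # skip blanks and the slide's ".." elisions
            continue
        if line.startswith("*"):              # section header, e.g. "*Vertices 26"
            section = line.split()[0].lower()
            continue
        parts = shlex.split(line)             # keeps quoted labels such as "Ada" intact
        if section == "*vertices":
            vertices[int(parts[0])] = {
                "label": parts[1],
                "coords": [float(x) for x in parts[2:5]],  # optional x y z
            }
        elif section in ("*arcs", "*edges"):
            link = {
                "from": int(parts[0]),
                "to": int(parts[1]),
                # value assumed to be 1.0 when the third field is absent
                "weight": float(parts[2]) if len(parts) > 2 else 1.0,
                "color": parts[parts.index("c", 3) + 1] if "c" in parts[3:] else None,
            }
            (arcs if section == "*arcs" else edges).append(link)
    return vertices, arcs, edges

sample = """*Vertices 3
1 "Ada" 0.1646 0.2144 0.5000
2 "Cora" 0.0481 0.3869 0.5000
3 "Louise" 0.3472 0.1913 0.5000
*Arcs
1 3 2 c Black
*Edges
1 2 1 c Black
"""
vertices, arcs, edges = parse_pajek(sample)
print(vertices[1]["label"], arcs[0], edges[0])
```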
Live demo of Pajek
Opening a network
Visualization
Essential measurements
Final project guidelines
Work individually or in groups (up to 4 people)
Important dates
Feb. 13th Project proposals due (5%)
1-page abstract and 5-minute class presentation
March 20th Project status report due (5%)
3-6 pages of
result summaries (including figures and tables)
plan of remaining work
April 17th in class student presentations of results (5%)
April 24th final project reports due (25%)
6-12 pages of
related work
main results
‘future’ work/extensions
Final Project
Option 1: Analyze a network
What it should be
More than just a measurement of the average shortest path, clustering
coefficient, and degree distribution
An interpretation of measurement results
If applicable:
discovery of community or other structure
assortativity
motifs
weights, thresholds
longitudinal data (how the network changes over time)
Visualizations of all or part of the network that point out a particular feature
Qualitative comparison with other networks
What it should not be
a literature review
The data can be artificially generated or a real-world dataset
If you intend to work on data concerning human subjects, you may need
to start an IRB application ASAP
Final Project
Option 2: New network model
What it should be
Method for generating a network
e.g. preferential attachment
optimization with respect to different criteria
Analysis of resulting network
comparison with random graphs
how do attributes change depending on model parameters
What it should not be
an already thoroughly explored model
Final Project
Option 3: Novel algorithm
What it should be
An algorithm to analyze the network
e.g. clustering or community detection algorithm
webpage ranking algorithm
OR a process that is influenced by the network
gossip spreading
games such as the prisoner’s dilemma
Analysis of algorithm on several different networks
What it should not be
an exact replica of an existing algorithm applied to a network where
it has already been studied