Exploring Blog Networks Patterns and a Model for Information Propagation Mary McGlohon In collaboration with Jure Leskovec, Christos Faloutsos Natalie Glance, Matthew Hurst Sandia National Labs-

Exploring Blog Networks
Patterns and a Model for
Information Propagation
Mary McGlohon
In collaboration with Jure Leskovec, Christos Faloutsos
Natalie Glance, Matthew Hurst
Sandia National Labs- July 6, 2007
1
Long-term Goals
● How does information on the Web propagate?
● With what pattern do ideas catch on, diffuse, and decrease in popularity?
● Can we build a model for this propagation?
2
Why blogs?
● Blogs are a widely used medium of information for many topics and have become an important mode of communication.
● Blogs cite one another, creating a record of how information and ideas spread through a social network.
● This record is publicly available.
3
Why do we care?
● Understanding how the blog network works is important for:
– Social issues: Political mapping, social trends and change, reactions to mass media.
– Economic issues: Marketing, predicting commercial success, discovering links between companies.
Example: blogs in the 2004 election. [Adamic, Glance 2005]
4
Immediate Goals
● Temporal questions: Does popularity have a half-life? Is there periodicity?
● Topological questions: What topological patterns do posts and blogs follow? What shapes do cascades take on? Stars? Chains? Something else?
● Generative model: Can we build a generative model that mimics properties of cascades?
5
Outline
Motivation
Preliminaries
Concepts and terminology
Data
Temporal Observations
Topological Observations
Cascade Generation Model
Discussion & Conclusions
6
What is a blog?
● A blog is a frequently updated webpage.
● A blog’s author updates the blog using posts.
● Each post has a permanent hyperlink, and may contain links to other blog posts.
Example:
slashdot: “The iPhone is here, hooray!”
boingboing: “At this link, Slashdot says the iPhone has arrived. But I’m not buying one, because …”
slashdot: “Here Boingboing says they’re not buying an iPhone. They’re just jealous.”
10
From blogs to networks
[Figure: three views of the same data. Blogosphere network: blogs (slashdot, boingboing, MichelleMalkin, Dlisted) whose posts link to one another. Blog network: one node per blog (B1 to B4), edges labeled with link counts. Post network: one node per post (a to e).]
11
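As a minimal sketch of how a post network might be collapsed into a blog network, the snippet below counts post-to-post links between each pair of blogs. The post names, blog assignments, and links are hypothetical, and `blog_network` is an illustrative helper rather than code from the study.

```python
from collections import Counter

def blog_network(post_blog, post_links):
    """Collapse (citing_post, cited_post) links into weighted blog-to-blog edges."""
    edges = Counter()
    for citing, cited in post_links:
        edges[(post_blog[citing], post_blog[cited])] += 1
    return dict(edges)

# Hypothetical data: which blog each post belongs to, and who links to whom.
post_blog = {"a": "B1", "b": "B1", "c": "B2", "d": "B2", "e": "B3"}
post_links = [("b", "a"), ("c", "a"), ("d", "a"), ("e", "c")]

weights = blog_network(post_blog, post_links)
for (source, target), weight in sorted(weights.items()):
    print(f"{source} -> {target}: {weight}")
```

Keeping the weight as a plain count means the unweighted blog network is just the set of keys.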
From networks to cascades
● Non-trivial vs. trivial cascades
● Cascade initiators are first sources of information
● We also have stars and chains
[Figure: the blogosphere network (slashdot, boingboing, MichelleMalkin, Dlisted) and the cascades extracted from it.]
13
Dataset (Nielsen Buzzmetrics)
● Gathered from August-September 2005*
● Used set of 44,362 blogs, traced cascades
● 2.4 million posts, ~5 million out-links, 245,404 blog-to-blog links
[Figure: number of posts vs. time (1 day).]
14
Outline
Motivation
Preliminaries
Concepts and terminology
Data
Temporal Observations
Does blog traffic behave periodically?
How does popularity change over time?
Topological Observations
Cascade Generation Model
Discussion & Conclusions
Future Work
15
Temporal Observations
Does blog traffic behave periodically?
• Posts have “weekend effect”, less traffic on
Saturday/Sunday.
16
Temporal Observations
Does blog traffic behave periodically?
• Monday appears to compensate for this behavior, but this is not actually the case.
• We normalize the data: count_norm = count / p_d, where p_d is the percentage of links on that day.
[Figure: Monday post dropoff, number of in-links (log) vs. days after post. Second panel: same data, normalized.]
17
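The normalization can be sketched as below. This is an illustration under assumptions: the weekday shares and raw counts are made-up numbers, `normalize_by_weekday` is a hypothetical helper, and the share is expressed as a fraction rather than a percentage.

```python
def normalize_by_weekday(counts, link_share):
    """Divide each (weekday, count) by the share of all links created on that weekday."""
    return [(day, count / link_share[day]) for day, count in counts]

# Hypothetical shares: Monday carries twice as many links as Saturday.
link_share = {"Mon": 0.2, "Sat": 0.1}
raw_counts = [("Mon", 40), ("Sat", 20)]

normalized = normalize_by_weekday(raw_counts, link_share)
for day, value in normalized:
    print(day, round(value, 1))
```

After dividing by the weekday share, the two days carry the same normalized count, which is the effect the second panel of the slide illustrates.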
Temporal Observations
How does post popularity change over time?
Post popularity dropoff follows a power law identical to that found in communication response times in [Vazquez2006].
Observation 1: The probability that a post written at time $t_p$ acquires a link at time $t_p + \Delta$ is:
$p(t_p + \Delta) \propto \Delta^{-1.5}$
[Figure: number of in-links vs. days after post.]
18
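One simple way to estimate such an exponent is a least-squares line fit in log-log space. The sketch below uses synthetic counts generated from an exact power law; the fitting helper is hypothetical and not necessarily the procedure used for the slides.

```python
import math

def fit_power_law_exponent(xs, ys):
    """Slope of the least-squares line through (log x, log y)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x, mean_y = sum(log_x) / n, sum(log_y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    variance = sum((a - mean_x) ** 2 for a in log_x)
    return covariance / variance

# Synthetic in-link counts that follow delta^-1.5 exactly (an assumption for illustration).
days = list(range(1, 31))
links = [1000.0 * d ** -1.5 for d in days]

exponent = fit_power_law_exponent(days, links)
print(round(exponent, 3))
```

On real, noisy counts a straight-line fit of this kind is only a rough estimate, especially in the sparse tail.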
Outline
Motivation
Preliminaries
Temporal Observations
Does blog traffic behave periodically?
How does post popularity change over time?
Topological Observations
What are graph properties for blog networks?
What shapes do cascades take on? Stars, chains,
or something else?
Cascade Generation Model
Discussion & Conclusions
Future Work
19
Topological Observations
What graph properties does the blog network exhibit? How connected?
● 44,356 nodes, 122,153 edges
● Half of blogs belong to largest connected component.
[Figure: blog network with nodes B1 to B4 and edges labeled with link counts.]
21
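The share of blogs in the largest connected component can be measured with a breadth-first search, as in the sketch below. It assumes that link direction is ignored (weak connectivity), which the slides do not state, and the four-blog graph is a toy example.

```python
from collections import defaultdict, deque

def largest_component_fraction(nodes, edges):
    """Fraction of nodes in the largest component, treating edges as undirected."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best / len(nodes)

# Toy graph: B3 has no links to the other blogs.
blogs = ["B1", "B2", "B3", "B4"]
links = [("B1", "B2"), ("B4", "B2"), ("B4", "B1")]
fraction = largest_component_fraction(blogs, links)
print(fraction)
```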
Topological Observations
What power laws does the blog network exhibit?
Both in- and out-degree follow a power law distribution: in-degree PL exponent -1.7, out-degree PL exponent near -3.
This suggests strong rich-get-richer phenomena.
[Figure: count (log scale) vs. number of blog in-links (log scale); count (log scale) vs. number of blog out-links (log scale).]
22
Topological Observations
How are blog in- and out-degree related?
In-links and out-links are only weakly correlated (correlation coefficient 0.16).
[Figure: number of blog out-links (log scale) vs. number of blog in-links (log scale).]
23
Topological Observations
What graph properties does the post network exhibit?
Very sparsely connected: 98% of posts are isolated.
[Figure: post network with nodes a to e.]
25
Topological Observations
What power laws does the post network exhibit?
• Both in- and out-degree follow power laws:
• In-degree has PL exponent -2.15, out-degree has PL exponent -2.95.
[Figure: count vs. post in-degree; count vs. post out-degree.]
26
Topological Observations
How do we measure how information flows through the network?
We gather cascades using the following procedure:
– Find all initiators (out-degree 0).
– Follow in-links.
– Produces directed acyclic graph.
[Figure: post network (nodes a to e) and the cascades extracted from it.]
29
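The procedure above can be sketched as follows. The link list is hypothetical, each link is written as a `(citing, cited)` pair, and `extract_cascades` is an illustrative helper rather than the extraction code used in the study.

```python
from collections import defaultdict

def extract_cascades(posts, links):
    """Map each initiator (a post with out-degree 0) to the set of posts in its cascade."""
    out_degree = defaultdict(int)
    in_links = defaultdict(list)
    for citing, cited in links:
        out_degree[citing] += 1
        in_links[cited].append(citing)
    cascades = {}
    for post in posts:
        if out_degree[post] > 0:
            continue  # not an initiator
        members, stack = {post}, [post]
        while stack:  # follow in-links transitively
            current = stack.pop()
            for citing in in_links[current]:
                if citing not in members:
                    members.add(citing)
                    stack.append(citing)
        cascades[post] = members
    return cascades

posts = ["a", "b", "c", "d", "e"]
links = [("b", "a"), ("c", "a"), ("d", "b")]  # e is isolated
cascades = extract_cascades(posts, links)
for initiator, members in cascades.items():
    print(initiator, sorted(members))
```

In this toy input, post `e` forms a trivial cascade of size one, while `a` initiates a non-trivial one.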
Topological Observations
How do we measure how information flows
through the network?
Common cascade shapes are extracted using
algorithms in [Leskovec2006].
30
Topological Observations
How do we measure how information flows
through the network?
Number of edges increases linearly with cascade size, while effective diameter increases logarithmically, suggesting tree-like structures.
[Figure: number of edges vs. cascade size (# nodes); effective diameter vs. cascade size.]
31
Topological Observations
How do we measure how information flows
through the network?
We work with a bag of cascades: each cascade is a disconnected subgraph.
We now explore some graph properties of cascades.
32
Topological Observations
What graph properties do cascades exhibit?
As before, in- and out-degree in bag of cascades follow power laws.
[Figure: count vs. cascade node in-degree; count vs. cascade node out-degree.]
33
Topological Observations
What graph properties do cascades exhibit?
Cascade size distributions also follow a power law.
Observation 2: The probability of observing a cascade on n nodes follows a Zipf distribution:
$p(n) \propto n^{-2}$
[Figure: count vs. cascade size (# of nodes).]
35
Topological Observations
What graph properties do cascades exhibit?
Stars and chains also follow a power law, with different exponents (star -3.1, chain -8.5).
[Figure: count vs. size of star (# nodes); count vs. size of chain (# nodes).]
37
Outline
Motivation
Preliminaries
Temporal Observations
Topological Observations
What are graph properties for blog networks?
What shapes and patterns do cascades take on?
Cascade Generation Model
Epidemiological Background
Proposed Model
Experimental Validation
Discussion & Conclusions
Future Work
38
Epidemiological models
● We consider modeling cascade generation as an epidemic, with ideas as viruses.
● We use the SIS model:
– At any time, an entity is in one of two states: susceptible or infected.
– One parameter β determines how easily conversations spread.
[Hethcote2000]
39
Cascade Generation Model
0. Begin with Blog Net, but ignore edge weights.
Example: B1 links to B2, B2 links to B1, B4 links to B2 and B1, as well as itself. B3 is isolated, linking to itself.
1. Randomly pick a blog to infect, add node to cascade.
2. Infect each in-linked neighbor with probability β.
3. Add infected neighbors to cascade.
4. Set “old” infected nodes to uninfected. Repeat steps 2-4 until no nodes are infected.
[Figure: worked example on blogs B1 to B4. B1 is picked; B2 is not infected, B4 is infected and added to the cascade; the next round infects no further nodes. Completed cascade!]
56
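The steps above can be sketched as a short simulation. This is an illustration under assumptions: `generate_cascade` and its `max_nodes` safety cap are hypothetical, each infection is recorded as a new cascade node (so a blog may appear more than once), and the network is the four-blog example from the slide.

```python
import random

def generate_cascade(in_neighbors, beta, rng, max_nodes=10_000):
    """Run steps 1-4 once; return (cascade nodes, cascade edges)."""
    start = rng.choice(sorted(in_neighbors))  # step 1: pick a random blog
    nodes, edges = [start], []
    infected = [start]
    while infected and len(nodes) < max_nodes:
        newly_infected = []
        for blog in infected:
            for neighbor in in_neighbors[blog]:
                if rng.random() < beta:  # step 2: infect with probability beta
                    nodes.append(neighbor)  # step 3: add to cascade
                    edges.append((neighbor, blog))
                    newly_infected.append(neighbor)
        infected = newly_infected  # step 4: old infected nodes recover
    return nodes, edges

# in_neighbors[X] lists the blogs that link to X (edge weights ignored).
in_neighbors = {"B1": ["B2", "B4"], "B2": ["B1", "B4"], "B3": ["B3"], "B4": ["B4"]}

rng = random.Random(0)
sizes = [len(generate_cascade(in_neighbors, 0.025, rng)[0]) for _ in range(1000)]
print("largest cascade:", max(sizes))
```

With a small β most runs end after the first node, which mirrors the dominance of trivial cascades in the data.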
CGM matches observations
● After trying several values, we decide on β = 0.025.
● 10 simulations, 2 million cascades each.
● Most frequent cascades: 7 of 10 matched exactly.
[Figure: most frequent cascade shapes, model vs. data.]
57
CGM matches observations
Cascade size in this model also follows a power law. The model distribution is shown with the real data points.
[Figure: count vs. cascade size (number of nodes).]
58
CGM matches observations
● Stars and chains both follow power laws, close to those observed in real data.
[Figure: count vs. star size; count vs. chain size.]
59
Results in brief
● Analyzed one of the largest available collections of blog information.
● Two networks: “post network” and “blog network”.
● Discovered several properties of the networks.
● Also analyzed properties of “cascades”.
● Presented generative model for cascades.
60
Immediate questions: answered
● Temporal questions: Does popularity have a half-life? Is there periodicity?
– Popularity dropoff follows a power-law distribution, exactly as found in response times in other work. We do find that posts follow weekly periodicity.
[Figure: number of in-links vs. days after post.]
61
Immediate questions: answered
● Topology: What topological patterns do posts and blogs follow? What shapes do cascades take on? Stars? Chains? Something else?
– We find power law distributions in almost every topological property. In cascade shapes, stars are more common than chains, and the size of cascades follows a power law. Cascades are tree-like.
[Figure: count vs. size of star (# nodes); count vs. size of chain (# nodes).]
62
Immediate questions: answered
● Can a simple model replicate this behavior?
– Yes. We developed a model based on the SIS model in epidemiology. It is a simple model with only one parameter, and it produces behavior remarkably similar to that found in the dataset.
[Figure: count vs. star size; count vs. chain size.]
63
Future work and applications
● This work suggested that ideas may behave like viruses under an SIS model.
● This may be useful for mapping social/political trends.
● Further investigation into these properties may also allow early detection of changes in social or economic structure.
64
Related work
● For explanation of SIS model:
– [Hethcote2000] H.W. Hethcote. The mathematics of infectious diseases. SIAM Rev., 42(4):599–653, 2000.
● For algorithms for extracting cascade shapes:
– [Leskovec2006] J. Leskovec, A. Singh, and J. Kleinberg. Patterns of influence in a recommendation network. PAKDD 2006.
● For some modeling of power laws:
– [Vazquez2006] A. Vazquez, J. G. Oliveira, Z. Dezso, K. I. Goh, I. Kondor, and A. L. Barabasi. Modeling bursts and heavy tails in human dynamics. Physical Review E, 73:036127, 2006.
65
Additional Info
Mary McGlohon
www.cs.cmu.edu/~mmcgloho
66
Acknowledgments
● Mary McGlohon was partially supported by an NSF Graduate Fellowship.
● Jure Leskovec was partially supported by a Microsoft Fellowship.
67
Questions?
68
●
EXTRA SLIDES BEGIN HERE!
69
Preliminaries- PCA
● We will work with very high-dimensional data (~9,000 dimensions).
● Principal Component Analysis is a method of dimensionality reduction.
[Figure: hypothetically, for each blog: depth upwards vs. conversation mass upwards.]
72
Preliminaries- PCA
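We can represent any real N x M matrix X as $X = U \times \Sigma \times V^t$:
\[
X =
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 \\
2 & 2 & 2 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 0 & 0 \\
0 & 0 & 0 & 2 & 2 \\
0 & 0 & 0 & 3 & 3 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
=
\underbrace{\begin{bmatrix}
0.18 & 0 \\
0.36 & 0 \\
0.18 & 0 \\
0.90 & 0 \\
0 & 0.53 \\
0 & 0.80 \\
0 & 0.27
\end{bmatrix}}_{U}
\times
\underbrace{\begin{bmatrix}
9.64 & 0 \\
0 & 5.29
\end{bmatrix}}_{\Sigma}
\times
\underbrace{\begin{bmatrix}
0.58 & 0.58 & 0.58 & 0 & 0 \\
0 & 0 & 0 & 0.71 & 0.71
\end{bmatrix}}_{V^t}
\]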
73
Preliminaries- PCA
● Reduce dimensionality by setting all other components of Σ to zero.
[Figure: the decomposition $X = U \times \Sigma \times V^t$ of the 7 x 5 example matrix, with $\Sigma = \mathrm{diag}(9.64, 5.29)$.]
74
Preliminaries- PCA
\[
X \approx
\begin{bmatrix}
0.18 & 0 \\
0.36 & 0 \\
0.18 & 0 \\
0.90 & 0 \\
0 & 0.53 \\
0 & 0.80 \\
0 & 0.27
\end{bmatrix}
\times
\begin{bmatrix}
9.64 & 0 \\
0 & 0
\end{bmatrix}
\times
\begin{bmatrix}
0.58 & 0.58 & 0.58 & 0 & 0 \\
0 & 0 & 0 & 0.71 & 0.71
\end{bmatrix}
\]
Reference: Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, Academic Press.
75
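To make the rank-1 approximation concrete, the sketch below recovers the largest singular value and its vectors for the 7 x 5 example matrix. It uses plain power iteration instead of a full SVD routine, which is a substitution made here to keep the example dependency-free; `top_singular_triple` is a hypothetical helper.

```python
import math

X = [
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
]

def top_singular_triple(matrix, iterations=200):
    """Power iteration on X^t X: returns (sigma, u, v) for the largest singular value."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        xv = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(matrix[i][j] * xv[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    xv = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in xv))
    u = [x / sigma for x in xv]
    return sigma, u, v

sigma, u, v = top_singular_triple(X)
# Rank-1 approximation: keep only the largest singular value.
rank1 = [[sigma * ui * vj for vj in v] for ui in u]
print(round(sigma, 2), [round(x, 2) for x in u], [round(x, 2) for x in v])
```

The rank-1 matrix reproduces the first three columns of X and zeroes the last two, which is the information lost by dropping the second component.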
Preliminaries- Regularizing data
● Not everything in life is normally distributed.
[Figure: blog properties, linear-linear scale: total in-links vs. total conversation mass downwards. Annotation: 99.4% of points!]
● Try to fit a line... Outliers dramatically affect fit.
● Therefore, we propose to take log(count+1).
[Figure: blog properties, log-log scale: total in-links vs. total conversation mass downwards. Outliers’ effects are minimized.]
81
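A small sketch of the log(count+1) transform follows. The counts are invented to mimic a heavy-tailed property with one extreme outlier, and `regularize` is a hypothetical helper name.

```python
import math

def regularize(counts):
    """Apply log(count + 1) so that zero counts stay at zero and large counts are compressed."""
    return [math.log1p(c) for c in counts]

# Hypothetical in-link counts: mostly small, with one extreme outlier.
counts = [0, 1, 3, 9, 99999]
transformed = regularize(counts)

spread_before = max(counts) / 9
spread_after = max(transformed) / math.log1p(9)
print(round(spread_before, 1), round(spread_after, 1))
```

The outlier sits about 11,000 times above the next-largest value before the transform and only about 5 times above it afterwards, so it no longer dominates a line fit.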
● Suppose we want to cluster blogs based on content. What features do we use per blog?
82
CascadeType
• Perform PCA on sparse matrix.
• Use log(count+1)
• Project onto 2 PC…
[Figure: sparse matrix with one row per blog (~44,000 blogs, e.g. slashdot, boingboing) and one column per cascade type (~9,000 cascade types); sample entries include 4.6, 2.1, 3.2, 1.1, …]
83
CascadeType: Results
● Observation: Content of blogs and cascade behavior are often related.
• Distinct clusters for “conservative” and “humorous” blogs (hand-labeling).
85
● Suppose we want to cluster blog posts. What features do we use?
86
Preliminaries- Blogs
● There are several terms we use to describe cascades:
● In-link, out-link
– Green node has one out-link.
– Yellow node has one in-link.
● Depth downwards/upwards
– Pink node has an upward depth of 1, downward depth of 2.
● Conversation mass upwards/downwards
– Pink node has upward CM 1, downward CM 3.
87
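The sketch below computes depth and conversation mass for a toy cascade arranged to reproduce the pink-node numbers above. The definitions are assumptions, since the slide gives only the example: depth is taken as the longest path from the node, and conversation mass as the number of distinct nodes reachable from it, following in-links for "downward" and out-links for "upward". The node names are invented.

```python
def depth_and_mass(start, neighbors):
    """Return (longest path length, number of reachable nodes) in an acyclic graph."""
    reached = set()
    memo = {}

    def longest(node):
        if node in memo:
            return memo[node]
        best = 0
        for nxt in neighbors.get(node, ()):
            reached.add(nxt)
            best = max(best, 1 + longest(nxt))
        memo[node] = best
        return best

    return longest(start), len(reached)

# Hypothetical cascade: x and y cite pink, z cites x, and pink cites top.
cited_by = {"top": ["pink"], "pink": ["x", "y"], "x": ["z"]}   # follow in-links (downward)
cites = {"pink": ["top"], "x": ["pink"], "y": ["pink"], "z": ["x"]}  # follow out-links (upward)

down = depth_and_mass("pink", cited_by)
up = depth_and_mass("pink", cites)
print("downward depth, CM:", down)
print("upward depth, CM:", up)
```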
PostFeatures
[Figure: feature matrix with one row per post (~2,400,000 posts, e.g. slashdot-p001, slashdot-p002, …, boingboing-p001, boingboing-p002, …) and one column per feature; sample entries include 4.5, .3, 2.2, …]
Run PCA…
88
PostFeatures: Results
• Observation: Posts within a blog tend to retain similar network characteristics.
– PC1 ~ CM upward
– PC2 ~ CM downward
– We show this scatter plot instead.
[Figure: scatter plot of posts, with posts from MichelleMalkin and Dlisted labeled.]
90
Ranking blogs by PostFeatures
● Conversation mass up/down gives a better understanding of the blog posts than in-links and out-links.
● Therefore, we may choose to rank blogs based on these attributes.
91
Blogs ranked by CM vs in-links
Rank  Top blogs by conversation mass     Top blogs by in-links
1     michellemalkin.com                 boingboing.net
2     boingboing.net                     michellemalkin.com
3     imao.us (75)                       instapundit.com
4     captainsquartersblog.com/mt        waxy.org/links
5     instapundit.com                    kottke.org/reminder
6     radioequalizer.blogspot.com (53)   patriotdaily.com (11)
7     powerlineblog.com                  captainsquartersblog.com/mt
8     waxy.org/links                     powerlineblog.com
9     washingtonmonthly.com              washingtonmonthly.com
10    kottke.org/reminder                petashon.com (30)
[Figure: two example cascades, one annotated "in-links: 2, CM: 6" and the other "in-links: 5, CM: 5".]
– Perhaps IMAO has longer cascades, just fewer in-links.
– While petashun has “stars”.
93
BlogTimeFractal: some time series
● Problem: time series data is nonuniform and difficult to analyze.
[Figure: in-links over time.]
● Any patterns?
● Any measures?
94
BlogTimeFractal: Definitions
● Any patterns?
● Self similarity!
● The 80-20 law describes self-similarity.
● For any sequence, we divide it into two equal-length subsequences. 80% of traffic is in one, 20% in the other.
– Repeat recursively.
95
Self-similarity
● The bias factor for the 80-20 law is b=0.8.
[Figure: a sequence split into halves carrying 20% and 80% of the traffic.]
Q: How do we estimate b?
A: Entropy plots!
98
BlogTimeFractal
● An entropy plot plots entropy vs. resolution.
● From time series data, begin with resolution R = T/2.
● Record entropy H_R.
● Recursively take finer resolutions.
101
BlogTimeFractal: Definitions
● Entropy measures the non-uniformity of the histogram at a given resolution.
● We define entropy of our sequence at given R:
$H_R = -\sum_{t=1}^{2^R} p(t) \log_2 p(t)$
where p(t) is percentage of posts from a blog on interval t, R is resolution and $2^R$ is number of intervals.
102
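The sketch below builds an entropy plot for synthetic traffic. It assumes base-2 logarithms and 2^R equal-length intervals at resolution R, and it generates the traffic with a deterministic 80-20 split that always sends the larger share to the first half, a simplification chosen for reproducibility. Both function names are hypothetical.

```python
import math

def entropy_at_resolution(series, resolution):
    """Entropy of the series after aggregating it into 2**resolution equal intervals."""
    intervals = 2 ** resolution
    width = len(series) // intervals
    total = sum(series)
    entropy = 0.0
    for i in range(intervals):
        p = sum(series[i * width:(i + 1) * width]) / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy

def b_model(bias, levels, total=1.0):
    """Recursively split traffic: a `bias` share to one half, the rest to the other."""
    series = [total]
    for _ in range(levels):
        series = [part for value in series
                  for part in (value * bias, value * (1 - bias))]
    return series

series = b_model(0.8, 10)  # 1024 time ticks
plot = [entropy_at_resolution(series, r) for r in range(1, 11)]
slope = plot[-1] - plot[-2]
print([round(h, 3) for h in plot[:3]], round(slope, 4))
```

Each extra level of resolution adds the same amount of entropy, so the plot is a straight line; its slope is the quantity compared against the bias factor.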
BlogTimeFractal
● For a b-model (and self-similar cases), the entropy plot is linear. The slope s will tell us the bias factor.
● Lemma: For traffic generated by a b-model, the bias factor b obeys the equation:
$s = -b \log_2 b - (1-b) \log_2 (1-b)$
103
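The lemma gives s as a function of b, so recovering b from a measured slope requires inverting it numerically. The sketch below does this by bisection on the interval [0.5, 1], where the right-hand side decreases from 1 to 0; `bias_from_slope` is a hypothetical helper.

```python
import math

def binary_entropy(b):
    """Right-hand side of the lemma: -b log2 b - (1-b) log2 (1-b)."""
    if b in (0.0, 1.0):
        return 0.0
    return -b * math.log2(b) - (1 - b) * math.log2(1 - b)

def bias_from_slope(slope, tolerance=1e-9):
    """Find b in [0.5, 1] whose binary entropy equals the slope, by bisection."""
    low, high = 0.5, 1.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if binary_entropy(mid) > slope:
            low = mid   # entropy still too high: move toward 1
        else:
            high = mid
    return (low + high) / 2

bias = bias_from_slope(0.85)
print(round(bias, 2))
```

A slope of 1 maps to b = 0.5 (uniform traffic) and a slope of 0 maps to b = 1 (all traffic at a single point).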
Entropy Plots
● Linear plot → Self-similarity
● Uniform: slope s=1, bias=.5
● Point mass: s=0, bias=1
● Michelle Malkin in-links: s = 0.85. By Lemma 1, b = 0.72.
[Figure: entropy vs. resolution.]
106
BlogTimeFractal: Results
● Observation: Most time series of interest are self-similar.
● Observation: Bias factor is approximately 0.7, that is, more bursty than uniform (70/30 law).
[Figure: entropy plots for MichelleMalkin: in-links, b=.72; conversation mass, b=.76; number of posts, b=.70.]
107
● Other related work
108
[Ali-Hasen, Adamic 2007]
Expressing Social Relationships on the Blog
through Links and Comments
Analyzed three blog communities:
Dallas-Fort Worth: most links are external to community (91%); low centralization; low reciprocity.
UAE: fewer links external to community; more centralization; obvious “hub” structure.
Kuwait: fewest links external to community (53%); highly centralized; much reciprocity.
109
[Duarte et al. 2007]
● Classified blogs into parlor, register, and broadcast.
[Figure: fraction of sessions with comments vs. total sessions, with register, parlor, and broadcast regions labeled.]
110
[Adar et. al. 2004]
● Implicit Structure and the Dynamics of Blogspace
● Suggested that ideas behaved like epidemics.
● Presented iRank based on how “infectious” a blog was.
(giant microbes, a site infectious in more ways than one)
111
