
Building Networks from Networks
Mining Network Data to Model User Behavior
IPAM Workshop -- October 2, 2007
Personal Introduction
Relevance to Workshop
Network Flow Data
Collaborators:
• Filippo Menczer
(IU, ISI/Torino)
• Alessandro Vespignani
(IU, ISI/Torino)
Network Flow Data
• What is it?
• Where do you get it?
• How do you process it?
• What can it tell you?
The Internet2/Abilene network (image credits: Morehouse University; Cisco Systems)
Flows are exported in Cisco’s NetFlow v5 format and anonymized before being written to disk.
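For readers unfamiliar with the export format, the sketch below decodes one NetFlow v5 export packet using the published v5 layout (a 24-byte header followed by 48-byte flow records). It is illustrative only: the function name parse_netflow_v5 and the dictionary keys are hypothetical, the packet in the usage lines is synthetic, and the anonymization step is not modeled because the transcript does not say how it was done.

```python
import socket
import struct

# NetFlow v5 export packet: a 24-byte header, then `count` 48-byte flow records.
# Header: version, count, sys_uptime, unix_secs, unix_nsecs, flow_sequence,
#         engine_type, engine_id, sampling_interval.
HEADER = struct.Struct("!HHIIIIBBH")
# Record: srcaddr, dstaddr, nexthop, input, output, dPkts, dOctets, First, Last,
#         srcport, dstport, pad1, tcp_flags, prot, tos, src_as, dst_as,
#         src_mask, dst_mask, pad2.
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")

def parse_netflow_v5(packet: bytes):
    """Decode one NetFlow v5 export packet into a list of flow dictionaries."""
    version, count = HEADER.unpack_from(packet, 0)[:2]
    if version != 5:
        raise ValueError(f"expected NetFlow v5, got version {version}")
    flows = []
    for n in range(count):
        (src, dst, _nexthop, _in_if, _out_if, packets, octets, _first, _last,
         sport, dport, _pad1, _tcp_flags, proto, _tos,
         _src_as, _dst_as, _src_mask, _dst_mask, _pad2) = RECORD.unpack_from(
            packet, HEADER.size + n * RECORD.size)
        flows.append({"src": socket.inet_ntoa(src), "dst": socket.inet_ntoa(dst),
                      "sport": sport, "dport": dport, "proto": proto,
                      "packets": packets, "bytes": octets})
    return flows

# Usage with a synthetic one-record packet (made-up addresses, TCP to port 80).
header = HEADER.pack(5, 1, 0, 0, 0, 0, 0, 0, 0)
record = RECORD.pack(socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"),
                     b"\x00" * 4, 0, 0, 3, 1500, 0, 0, 40000, 80,
                     0, 0, 6, 0, 0, 0, 0, 0, 0)
print(parse_netflow_v5(header + record))
```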
Data Dimensions
• Abilene on April 14, 2005
– About 200 terabytes of data exchanged
– This is roughly 25,000 DVDs of information
• 600 million flow records
– Almost 28 gigabytes on disk
– 15 million unique hosts involved
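A quick back-of-the-envelope check of the figures above, assuming decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes). The variable names are illustrative; the numbers are the ones quoted on the slide.

```python
# Figures quoted on the slide (Abilene, April 14, 2005), in decimal units.
TOTAL_TRAFFIC_BYTES = 200e12   # about 200 terabytes exchanged
DVD_COUNT = 25_000             # "roughly 25,000 DVDs"
FLOW_RECORDS = 600e6           # 600 million flow records
DISK_BYTES = 28e9              # almost 28 gigabytes on disk

bytes_per_dvd = TOTAL_TRAFFIC_BYTES / DVD_COUNT   # implied capacity per DVD
bytes_per_record = DISK_BYTES / FLOW_RECORDS      # average stored size of one flow
reduction = TOTAL_TRAFFIC_BYTES / DISK_BYTES      # traffic volume vs. flow-record volume

print(f"{bytes_per_dvd / 1e9:.1f} GB per DVD")
print(f"{bytes_per_record:.1f} bytes per flow record")
print(f"{reduction:,.0f}x smaller than the traffic itself")
```

The implied 8 GB per DVD is close to the capacity of a dual-layer disc (about 8.5 GB), and roughly 47 bytes per stored flow is close to the 48-byte NetFlow v5 record, although the transcript does not say exactly how records were stored on disk.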
A flow is an edge.
Weighted Bipartite Digraph
Figure panels: Port 80 (Web); Port 6346 (Gnutella); Port 25 (Mail); Port 19101 (unknown)
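The per-port networks can be sketched as nested dictionaries, one weighted digraph per port. This assumes, for illustration only, that each flow has been reduced to a hypothetical (client, server, port, bytes) tuple, that the port identifies the application, and that edges point from client to server with weight equal to total bytes; the talk may have used a different weighting.

```python
from collections import defaultdict

def build_port_networks(flows):
    """Group flows into one weighted client -> server digraph per port.

    flows: iterable of (client, server, port, nbytes) tuples (hypothetical format).
    Returns {port: {client: {server: total_bytes}}}. Treating clients and
    servers as separate node sets makes each per-port graph bipartite.
    """
    networks = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for client, server, port, nbytes in flows:
        # Repeated flows between the same pair accumulate into one weighted edge.
        networks[port][client][server] += nbytes
    return networks

# Usage with made-up hosts: the two Web flows from "a" merge into one edge.
flows = [("a", "web1", 80, 500), ("a", "web1", 80, 300),
         ("b", "web2", 80, 200), ("a", "mx1", 25, 1000)]
nets = build_port_networks(flows)
print(nets[80]["a"]["web1"])  # 800
```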
Application Correlation
• Consider the out-strength of a client in the
networks for ports p and q:
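The slide's formula is not preserved in the transcript. The standard definition of out-strength in a weighted digraph is shown below, assuming w_ij^(p) is the weight (for instance, bytes) of the edge from client i to server j in the network for port p, taken as 0 when there is no such edge.

```latex
s_i^{(p)} = \sum_{j} w_{ij}^{(p)}, \qquad s_i^{(q)} = \sum_{j} w_{ij}^{(q)}
```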
Application Correlation
• Build a pair of vectors from the distribution of
strength values:
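One reading of this step, assumed here because the slide's equation did not survive, is a pair of vectors with one component per client, aligned over every client seen on either port. The sketch below uses a hypothetical function strength_vectors and the nested-dictionary format {client: {server: weight}} for each per-port network; a client that never uses a port contributes a zero.

```python
def strength_vectors(net_p, net_q):
    """Align the out-strengths of two per-port networks into a pair of vectors.

    net_p, net_q: {client: {server: weight}} for ports p and q (hypothetical format).
    Returns (clients, s_p, s_q), where s_p[k] is the out-strength of clients[k]
    on port p, i.e. the sum of the weights of its outgoing edges.
    """
    clients = sorted(set(net_p) | set(net_q))   # every client seen on either port
    s_p = [sum(net_p.get(c, {}).values()) for c in clients]
    s_q = [sum(net_q.get(c, {}).values()) for c in clients]
    return clients, s_p, s_q

# Usage with made-up data: client "b" never uses port 25, so its entry there is 0.
clients, s_80, s_25 = strength_vectors({"a": {"web1": 800}, "b": {"web2": 200}},
                                       {"a": {"mx1": 1000}})
print(clients, s_80, s_25)
```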
Application Correlation
• Examine the cosine similarity of the vectors:
• When σ = 0, applications p and q are never used
together.
• When σ = 1, applications p and q are always used together, and to the same relative extent (every client's strengths on p and q are in the same ratio).
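Cosine similarity is the normalized dot product σ(p, q) = (s_p · s_q) / (‖s_p‖ ‖s_q‖). Because strengths are non-negative, σ always lies between 0 and 1. A minimal sketch follows; returning 0 when either vector is all zeros is an assumption for ports with no traffic.

```python
import math

def cosine_similarity(u, v):
    """sigma = (u . v) / (|u| |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Usage with made-up strength vectors (one component per client).
print(cosine_similarity([800, 200], [1000, 0]))  # partial overlap: between 0 and 1
print(cosine_similarity([1, 0], [0, 1]))          # no client uses both: 0.0
print(cosine_similarity([2, 4], [1, 2]))          # proportional usage: 1 up to rounding
```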
Clustering Applications
• We now have σ(p, q) for every pair of ports
• Convert these similarities into distances:
• If σ = 0, then d is large; if σ = 1, then d = 0
• Now apply Ward’s hierarchical clustering
algorithm
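The transcript drops the exact similarity-to-distance formula, so the sketch below swaps in an assumed one, d = sqrt(2(1 - σ)), which satisfies the slide's two conditions: d = 0 when σ = 1, and d takes its largest value, sqrt(2), when σ = 0. It then runs agglomerative Ward clustering in plain Python via the Lance-Williams update on squared distances. The function name, input format, and similarity values are hypothetical; in practice a library routine such as SciPy's scipy.cluster.hierarchy.linkage with method='ward' would typically be used instead.

```python
import math

def ward_clustering(sim, labels):
    """Agglomerative Ward clustering of ports from a cosine-similarity matrix.

    sim: symmetric list of lists with sim[i][j] = sigma(p_i, p_j) in [0, 1].
    labels: one name per row of sim.
    Returns the merges, in order, as (members_a, members_b, merge_distance).
    """
    n = len(labels)
    clusters = {i: (labels[i],) for i in range(n)}
    size = {i: 1 for i in range(n)}
    # ASSUMED conversion d = sqrt(2 * (1 - sigma)); we store d squared because
    # the Lance-Williams form of Ward's method updates squared distances.
    d2 = {frozenset((i, j)): 2.0 * (1.0 - sim[i][j])
          for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        pair = min(d2, key=d2.get)          # closest pair of current clusters
        i, j = tuple(pair)
        dij2 = d2.pop(pair)
        merges.append((clusters[i], clusters[j], math.sqrt(max(dij2, 0.0))))
        new, ni, nj = next_id, size[i], size[j]
        next_id += 1
        for k in clusters:
            if k in (i, j):
                continue
            nk = size[k]
            dki2 = d2.pop(frozenset((k, i)))
            dkj2 = d2.pop(frozenset((k, j)))
            # Lance-Williams update for Ward's method.
            d2[frozenset((k, new))] = ((ni + nk) * dki2 + (nj + nk) * dkj2
                                       - nk * dij2) / (ni + nj + nk)
        clusters[new] = clusters.pop(i) + clusters.pop(j)
        size[new] = size.pop(i) + size.pop(j)
    return merges

# Usage with made-up similarities for four hypothetical ports A-D.
sim = [[1.0, 0.9, 0.3, 0.1],
       [0.9, 1.0, 0.2, 0.1],
       [0.3, 0.2, 1.0, 0.1],
       [0.1, 0.1, 0.1, 1.0]]
for a, b, dist in ward_clustering(sim, ["A", "B", "C", "D"]):
    print(a, b, round(dist, 3))
```

With this choice, d is the Euclidean distance between the unit-normalized strength vectors, which is the setting in which the Lance-Williams recurrence reproduces Ward's minimum-variance merges.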
Next Stop: Behavioral Web Data (Clicks)
Behavioral Web Data
Collaborators:
• Filippo Menczer
(IU, ISI/Torino)
• Santo Fortunato
(ISI/Torino)
• Alessandro Vespignani
(IU, ISI/Torino)
• Alessandro Flammini
(IU)
Thanks to my collaborators!
Flow Analysis
• Filippo Menczer (IU, ISI/Torino)
• Alessandro Vespignani (IU, ISI/Torino)
Click Analysis
• Filippo Menczer (IU, ISI/Torino)
• Santo Fortunato (ISI/Torino)
• Alessandro Vespignani (IU, ISI/Torino)
• Alessandro Flammini (IU)
Thank you!