Building Networks from Networks
Mining Network Data to Model User Behavior
IPAM Workshop -- October 2, 2007

Personal Introduction

Relevance to Workshop

Network Flow Data
Collaborators:
• Filippo Menczer (IU, ISI/Torino)
• Alessandro Vespignani (IU, ISI/Torino)

Network Flow Data
• What is it?
• Where do you get it?
• How do you process it?
• What can it tell you?

Credit: Morehouse University
Credit: Cisco Systems

The Internet2/Abilene network

Flows are exported in Cisco’s netflow-v5 format and anonymized before being written to disk.

Data Dimensions
• Abilene on April 14, 2005
– About 200 terabytes of data exchanged
– This is roughly 25,000 DVDs of information
• 600 million flow records
– Almost 28 gigabytes on disk
– 15 million unique hosts involved

A flow is an edge.

Weighted Bipartite Digraph

Port 80 (Web)
Port 6346 (Gnutella)
Port 25 (Mail)
Port 19101 (???)

Application Correlation
• Consider the out-strength of a client in the networks for ports p and q:
• Build a pair of vectors from the distribution of strength values:
• Examine the cosine similarity of the vectors:
• When σ = 0, applications p and q are never used together.
• When σ = 1, applications p and q are always used together, and to the same extent.
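The application-correlation idea above can be illustrated with a minimal sketch. The slides do not show how the strength vectors are built, so several things here are assumptions: the record layout (client, server, port, bytes), the use of byte counts as edge weights, edges pointing from client to server, and vectors indexed by client. The function names and sample data are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical flow records: (client, server, server_port, bytes).
# Real netflow-v5 records carry more fields; this layout is an assumption.
flows = [
    ("c1", "s1", 80, 1200),
    ("c1", "s2", 80, 800),
    ("c1", "s3", 25, 500),
    ("c2", "s1", 80, 300),
    ("c3", "s4", 6346, 9000),
]


def out_strengths(flows):
    """Map port -> {client: total bytes sent in that port's network}."""
    strength = defaultdict(lambda: defaultdict(float))
    for client, _server, port, nbytes in flows:
        # Each flow is treated as a weighted edge client -> server (assumed).
        strength[port][client] += nbytes
    return strength


def cosine_similarity(s_p, s_q):
    """Cosine similarity of two sparse, client-indexed strength vectors."""
    dot = sum(w * s_q[c] for c, w in s_p.items() if c in s_q)
    norm_p = math.sqrt(sum(w * w for w in s_p.values()))
    norm_q = math.sqrt(sum(w * w for w in s_q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # a port with no traffic is treated as uncorrelated
    return dot / (norm_p * norm_q)


strength = out_strengths(flows)
sigma_web_mail = cosine_similarity(strength[80], strength[25])
sigma_web_p2p = cosine_similarity(strength[80], strength[6346])
```

In this toy data no client uses both port 80 and port 6346, so `sigma_web_p2p` is 0, matching the "never used together" case. Comparing a port's vector with itself gives 1, the "always used together, and to the same extent" case.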
Clustering Applications
• We now have σ(p, q) for every pair of ports
• Convert these similarities into distances:
• If σ = 0, then d is large; if σ = 1, then d = 0
• Now apply Ward’s hierarchical clustering algorithm

Next Stop: Behavioral Web Data (Clicks)

Behavioral Web Data
Collaborators:
• Filippo Menczer (IU, ISI/Torino)
• Santo Fortunato (ISI/Torino)
• Alessandro Vespignani (IU, ISI/Torino)
• Alessandro Flammini (IU)

Thanks to my collaborators!

Flow Analysis
• Filippo Menczer (IU, ISI/Torino)
• Alessandro Vespignani (IU, ISI/Torino)

Click Analysis
• Filippo Menczer (IU, ISI/Torino)
• Santo Fortunato (ISI/Torino)
• Alessandro Vespignani (IU, ISI/Torino)
• Alessandro Flammini (IU)

Thank you!